Commit Graph

13789 Commits

Author SHA1 Message Date
Thomas Lively d8f58bf53a [WebAssembly] Prototype i16x8.q15mulr_sat_s
This saturating, rounding, Q-format multiplication instruction is proposed in
https://github.com/WebAssembly/simd/pull/365.

Differential Revision: https://reviews.llvm.org/D88968
2020-10-09 21:17:53 +00:00
Liu, Chen3 26cfb6e562 [X86] Passing union type through register
For example:

  union M256 {
    double d;
    __m256 m;
  };
  extern void foo1(union M256 A);
  union M256 m1;
  void test() {
    foo1(m1);
  }

clang will pass m1 through stack which does not follow the ABI.

Differential Revision: https://reviews.llvm.org/D78699
2020-10-09 11:24:29 +08:00
Alexandre Ganea 66face6aa0 Re-land [DebugInfo] Add debug location to stubs generated by CGDeclCXX and mark them as artificial
Previously, when clang was compiled with -DLLVM_ENABLE_ASSERTIONS=ON, the added tests were displaying:

inlinable function call in a function with debug info must have a !dbg location
  call void @"??1?$c@UB@@@@QEAA@XZ"(%struct.c* @"?f@?1??d@@YAPEAU?$c@UB@@@@XZ@4U2@A")
fatal error: error in backend: Broken module found, compilation aborted!
Stack dump:
0.      Program arguments: <f:\svn\buildninja\bin\clang -cc1 -emit-llvm debug-info-no-location.cpp> -gcodeview -debug-info-kind=limited
1.      <eof> parser at end of file
2.      Per-function optimization

Fixes PR43012

Differential Revision: https://reviews.llvm.org/D66328
2020-10-08 20:49:17 -04:00
Arthur Eubanks afff74e5c2 [HWAsan][NewPM] Handle hwasan like other sanitizers
Move it as an EP callback (-O[123]) or in addSanitizersAtO0.

This makes it not run in ThinLTO pre-link (like the other sanitizers),
so don't check LTO runs in hwasan-new-pm.c. Changing its position also
seems to change the generated IR. I think we just need to make sure the
pass runs.

Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D88936
2020-10-08 14:43:21 -07:00
Joseph Huber 3cc1f1fc1d [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to
consolidate more OpenMP code generation into the OMPIRBuilder. Future
additions to the GPU runtime functions should now go in OMPKinds.def

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #LLVM #clang

Differential Revision: https://reviews.llvm.org/D88430
2020-10-08 14:00:22 -04:00
diggerlin 92bca12843 [AIX] add new option -mignore-xcoff-visibility
SUMMARY:

In IBM compiler xlclang , there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.

We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to ASM or XCOFF object file.

The option only work on the AIX OS, for other non-AIX OS using the option will report an unsupported options error.

In AIX OS:

1.1  the option -mignore-xcoff-visibility is enabled by default , if there is not -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command .

1.2 if there is -fvisibility=* explicitly but not -mignore-xcoff-visibility  explicitly in the clang command.  it will generate visibility attributes.

1.3 if there are  both  -fvisibility=* and  -mignore-xcoff-visibility  explicitly in the clang command. The option  "-mignore-xcoff-visibility" wins , it do not emit the visibility attribute.

The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR.

Reviewer: daltenty,Jason Liu

Differential Revision: https://reviews.llvm.org/D87451
2020-10-08 09:34:58 -04:00
Pushpinder Singh 3a12ff0dac [OpenMP][RTL] Remove dead code
RequiresDataSharing was always 0, resulting dead code in device runtime library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D88829
2020-10-06 05:43:47 -04:00
Fangrui Song a2cc883368 [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables
D17779: host-side shadow variables of external declarations of device-side
global variables have internal linkage and are referenced by
`__cuda_register_globals`.

nvcc from CUDA 11 does not allow `__device__ inline` or `__device__ constexpr`
(C++17 inline variables) but clang has incorrectly supported them for a while:

```
error: A __device__ variable cannot be marked constexpr
error: An inline __device__/__constant__/__managed__ variable must have internal linkage when the program is compiled in whole program mode (-rdc=false)
```

If such a variable (which has a comdat group) is discarded (a copy from another
translation unit is prevailing and selected), accessing the variable from
outside the section group (`__cuda_register_globals`) is a violation of the ELF
specification and will be rejected by linkers:

> A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.

As a workaround, don't register such inline variables for now.
(If we register the variables in all TUs, we will keep multiple instances of the shadow and break the C++ semantics for inline variables).
We should reject such variables in Sema but our internal users need some time to migrate.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D88786
2020-10-05 12:53:59 -07:00
Craig Topper a02b449bb1 [X86] Sync AESENC/DEC Key Locker builtins with gcc.
For the wide builtins, pass a single input and output pointer to
the builtins. Emit the GEPs and input loads from CGBuiltin.
2020-10-04 12:09:41 -07:00
Craig Topper 230c57b0bd [X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned.
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.

The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.

Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
2020-10-04 12:09:35 -07:00
Mark de Wever 1113fbf44c [CodeGen] Improve likelihood branch weights
Bruno De Fraine discovered some issues with D85091. The branch weights
generated for `logical not` and `ternary conditional` were wrong. The
`logical and` and `logical or` differed from the code generated of
`__builtin_predict`.

Adjusted the generated code for the likelihood to match
`__builtin_predict`. The patch is based on Bruno's suggestions.

Differential Revision: https://reviews.llvm.org/D88363
2020-10-04 14:24:27 +02:00
Yaxun (Sam) Liu cbd420c5ed [CUDA][HIP] Fix bound arch for offload action for fat binary
Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.

This causes issue for https://reviews.llvm.org/D88377 since
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.

The bound arch of offload action for fat binary is not really
used, therefore set it to CudaArch::UNUSED.

Differential Revision: https://reviews.llvm.org/D88524
2020-10-02 19:05:51 -04:00
Richard Smith 8fb2a235b0 Don't reject calls to MinGW's unusual _setjmp declaration.
We now recognize this function as a builtin despite it having an
unexpected number of parameters; make sure we don't enforce that it has
only 1 argument for its 2 parameters.
2020-10-02 15:12:15 -07:00
Yaxun (Sam) Liu dc6a0b0ec7 [HIP] Align device binary
To facilitate faster loading of device binaries and share them among processes,
HIP runtime favors their alignment being 4096 bytes. HIP runtime can load
unaligned device binaries, however, aligning them at 4096 bytes results in
faster loading and less shared memory usage.

This patch adds an option -bundle-align to clang-offload-bundler which allows
bundles to be aligned at specified alignment. By default it is 1, which is NFC
compared to existing format.

This patch then aligns embedded fat binary and device binary inside fat binary
at 4096 bytes.

It has been verified this change does not cause significant overall file size increase
for typical HIP applications (less than 1%).

Differential Revision: https://reviews.llvm.org/D88734
2020-10-02 18:10:44 -04:00
Nathan Lanza 14f6bfcb52 [clang] Implement objc_non_runtime_protocol to remove protocol metadata
Summary:
Motivated by the new objc_direct attribute, this change adds a new
attribute that remotes metadata from Protocols that the programmer knows
isn't going to be used at runtime. We simply have the frontend skip
generating any protocol metadata entries (e.g. OBJC_CLASS_NAME,
_OBJC_$_PROTOCOL_INSTANCE_METHDOS, _OBJC_PROTOCOL, etc) for a protocol
marked with `__attribute__((objc_non_runtime_protocol))`.

There are a few APIs used to retrieve a protocol at runtime.
`@protocol(SomeProtocol)` will now error out of the requested protocol
is marked with attribute. `objc_getProtocol` will return `NULL` which
is consistent with the behavior of a non-existing protocol.

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75574
2020-10-02 17:35:50 -04:00
Michael Liao 8c36eaf037 [clang][opencl][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr.
- `-cl-fp32-correctly-rounded-divide-sqrt` is already handled in a
  per-instruction manner by annotating the accuracy required. There's no
  need to add that fn-attr. So far, there's no in-tree backend handling
  that attr and that OpenCL specific option.
- In case that out-of-tree backends are broken, this change could be
  reverted if those backends could not be fixed.

Differential Revision: https://reviews.llvm.org/D88424
2020-10-01 11:07:39 -04:00
Arthur Eubanks ce5379f0f0 [NPM] Add target specific hook to add passes for New Pass Manager
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D88138
2020-09-30 13:29:43 -07:00
Joseph Huber 1b60f63e4f Revert "[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def"
Failing tests on Arm due to the tests automatically populating
incomatible pointer width architectures. Reverting until the tests are
updated. Failing tests:

OpenMP/distribute_parallel_for_num_threads_codegen.cpp
OpenMP/distribute_parallel_for_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp

This reverts commit 90eaedda9b.
2020-09-30 15:12:21 -04:00
Joseph Huber 90eaedda9b [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to consolidate
more OpenMP code generation into the OMPIRBuilder. This patch also
invalidates specifying target architectures with conflicting pointer
sizes.

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #Clang #LLVM

Differential Revision: https://reviews.llvm.org/D88430
2020-09-30 14:00:01 -04:00
Xiangling Liao 3a7487f903 [FE] Use preferred alignment instead of ABI alignment for complete object when applicable
On some targets, preferred alignment is larger than ABI alignment in some cases. For example,
on AIX we have special power alignment rules which would cause that. Previously, to support
those cases, we added a “PreferredAlignment” field in the `RecordLayout` to store the AIX
special alignment values in “PreferredAlignment” as the community suggested.

However, that patch alone is not enough. There are places in the Clang where `PreferredAlignment`
should have been used instead of ABI-specified alignment. This patch is aimed at fixing those
spots.

Differential Revision: https://reviews.llvm.org/D86790
2020-09-30 10:48:28 -04:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Amy Huang 5c4fc581d5 [DebugInfo] Add types from constructor homing to the retained types list.
Add class types to the retained types list to make sure they
don't get dropped if the constructor is optimized out later.

Differential Revision: https://reviews.llvm.org/D88522
2020-09-29 17:00:45 -07:00
John McCall 984744a131 Fix a variety of minor issues with ObjC method mangling:
- Fix a memory leak accidentally introduced yesterday by using CodeGen's
  existing mangling context instead of creating a new context afresh.

- Move GNU-runtime ObjC method mangling into the AST mangler; this will
  eventually be necessary to support direct methods there, but is also
  just the right architecture.

- Make the Apple-runtime method mangling work properly when given an
  interface declaration, fixing a bug (which had solidified into a test)
  where mangling a category method from the interface could cause it to
  be mangled as if the category name was a class name.  (Category names
  are namespaced within their class and have no global meaning.)

- Fix a code cross-reference in dsymutil.

Based on a patch by Ellis Hoag.
2020-09-29 19:51:53 -04:00
Fangrui Song 3681be876f Add -fprofile-update={atomic,prefer-atomic,single}
GCC 7 introduced -fprofile-update={atomic,prefer-atomic} (prefer-atomic is for
best efforts (some targets do not support atomics)) to increment counters
atomically, which is exactly what we have done with -fprofile-instr-generate
(D50867) and -fprofile-arcs (b5ef137c11).
This patch adds the option to clang to surface the internal options at driver level.

GCC 7 also turned on -fprofile-update=prefer-atomic when -pthread is specified,
but it has performance regression
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307). So we don't follow suit.

Differential Revision: https://reviews.llvm.org/D87737
2020-09-29 10:43:23 -07:00
Tres Popp eb9f7c28e5 Revert "OpaquePtr: Add type to sret attribute"
This reverts commit 55c4ff91bd.

Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
2020-09-29 10:31:04 +02:00
Ellis Hoag 98ef7e29b0 This reduces code duplication between CGObjCMac.cpp and Mangle.cpp
for generating the mangled name of an Objective-C method.

This has no intended functionality change.

https://reviews.llvm.org/D88329
2020-09-29 02:26:51 -04:00
Yaxun (Sam) Liu 187658b8a6 Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Recommit 04abbb3a78
2020-09-28 22:43:17 -04:00
Craig Topper 288c5776c9 [X86] Use inlineasm flag output for the _bittest* intrinsics.
Instead of expliciting emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.

If the flag can't be used directly, the backend will emit a setcc.

Differential Revision: https://reviews.llvm.org/D87888
2020-09-28 13:33:22 -07:00
Vedant Kumar 06bc685fa2 [ubsan] nullability-arg: Fix crash on C++ member pointers
Extend -fsanitize=nullability-arg to handle call sites which accept C++
member pointers.

rdar://62476022

Differential Revision: https://reviews.llvm.org/D88336
2020-09-28 09:41:18 -07:00
Michael Liao 5dbf80cad9 [clang][codegen] Annotate `correctly-rounded-divide-sqrt-fp-math` fn-attr for OpenCL only.
- `-cl-fp32-correctly-rounded-divide-sqrt` is an OpenCL-specific option
  and `correctly-rounded-divide-sqrt-fp-math` should be added for OpenCL
  at most.

Differential revision: https://reviews.llvm.org/D88303
2020-09-28 11:40:32 -04:00
David Sherwood bafdd11326 [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700
2020-09-28 08:03:00 +01:00
Richard Smith 9dcd96f728 Canonicalize declaration pointers when forming APValues.
References to different declarations of the same entity aren't different
values, so shouldn't have different representations.

Recommit of e6393ee813 with fixed handling
for weak declarations. We now look for attributes on the most recent
declaration when determining whether a declaration is weak. (Second
recommit with further fixes for mishandling of weak declarations. Our
behavior here is fundamentally unsound -- see PR47663 -- but this
approach attempts to not make things worse.)
2020-09-27 19:05:26 -07:00
Shilei Tian ebb1092a28 [Clang][OpenMP] Added support for nowait target in CodeGen via regular task
Previously for nowait target, CG emitted a function call to `__tgt_target_nowait`, etc. However, in OpenMP RTL, these functions just directly call the no-nowait version, which means nowait is not working as expected.

OpenMP specification says a target is acutally a target task, which is an untied and detachable task. It is natural to go to the direction that generates a task for a nowait target. However, OpenMP task has a problem that it must be within to a parallel region; otherwise the task will be executed immediately. As a result, if we directly wrap to a regular task, the `target nowait` outside of a parallel region is still a synchronous version.

In D77609, I added the support for unshackled task in OpenMP RTL. Basically, unshackled task is a task that is not bound to any parallel region. So all nowait target will be tranformed into an unshackled task. In order to distinguish from regular task, a new flag bit is set for unshackled task. This flag will be used by RTL for later process.

Since all target tasks are allocated via `__kmpc_omp_target_task_alloc`, and in current `libomptarget`, `__kmpc_omp_target_task_alloc` just calls `__kmpc_omp_task_alloc`. Therefore, we can modify the flag in `__kmpc_omp_target_task_alloc` so that we don't need to modify the FE too much. If users choose to opt out the feature, they just need to use a RTL w/o support of unshackled threads.

As a result, in this patch, the `target nowait` region is simply wrapped into a regular task. Later once we have RTL support for unshackled tasks, the wrapped tasks can be executed by unshackled threads w/o changes in the FE.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78075
2020-09-25 22:10:36 -04:00
Matt Arsenault 55c4ff91bd OpaquePtr: Add type to sret attribute
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
2020-09-25 14:07:30 -04:00
Chris Bowler f330d9f163 [PPC] [AIX] Implement calling convention IR for C99 complex types on AIX
Add AIX calling convention logic to Clang for C99 complex types on AIX

Differential Revision: https://reviews.llvm.org/D88130
2020-09-25 07:43:31 -04:00
Momchil Velikov a88c722e68 [AArch64] PAC/BTI code generation for LLVM generated functions
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions, generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes
and the backend incorrectly does not do any PAC/BTI code generation.

This patch record the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:

* "sign-return-address", with non-zero value means generate code to
  sign return addresses (PAC-RET), zero value means disable PAC-RET.

* "sign-return-address-all", with non-zero value means enable PAC-RET
  for all functions, zero value means enable PAC-RET only for
  functions, which spill LR.

* "sign-return-address-with-bkey", with non-zero value means use B-key
  for signing, zero value mean use A-key.

This set of attributes are always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having a
value 0) in order to be able to check for conflicts when combining
module attributed during LTO.

Module-level attributes are overridden by function level attributes.
All the decision making about whether to not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo, there shouldn't be
any places left, other than AArch64FunctionInfo, which directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will-be handled by a separate patch.

Differential Revision: https://reviews.llvm.org/D85649
2020-09-25 11:47:14 +01:00
Ian Levesque 6f7fbdd285 [xray] Function coverage groups
Add the ability to selectively instrument a subset of functions by dividing the functions into N logical groups and then selecting a group to cover. By selecting different groups over time you could cover the entire application incrementally with lower overhead than instrumenting the entire application at once.

Differential Revision: https://reviews.llvm.org/D87953
2020-09-24 22:09:53 -04:00
Reid Kleckner ecfc9b9712 [MS] For unknown ISAs, pass non-trivially copyable arguments indirectly
Passing them directly is likely to be non-conforming, since it usually
involves copying the bytes of the record. For unknown architectures, we
don't know what MSVC does or will do, but we should at least try to
conform as well as we can.
2020-09-24 16:29:48 -07:00
Reid Kleckner b8a50e9207 [MS] Simplify rules for passing C++ records
Regardless of the target architecture, we should always use the C rules
(RAA_Default) for records that "canBePassedInRegisters". Those are
trivially copyable things, and things marked with [[trivial_abi]].

This should be NFC, although it changes where the final decision about
x86_32 overaligned records is made. The current x86_32 C rules say that
overaligned things are passed indirectly, so there is no functional
difference.
2020-09-24 16:29:47 -07:00
Amy Huang c8df781e54 [DebugInfo] Fix bug in constructor homing with classes with trivial
constructors.

This changes the code to avoid using constructor homing for aggregate
classes and classes with trivial default constructors, instead of trying
to loop through the constructors.

Differential Revision: https://reviews.llvm.org/D87808
2020-09-24 14:43:48 -07:00
Erich Keane f8a92adfa2 Remove dead branch identified by @rsmith on post-commit for D88236 2020-09-24 13:05:15 -07:00
Erich Keane 606a734755 [PR47636] Fix tryEmitPrivate to handle non-constantarraytypes
As mentioned in the bug report, tryEmitPrivate chokes on the
MaterializeTemporaryExpr in the reproducers, since it assumes that if
there are elements, than it must be a ConstantArrayType. However, the
MaterializeTemporaryExpr (which matches exactly the AST when it is NOT a
global/static) has an incomplete array type.

This changes the section where the number-of-elements is non-zero to
properly handle non-CAT types by just extracting it as an array type
(since all we needed was the element type out of it).
2020-09-24 12:09:22 -07:00
Amy Kwan 6b136b19cb [Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins.
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.

These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, the converting the float to an integer would be done via
fptoui in the IR. This is incorrect as fptoui truncates the value and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
bitcast instead as bitcasts do not truncate.

Differential Revision: https://reviews.llvm.org/D83500
2020-09-23 22:55:25 -05:00
Craig Topper d9717d8ee7 [X86] Add a memory clobber to the bittest intrinsic inline asm. Get default clobbers from the target
I believe the inline asm emitted here should have a memory clobber since it writes to memory.

It was also missing the dirflag clobber that we use by default along with flags and fpsr. To avoid missing defaults in the future, get the default list from the target

Differential Revision: https://reviews.llvm.org/D88121
2020-09-23 14:54:39 -07:00
Amy Kwan 2e7117f847 [PowerPC] Implement the 128-bit vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins in Clang/LLVM
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.

Differential Revision: https://reviews.llvm.org/D87910
2020-09-23 16:49:40 -04:00
Stanislav Mekhanoshin 59691dc874 [AMDGPU] Make ds fp atomics overloadable
Differential Revision: https://reviews.llvm.org/D87947
2020-09-23 11:39:50 -07:00
Sriraman Tallam 7d0bbe4090 Re-apply https://reviews.llvm.org/D87921, was reverted to triage a PPC bot failure.
D87921 was reverted in commit b89059a313
as it was causing an unknown llvm PPC bot failure.  Reapplying the patch
after confirming that this is not responsible. Build bot failure:
https://reviews.llvm.org/D87921#2286644  which caused the revert.

The wrong placement of add pass with optimizations led to
-funique-internal-linkage-names being disabled.

Fixed the placement of the MPM.addpass for UniqueInternalLinkageNames to make it
work correctly with -O2 and new pass manager. Updated the tests to explicitly
check O0 and O1.

Differential Revision: https://reviews.llvm.org/D87921
2020-09-23 10:28:40 -07:00
Yaxun (Sam) Liu 301e23305d [CUDA][HIP] Fix static device var used by host code only
A static device variable may be accessed in host code through
cudaMemCpyFromSymbol etc. Currently clang does not
emit the static device variable if it is only referenced by
host code, which causes host code to fail at run time.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D88115
2020-09-23 08:18:19 -04:00
Mircea Trofin cf112382dd [ThinLTO] Option to bypass function importing.
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.

Differential Revision: https://reviews.llvm.org/D87949
2020-09-22 13:12:11 -07:00
Sriraman Tallam b89059a313 Revert "The wrong placement of add pass with optimizations led to -funique-internal-linkage-names being disabled."
This reverts commit 6950db36d3.
2020-09-22 12:32:43 -07:00