Commit Graph

13660 Commits

Author SHA1 Message Date
Michael Liao c7b683c126 [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions.
- Skip generating profile data on `__global__` function in the host
  compilation. It's a host-side stub function only and don't have
  profile instrumentation generated on the real function body. The extra
  profile data results in the malformed instrumentation profile data.
- Skip generating region mapping on functions in the wrong-side, i.e.,
  + For the device compilation, skip host-only functions; and,
  + For the host compilation, skip device-only functions (including
    `__global__` functions.)
- As the device-side profiling is not ready yet, only host-side profile
  code generation is checked.

Differential Revision: https://reviews.llvm.org/D85276
2020-08-10 11:01:46 -04:00
Xiangling Liao 6ef801aa6b [AIX] Static init frontend recovery and backend support
On the frontend side, this patch recovers AIX static init implementation to
use the linkage type and function names Clang chooses for sinit related function.

On the backend side, this patch sets correct linkage and function names on aliases
created for sinit/sterm functions.

Differential Revision: https://reviews.llvm.org/D84534
2020-08-10 10:10:49 -04:00
Nick Desaulniers abb9bf4bcf Revert "[Clang] implement -fno-eliminate-unused-debug-types"
This reverts commit e486921fd6.

Breaks windows builds and osx builds.
2020-08-07 16:11:41 -07:00
Nick Desaulniers e486921fd6 [Clang] implement -fno-eliminate-unused-debug-types
Fixes pr/11710.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D80242
2020-08-07 14:13:48 -07:00
Alexey Bataev 4a7aedb843 [OPENMP]Simplify representation for atomic, critical, master and section
constrcut.

Several constructs may be represented wityout relying on CapturedStmt.
It saves memory and improves compilation speed.
2020-08-07 09:58:23 -04:00
Matt Arsenault 30eeb742f1 clang: Use byref for aggregate kernel arguments
Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed assumed a stack passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.
2020-08-06 15:52:26 -04:00
Alexey Bataev 0af7835eae [OPENMP]Redesign of OMPExecutableDirective/OMPDeclarativeDirective representation.
Summary:
Introduced OMPChildren class to handle all associated clauses, statement
and child expressions/statements. It allows to represent some directives
more correctly (like flush, depobj etc. with pseudo clauses, ordered
depend directives, which are standalone, and target data directives).
Also, it will make easier to avoid using of CapturedStmt in directives,
if required (atomic, tile etc. directives).
Also, it simplifies serialization/deserialization of the
executable/declarative directives.
Reduces number of allocation operations for mapper declarations.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, jfb, cfe-commits, sstefan1, aaron.ballman, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D83261
2020-08-06 12:25:19 -04:00
Anatoly Trosinenko 5a07490d76 [ABI][NFC] Fix the confusion of ByVal and ByRef argument names
The second argument of getNaturalAlignIndirect() was `bool ByRef`, but
the implementation was just delegating to getIndirect() with `ByRef`
passed unchanged to `bool ByVal` parameter of getIndirect().

Fix a couple of /*ByRef=*/ comments as well.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D85113
2020-08-06 15:20:18 +03:00
Stanislav Mekhanoshin 105608a4c2 [AMDGPU] Added missing gfx1031 cases to CGOpenMPRuntimeGPU.cpp 2020-08-05 12:39:03 -07:00
Erich Keane 2143a90b34 Fix _ExtInt(1) to be a i1 in memory.
The _ExtInt(1) in getTypeForMem was hitting the bool logic for expanding
to an 8 bit value.  The result was an assert, or store i1 %0, i8* %2, align 1
since the parameter IS an i1.  This patch changes the 'forMem' test to
exclude ext-int from the bool test.
2020-08-05 10:54:51 -07:00
Joel E. Denny 002d61db2b [OpenMP] Fix `present` for exit from `omp target data`
Without this patch, the following example fails but shouldn't
according to OpenMP TR8:

```
 #pragma omp target enter data map(alloc:i)
 #pragma omp target data map(present, alloc: i)
 {
   #pragma omp target exit data map(delete:i)
 } // fails presence check here
```

OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states:

> If the map clause appears on a target, target data, target enter
> data or target exit data construct with a present map-type-modifier
> then on entry to the region if the corresponding list item does not
> appear in the device data environment an error occurs and the
> program terminates.

There is no corresponding statement about the exit from a region.
Thus, the `present` modifier should:

1. Check for presence upon entry into any region, including a `target
   exit data` region.  This behavior is already implemented correctly.

2. Should not check for presence upon exit from any region, including
   a `target` or `target data` region.  Without this patch, this
   behavior is not implemented correctly, breaking the above example.

In the case of `target data`, this patch fixes the latter behavior by
removing the `present` modifier from the map types Clang generates for
the runtime call at the end of the region.

In the case of `target`, we have not found a valid OpenMP program for
which such a fix would matter.  It appears that, if a program can
guarantee that data is present at the beginning of a `target` region
so that there's no error there, that data is also guaranteed to be
present at the end.  This patch adds a comment to the runtime to
document this case.

Reviewed By: grokos, RaviNarayanaswamy, ABataev

Differential Revision: https://reviews.llvm.org/D84422
2020-08-05 10:03:31 -04:00
Yonghong Song 00602ee7ef BPF: simplify IR generation for __builtin_btf_type_id()
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously IR builtin
looks like
   if (obj is a lvalue)
     llvm.bpf.btf.type.id(obj.ptr, 1, flag)  !type
   else
     llvm.bpf.btf.type.id(obj, 0, flag)  !type
The purpose of the 2nd argument is to differentiate
   __builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
   __builtin_btf_type_id(obj.ptr, flag)

Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
  - obj is the same .e.g., nullptr
  - flag is the same
  - metadata type is different, e.g., typedef of struct "s"
    and strust "s".
In the above, we don't want CSE since their metadata is different.

This patch change IR builtin to
   llvm.bpf.btf.type.id(seq_num, flag)  !type
and seq_num is always increasing. This will prevent potential
llvm CSE.

Also report an error if the type name is empty for
remote relocation since remote relocation needs non-empty
type name to do relocation against vmlinux.

Differential Revision: https://reviews.llvm.org/D85174
2020-08-04 16:29:42 -07:00
Thorsten Schuett e18c6ef6b4 [clang] improve diagnostics for misaligned and large atomics
"Listing the alignment and access size (== expected alignment) in the warning
seems like a good idea."

solves PR 46947

  struct Foo {
    struct Bar {
      void * a;
      void * b;
    };
    Bar bar;
  };

  struct ThirtyTwo {
    struct Large {
      void * a;
      void * b;
      void * c;
      void * d;
    };
    Large bar;
  };

  void braz(Foo *foo, ThirtyTwo *braz) {
    Foo::Bar bar;
    __atomic_load(&foo->bar, &bar, __ATOMIC_RELAXED);

    ThirtyTwo::Large foobar;
    __atomic_load(&braz->bar, &foobar, __ATOMIC_RELAXED);
  }

repro.cpp:21:3: warning: misaligned atomic operation may incur significant performance penalty; the expected (16 bytes) exceeds the actual alignment (8 bytes) [-Watomic-alignment]
  __atomic_load(&foo->bar, &bar, __ATOMIC_RELAXED);
  ^
repro.cpp:24:3: warning: misaligned atomic operation may incur significant performance penalty; the expected (32 bytes) exceeds the actual alignment (8 bytes) [-Watomic-alignment]
  __atomic_load(&braz->bar, &foobar, __ATOMIC_RELAXED);
  ^
repro.cpp:24:3: warning: large atomic operation may incur significant performance penalty; the access size (32 bytes) exceeds the max lock-free size (16  bytes) [-Watomic-alignment]
3 warnings generated.

Differential Revision: https://reviews.llvm.org/D85102
2020-08-04 11:10:29 -07:00
Yonghong Song 6d67506964 [clang][BPF] support type exist/size and enum exist/value relocations
This patch added the following additional compile-once
run-everywhere (CO-RE) relocations:
  - existence/size of typedef, struct/union or enum type
  - enum value and enum value existence

These additional relocations will make CO-RE bpf programs more
adaptive for potential kernel internal data structure changes.

For existence/size relocations, the following two code patterns
are supported:
  1. uint32_t __builtin_preserve_type_info(*(<type> *)0, flag);
  2. <type> var;
     uint32_t __builtin_preserve_field_info(var, flag);
flag = 0 for existence relocation and flag = 1 for size relocation.

For enum value existence and enum value relocations, the following code
pattern is supported:
  uint64_t __builtin_preserve_enum_value(*(<enum_type> *)<enum_value>,
                                         flag);
flag = 0 means existence relocation and flag = 1 for enum value.
relocation. In the above <enum_type> can be an enum type or
a typedef to enum type. The <enum_value> needs to be an enumerator
value from the same enum type. The return type is uint64_t to
permit potential 64bit enumerator values.

Differential Revision: https://reviews.llvm.org/D83242
2020-08-04 08:39:53 -07:00
Kazushi (Jam) Marukawa 045e79e77c [VE] Extend integer arguments and return values smaller than 64 bits
In order to follow NEC Aurora SX VE ABI correctly, change to sign/zero
extend integer arguments and return values smaller than 64 bits in clang.
Also update regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85071
2020-08-04 08:07:05 +09:00
Thomas Lively cb32792210 [WebAssembly] Implement prototype v128.load{32,64}_zero instructions
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.

This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.

Differential Revision: https://reviews.llvm.org/D84820
2020-08-03 13:54:00 -07:00
Akira Hatanaka 41b1e97b12 [CodeGen][ObjC] Mark calls to objc_unsafeClaimAutoreleasedReturnValue as
notail on x86-64

This is needed because the epilogue code inserted before tail calls on
x86-64 breaks the handshake between the caller and callee.

Calls to objc_retainAutoreleasedReturnValue used to have the same
problem, which was fixed in https://reviews.llvm.org/D59656.

rdar://problem/66029552

Differential Revision: https://reviews.llvm.org/D84540
2020-08-03 13:25:25 -07:00
Saiyedul Islam 160ff83765 [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3
Provides AMDGCN and NVPTX specific specialization of getGPUWarpSize,
getGPUThreadID, and getGPUNumThreads methods. Adds tests for AMDGCN
codegen for these methods in generic and simd modes. Also changes the
precondition in InitTempAlloca to be slightly more permissive. Useful for
AMDGCN OpenMP codegen where allocas are created with a cast to an
address space.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84260
2020-08-03 05:38:39 +00:00
Eli Friedman 8dfb5d767e [clang codegen][AArch64] Use llvm.aarch64.neon.fcvtzs/u where it's necessary
fptosi/fptoui have similar, but not identical, semantics.  In
particular, the behavior on overflow is different.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit.  (The
corresponding patch for 32-bit is more involved because the equivalent
intrinsics don't exist, as far as I can tell.)

Differential Revision: https://reviews.llvm.org/D84703
2020-07-30 15:41:54 -07:00
Richard Smith 1e7f026c3b PR46908: Emit undef destroying_delete_t as an aggregate RValue.
We previously used a non-aggregate RValue to represent the passed value,
which violated the assumptions of call arg lowering in some cases, in
particular on 32-bit Windows, where we'd end up producing an FCA store
with TBAA metadata, that the IR verifier would reject.
2020-07-30 14:50:01 -07:00
Johannes Doerfert ebad64dfe1 [OpenMP][FIX] Consistently use OpenMPIRBuilder if requested
When we use the OpenMPIRBuilder for the parallel region we need to also
use it to get the thread ID (among other things) in the body. This is
because CGOpenMPRuntime::getThreadID() and
CGOpenMPRuntime::emitUpdateLocation implicitly assumes that if they are
called from within a parallel region there is a certain structure to the
code and certain members of the OMPRegionInfo are initialized. It might
make sense to initialize them even if we use the OpenMPIRBuilder but we
would preferably get rid of such state instead.

Bug reported by Anchu Rajendran Sudhakumari.

Depends on D82470.

Reviewed By: anchu-rajendran

Differential Revision: https://reviews.llvm.org/D82822
2020-07-30 10:19:40 -05:00
Johannes Doerfert 19756ef53a [OpenMP][IRBuilder] Support allocas in nested parallel regions
We need to keep track of the alloca insertion point (which we already
communicate via the callback to the user) as we place allocas as well.

Reviewed By: fghanim, SouraVX

Differential Revision: https://reviews.llvm.org/D82470
2020-07-30 10:19:39 -05:00
Alexey Bataev 622e46156d [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 11:18:33 -04:00
Alexey Bataev b69357c2f4 Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region."
This reverts commit 142d0d3ed8 to
investigate undefined behavior revealed by buildbots.
2020-07-30 10:57:56 -04:00
Alexey Bataev 142d0d3ed8 [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
It applies only for global pointers.

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 09:40:05 -04:00
Amy Huang f71deb43ab [DebugInfo] Fix to ctor homing to ignore classes with trivial ctors.
Previously ctor homing was omitting debug info for classes if they
have both trival and nontrivial constructors, but we should only omit debug
info if the class doesn't have any trivial constructors.

retained types list.

bug: https://bugs.llvm.org/show_bug.cgi?id=46537

Differential Revision: https://reviews.llvm.org/D84870
2020-07-29 19:55:20 -07:00
Arthur Eubanks 71d0a2b8a3 [DFSan][NewPM] Port DataFlowSanitizer to NewPM
Reviewed By: ychen, morehouse

Differential Revision: https://reviews.llvm.org/D84707
2020-07-29 10:19:15 -07:00
Joel E. Denny 9f2f3b9de6 [OpenMP] Implement TR8 `present` motion modifier in Clang (1/2)
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
next patch in this series implements OpenMP runtime support.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84711
2020-07-29 12:18:45 -04:00
Alexey Bader 8d27be8dba [OpenCL] Add global_device and global_host address spaces
This patch introduces 2 new address spaces in OpenCL: global_device and global_host
which are a subset of a global address space, so the address space scheme will be
looking like:

```
generic->global->host
                          ->device
             ->private
             ->local
constant
```

Justification: USM allocations may be associated with both host and device memory. We
want to give users a way to tell the compiler the allocation type of a USM pointer for
optimization purposes. (Link to the Unified Shared Memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc)

Before this patch USM pointer could be only in opencl_global
address space, hence a device backend can't tell if a particular pointer
points to host or device memory. On FPGAs at least we can generate more
efficient hardware code if the user tells us where the pointer can point -
being able to distinguish between these types of pointers at compile time
allows us to instantiate simpler load-store units to perform memory
transactions.

Patch by Dmitry Sidorov.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D82174
2020-07-29 17:24:53 +03:00
Thomas Lively 11bb7eef41 [WebAssembly] Remove intrinsics for SIMD widening ops
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.

Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.

Differential Revision: https://reviews.llvm.org/D84556
2020-07-28 18:25:55 -07:00
Joel E. Denny 69fc33f0cd Revert "[OpenMP] Implement TR8 `present` motion modifier in Clang (1/2)"
This reverts commit 3c3faae497.

It breaks a number of bots.
2020-07-28 20:30:05 -04:00
Joel E. Denny 3c3faae497 [OpenMP] Implement TR8 `present` motion modifier in Clang (1/2)
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
next patch in this series implements OpenMP runtime support.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84711
2020-07-28 19:15:18 -04:00
Zahira Ammarguellat 80bd6ae13e On Windows build, making the /bigobj flag global , instead of passing it per file.
To avoid having this flag be passed in per/file manner, we are instead
passing it globally.

This fixes this bug: https://bugs.llvm.org/show_bug.cgi?id=46733

Reviewed-by: aaron.ballman, beanz, meinersbur

Differential Revision: https://reviews.llvm.org/D84038
2020-07-28 18:04:36 -05:00
Richard Smith 740a164dec PR46377: Fix dependence calculation for function types and typedef
types.

We previously did not treat a function type as dependent if it had a
parameter pack with a non-dependent type -- such a function type depends
on the arity of the pack so is dependent even though none of the
parameter types is dependent. In order to properly handle this, we now
treat pack expansion types as always being dependent types (depending on
at least the pack arity), and always canonically being pack expansion
types, even in the unusual case when the pattern is not a dependent
type. This does mean that we can have canonical types that are pack
expansions that contain no unexpanded packs, which is unfortunate but
not inaccurate.

We also previously did not treat a typedef type as
instantiation-dependent if its canonical type was not
instantiation-dependent. That's wrong because instantiation-dependence
is a property of the type sugar, not of the type; an
instantiation-dependent type can have a non-instantiation-dependent
canonical type.
2020-07-28 13:23:13 -07:00
Zequan Wu b46176bbb0 Reland [Coverage] Add comment to skipped regions
Bug filled here: https://bugs.llvm.org/show_bug.cgi?id=45757.
Add comment to skipped regions so we don't track execution count for lines containing only comments.

Differential Revision: https://reviews.llvm.org/D83592
2020-07-28 13:20:57 -07:00
Richard Smith 6c18f7db73 For PR46800, implement the GCC __builtin_complex builtin.
glibc's implementation of the CMPLX macro uses it (with -fgnuc-version
set to 4.7 or later).
2020-07-22 13:43:10 -07:00
Hans Wennborg 238bbd48c5 Revert abd45154b "[Coverage] Add comment to skipped regions"
This casued assertions during Chromium builds. See comment on the code review

> Bug filled here: https://bugs.llvm.org/show_bug.cgi?id=45757.
> Add comment to skipped regions so we don't track execution count for lines containing only comments.
>
> Differential Revision: https://reviews.llvm.org/D84208

This reverts commit abd45154bd and the
follow-up 87d7254733.
2020-07-22 17:09:20 +02:00
Joel E. Denny aa82c40f0a [OpenMP] Implement TR8 `present` map type modifier in Clang (1/2)
This patch implements Clang front end support for the OpenMP TR8
`present` map type modifier.  The next patch in this series implements
OpenMP runtime support.

This patch does not attempt to implement TR8 sec. 2.22.7.1 "map
Clause", p. 319, L14-16:

> If a map clause with a present map-type-modifier is present in a map
> clause, then the effect of the clause is ordered before all other
> map clauses that do not have the present modifier.

Compare to L10-11, which Clang does not appear to implement yet:

> For a given construct, the effect of a map clause with the to, from,
> or tofrom map-type is ordered before the effect of a map clause with
> the alloc, release, or delete map-type.

This patch also does not implement the `present` implicit-behavior for
`defaultmap` or the `present` motion-modifier for `target update`.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83061
2020-07-22 10:15:32 -04:00
Sjoerd Meijer 5567c62afa [Matrix] Add LowerMatrixIntrinsics to the NPM
Pass LowerMatrixIntrinsics wasn't running yet running under the new pass
manager, and this adds LowerMatrixIntrinsics to the pipeline (to the
same place as where it is running in the old PM).

Differential Revision: https://reviews.llvm.org/D84180
2020-07-22 09:47:53 +01:00
David Blaikie 36036aa70e Reapply "Rename/refactor isIntegerConstantExpression to getIntegerConstantExpression"
Reapply 49e5f603d4
which had been reverted in c94332919b.

Originally reverted because I hadn't updated it in quite a while when I
got around to committing it, so there were a bunch of missing changes to
new code since I'd written the patch.

Reviewers: aaron.ballman

Differential Revision: https://reviews.llvm.org/D76646
2020-07-21 20:57:12 -07:00
Zequan Wu abd45154bd [Coverage] Add comment to skipped regions
Bug filled here: https://bugs.llvm.org/show_bug.cgi?id=45757.
Add comment to skipped regions so we don't track execution count for lines containing only comments.

Differential Revision: https://reviews.llvm.org/D84208
2020-07-21 17:34:18 -07:00
Wang, Pengfei 18581fd2c4 [CFE] Add nomerge function attribute to inline assembly.
Sometimes we also want to avoid merging inline assembly. This patch add
the nomerge function attribute to inline assembly.

Reviewed By: zequanwu

Differential Revision: https://reviews.llvm.org/D84225
2020-07-22 08:22:58 +08:00
Alexey Bataev 13bfe4b226 [OPENMP]Fix PR46012: declare target pointer cannot be accessed in target region.
Summary:
Need to avoid an optimization for base pointer mapping for target data
directives.

Reviewers: jdoerfert, ye-luo

Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D84182
2020-07-21 15:48:32 -04:00
Arthur Eubanks b13b858182 [NewPM] Support optnone under new pass manager
OptNoneInstrumentation is part of StandardInstrumentations. It skips
functions (or loops) that are marked optnone.

The feature of skipping optional passes for optnone functions under NPM
is gated on a -enable-npm-optnone flag. Currently it is by default
false. That is because we still need to mark all required passes to be
required. Otherwise optnone functions will start having incorrect
semantics.  After that is done in following changes, we can remove the
flag and always enable this.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D83519
2020-07-21 09:53:43 -07:00
Saiyedul Islam fc7d2908ab [OpenMP] Use common interface to access GPU Grid Values
Use common interface for accessing target specific GPU grid values in NVPTX
OpenMP codegen as proposed in https://reviews.llvm.org/D80917

Originally authored by Greg Rodgers (@gregrodgers).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83492
2020-07-21 05:25:46 +00:00
Logan Smith 8b6179f48c [NFC] Add missing 'override's 2020-07-20 14:39:36 -07:00
Joel E. Denny cbf64b5834 [OpenMP] Fix map clause for unused var: don't ignore it
For example, without this patch:

```
 $ cat test.c
 int main() {
   int x[3];
   #pragma omp target map(tofrom:x[0:3])
 #ifdef USE
   x[0] = 1
 #endif
   ;
   return 0;
 }
 $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -S -emit-llvm test.c
 $ grep '^@.offload_maptypes' test.ll
 $ echo $?
 1
 $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -S -emit-llvm test.c \
         -DUSE
 $ grep '^@.offload_maptypes' test.ll
 @.offload_maptypes = private unnamed_addr constant [1 x i64] [i64 35]
```

With this patch, both greps produce the same result.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83922
2020-07-17 21:37:27 -04:00
Michele Scandale 53880b8cb9 [CMake] Make `intrinsics_gen` dependency unconditional.
The `intrinsics_gen` target exists in the CMake exports since r309389
(see LLVMConfig.cmake.in), hence projects can depend on `intrinsics_gen`
even it they are built separately from LLVM.

Reviewed By: MaskRay, JDevlieghere

Differential Revision: https://reviews.llvm.org/D83454
2020-07-17 16:43:17 -07:00
Xiangling Liao ec6ada6264 [AIX] report_fatal_error on `-fregister_global_dtors_with_atexit` for static init
On AIX, the semantic of global_dtors contains __sterm functions associated with C++
cleanup actions and user-declared __attribute__((destructor)) functions. We should
never merely register __sterm with atexit(), so currently
-fregister_global_dtors_with_atexit does not work well on AIX: It would cause
finalization actions to not occur when unloading shared libraries.  We need to figure
out a way to handle that when we start supporting user-declared
__attribute__((destructor)) functions.

Currently we report_fatal_error on this option temporarily.

Differential Revision: https://reviews.llvm.org/D83974
2020-07-17 16:14:49 -04:00
Saiyedul Islam c7562e77b3 [OpenMP][NFC] Generalize CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU
Refactors CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU to make it a
generalization for OpenMP GPU Codegen. Target specific specialized
methods for NVPTX are defined in class CGOpenMPRuntimeNVPTX. This
paves the way for a clean and maintainable extension to more GPU
targets for OpenMP Codegen.

For original author (git blame) list of CGOpenMPRuntimeGPU code,
look in history of CGOpenMPRuntimeNVPTX.cpp and .h, after this commit.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83723
2020-07-17 14:38:04 +00:00