Commit Graph

363625 Commits

Author SHA1 Message Date
Johannes Doerfert 95a25e4c32 [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA, which later allowed
some of the reduce code to be removed because accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037
2020-08-16 14:38:31 -05:00
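
A minimal sketch of the aliasing concern described in 95a25e4c32. It is not the code Clang generates for GPU reductions; it only shows a 64-bit value being moved as two 32-bit halves, which is the kind of type punning a shuffle step relies on. The function name `roundTripThroughHalves` is hypothetical. `std::memcpy` is used for each reinterpretation so that no access goes through an lvalue of an unrelated type, which is the assumption that type-based alias analysis would otherwise exploit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: split a double into two 32-bit halves and
// reassemble it, the way a shuffle that only moves 32-bit values
// would have to.
inline double roundTripThroughHalves(double value) {
  std::uint32_t halves[2];
  static_assert(sizeof(halves) == sizeof(value), "double must be 64 bits");
  // Copy the bytes instead of casting the pointer, so the compiler
  // cannot assume the double and the integers do not alias.
  std::memcpy(halves, &value, sizeof(value));

  // Stand-in for the 32-bit shuffle: each half travels separately.
  std::uint32_t received[2] = {halves[0], halves[1]};

  double result;
  std::memcpy(&result, received, sizeof(result));
  return result;
}
```

Calling `roundTripThroughHalves(3.5)` returns the original value, because the bytes are copied unchanged.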
David Green 5f45f91de4 [ARM] Tests for tail predicated loads. NFC 2020-08-16 19:46:37 +01:00
Mark de Wever fef2607124 [Sema] Use the proper cast for a fixed bool enum.
When casting an enumeration with a fixed bool type, the cast should use
IntegralToBoolean instead of IntegralCast, as required by Core
Issue 2338.

Fixes PR47055: Incorrect codegen for enum with bool underlying type

Differential Revision: https://reviews.llvm.org/D85612
2020-08-16 18:40:08 +02:00
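
A small sketch of the conversion that fef2607124 is about. The enumeration `Flag` and the helper `toFlag` are hypothetical. The helper spells out the boolean conversion explicitly, so it does not depend on which compiler version is used. An integral truncation would instead keep only the low bit, so a value such as 2 would end up as `Off`.

```cpp
#include <cassert>

// Hypothetical enumeration with a fixed bool underlying type.
enum Flag : bool { Off = false, On = true };

// Convert an integer to Flag by going through bool explicitly. Any
// non-zero value becomes true, which is what a boolean conversion
// (rather than an integral truncation) produces.
inline Flag toFlag(int value) {
  return static_cast<Flag>(static_cast<bool>(value));
}
```

With this helper, `toFlag(2)` yields `On`, because 2 is non-zero.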
Mark de Wever 827ba67e38 [Sema] Validate calls to GetExprRange.
When a conditional expression has a throw expression it called
GetExprRange with a void expression, which caused an assertion failure.

This approach was suggested by Richard Smith.

Fixes PR46484: Clang crash in clang/lib/Sema/SemaChecking.cpp:10028

Differential Revision: https://reviews.llvm.org/D85601
2020-08-16 18:32:38 +02:00
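
For readers unfamiliar with the construct named in 827ba67e38, the sketch below shows a conditional expression whose third operand is a throw-expression, which has type void. Whether this exact snippet triggered the assertion is not stated in the commit message; the function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// The third operand is a throw-expression of type void, so the whole
// conditional takes its type (int) from the second operand.
inline int valueOrThrow(bool ok, int value) {
  return ok ? value : throw std::runtime_error("no value");
}

// Returns true if valueOrThrow throws when ok is false.
inline bool throwsWhenNotOk() {
  try {
    valueOrThrow(false, 0);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```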
Simon Pilgrim f25d47b7ed [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
We can now enable this for AVX1 targets, since canonicalizeShuffleMaskWithHorizOp can now assist with cleanup.

There are still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
2020-08-16 15:00:41 +01:00
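
The fold in f25d47b7ed can be checked with a scalar model. The sketch below assumes HOP is a horizontal add with HADDPS-like semantics, and that the 256-bit form processes each 128-bit lane independently, as the AVX instruction does. All type and function names are hypothetical and this is not the DAG combine itself.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;
using V8 = std::array<float, 8>;

// 128-bit horizontal add, modelled on HADDPS.
inline V4 hadd128(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Concatenate two 128-bit vectors into one 256-bit vector.
inline V8 concat(const V4 &lo, const V4 &hi) {
  return {lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]};
}

inline V4 lower(const V8 &v) { return {v[0], v[1], v[2], v[3]}; }
inline V4 upper(const V8 &v) { return {v[4], v[5], v[6], v[7]}; }

// 256-bit horizontal add: each 128-bit lane is handled on its own.
inline V8 hadd256(const V8 &a, const V8 &b) {
  return concat(hadd128(lower(a), lower(b)), hadd128(upper(a), upper(b)));
}
```

Under this model, `concat(hadd128(x, y), hadd128(z, w))` and `hadd256(concat(x, z), concat(y, w))` perform the same additions on the same elements, so one wide operation can replace two narrow ones plus a concatenation.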
Sanjay Patel 29e1d16a3e Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"
This reverts commit babb59496b.

This test addition was queued up with some unrelated changes,
but it seems more likely that we need to fix something internal
to -memcpyopt. Also, I'm not sure if including target-specific
attributes in a generic regression test dir will cause bot
problems.
2020-08-16 09:52:33 -04:00
Sanjay Patel 3ffb751f3d [InstCombine] fold copysign with fabs/fneg operand
We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
2020-08-16 08:53:47 -04:00
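
The identity behind 3ffb751f3d can be checked at the source level: `copysign` ignores the sign of its first operand, so wrapping that operand in `fabs` or negating it does not change the result. The sketch below uses `std::copysign` from `<cmath>`; the function names are hypothetical and this is not the InstCombine code.

```cpp
#include <cassert>
#include <cmath>

// Forms before folding: the magnitude operand carries a sign change.
inline double withFabs(double x, double y) {
  return std::copysign(std::fabs(x), y);
}
inline double withFneg(double x, double y) {
  return std::copysign(-x, y);
}

// Folded form: the sign of x is irrelevant, so use x directly.
inline double folded(double x, double y) { return std::copysign(x, y); }
```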
Sanjay Patel 3fed67b7e6 [InstCombine] reduce code duplication; NFC 2020-08-16 08:53:47 -04:00
Sanjay Patel 4d5fdff434 [InstCombine] add tests for copysign; NFC 2020-08-16 08:53:47 -04:00
Sanjay Patel babb59496b [PhaseOrdering] add test for memcpy removal (PR47114); NFC 2020-08-16 08:53:47 -04:00
Vitaly Buka 47552a614a [StackSafety] Change how callee is searched in index
Handle other than local linkage types.
2020-08-16 04:37:19 -07:00
Simon Pilgrim dca7eb7d60 [X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp
Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.
2020-08-16 12:26:27 +01:00
Brad Smith 44613bbec8 Create strict aligned code for OpenBSD/arm64. 2020-08-16 07:14:34 -04:00
Simon Pilgrim c27baa54b7 [X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.
Split the isRepeatedTargetShuffleMask into a wrapper variant that takes an MVT describing the mask width, and an internal version that just needs the raw mask element bit size.

This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.
2020-08-16 11:51:44 +01:00
Shoaib Meenai 402b063c80 [llvm-libtool-darwin] Fix test on all host architectures
By default, if a universal binary has a slice matching the host
architecture, llvm-objdump will only print that slice, otherwise it'll
print all architectures. Explicitly pass `--arch all` to force it to
always print all architectures, as we want for this test.
2020-08-16 00:18:03 -07:00
Fady Ghanim aaa93a681b [OpenMP][OMPBuilder] Adding support for `omp single`
This adds support for generating `omp single`, and necessary calls for
`copyprivate` clause.

Differential Revision: https://reviews.llvm.org/D85617
2020-08-16 01:15:16 -04:00
Shoaib Meenai 12b4df9919 [llvm-libtool-darwin] Speculative buildbot fix
http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l is failing
this test. Attempt to explicitly use the Mach-O dump format as a
speculative fix.
2020-08-15 21:32:09 -07:00
LLVM GN Syncbot 1bc298aa12 [gn build] Port 577e58bcc7 2020-08-16 03:17:58 +00:00
Wenlei He 577e58bcc7 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change adds a new inline advisor that takes optimization remarks from a previous inlining as input and provides the decisions as advice, so the current inlining can replay the inline decisions of a different compilation. The DWARF inline stack with line and discriminator is used as the anchor for call sites, including call context. The change can be useful for inliner tuning, as it provides a channel for external input to tweak inline decisions. Existing alternatives like the alwaysinline attribute are per-function, not per-callsite. A per-callsite inline intrinsic could be another solution (it does not exist yet), but it is intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decisions for replay. Since SampleProfileLoader does top-down inlining, inline decisions can be specialized for each call context, hence we should be able to replay inlining accurately. However, with a bottom-up inliner like CGSCC inlining, the replay can be limited due to the lack of specialization for different call contexts. Apart from that limitation, the new inline advisor can still be used by the regular CGSCC inliner later if needed for tuning purposes.

This is a resubmit of https://reviews.llvm.org/D83743
2020-08-15 20:17:21 -07:00
Fangrui Song 5b50a1656a [ARC] Fix CodeGen/ARC/brcc.ll 2020-08-15 19:33:35 -07:00
Jon Chesterfield d0b312955f [libomptarget] Implement host plugin for amdgpu

Replacement for D71384. Primary difference is inlining the dependency on atmi
followed by extensive simplification and bugfixes. This is the latest version
from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with
minor patches and a rename from hsa to amdgpu, on the basis that this can't be
used by other implementations of hsa without additional work.

This will not build unless the ROCM_DIR variable is passed so won't break other
builds. That variable is used to locate two amdgpu specific libraries that ship
as part of rocm:
libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime
These libraries build from source. The build scripts in those repos are for
shared libraries, but can be adapted to statically link both into this plugin.

There are caveats.
- This works well enough to run various tests and benchmarks, and will be used
  to support the current clang bring up
- It is adequately thread safe for the above but there will be races remaining
- It is not stylistically correct for llvm, though has had clang-format run
- It has suboptimal memory management and locking strategies
- The debug printing / error handling is inconsistent

I would like to contribute this pretty much as-is and then improve it in-tree.
This would be advantageous because the aomp12 branch that was in use for fixing
this codebase has just been joined with the amd internal rocm dev process.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85742
2020-08-15 23:58:28 +01:00
Lang Hames a49b05bb61 [JITLink][MachO] Use correct symbol scope when N_PEXT is set and N_EXT unset.
MachOLinkGraphBuilder has been treating these as hidden, but they should be
treated as local.

Symbols with N_PEXT set and N_EXT unset are produced when hidden symbols are
run through 'ld -r' without passing -keep_private_externs. They will show up
under 'nm -m' as "was private extern", hence the name of the test cases.

Testcase committed as relocatable object to ensure that the test suite doesn't
depend on having 'ld -r' available.
2020-08-15 15:53:33 -07:00
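
The scope rule described in a49b05bb61 can be sketched as a small decision function. The bit values are those of `N_EXT` and `N_PEXT` in `<mach-o/nlist.h>`; the `SymbolScope` enum and `scopeFor` function are hypothetical stand-ins, not the MachOLinkGraphBuilder code.

```cpp
#include <cassert>
#include <cstdint>

// n_type bits as defined in <mach-o/nlist.h>.
constexpr std::uint8_t kN_EXT = 0x01;  // external symbol
constexpr std::uint8_t kN_PEXT = 0x10; // private external symbol

// Hypothetical scope classification for illustration only.
enum class SymbolScope { Default, Hidden, Local };

inline SymbolScope scopeFor(std::uint8_t type) {
  const bool isExt = (type & kN_EXT) != 0;
  const bool isPrivateExt = (type & kN_PEXT) != 0;
  if (isExt)
    return isPrivateExt ? SymbolScope::Hidden : SymbolScope::Default;
  // N_EXT unset: local, even if N_PEXT is still set
  // ("was private extern" in nm -m output).
  return SymbolScope::Local;
}
```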
Mehdi Amini 22cbe40fa9 Slightly relax the regex on lld version in test (NFC)
This makes the test introduced in 537f5483fe more robust with respect
to the actual version number. The previous regex restricted the version
to start with a leading `1` which was overly restrictive.
2020-08-15 21:38:02 +00:00
Amara Emerson 7006bb69ef [GlobalISel] Enable copy-propagation in post-legalizer combiner.
This cleans up copies that the legalizer or other combines leave around. They
can occasionally end up escaping as moves.

Differential Revision: https://reviews.llvm.org/D85964
2020-08-15 13:44:30 -07:00
Mehdi Amini 54ce344314 Refactor mlir-opt setup in a new helper function (NFC)
This will help refactoring some of the tools to prepare for the explicit registration of
Dialects.

Differential Revision: https://reviews.llvm.org/D86023
2020-08-15 20:09:06 +00:00
Shoaib Meenai 93c761f5e5 [llvm-libtool-darwin] Use Optional operator overloads. NFC
Use operator bool instead of hasValue and operator* instead of getValue
to simplify the code slightly.
2020-08-15 11:41:57 -07:00
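
The simplification in 93c761f5e5 can be illustrated with `std::optional`, used here as a stand-in because it offers the same pair of spellings (`has_value()`/`value()` versus `operator bool`/`operator*`). The commit itself concerns LLVM's Optional with `hasValue`/`getValue`; the function names below are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Verbose spelling: explicit member calls.
inline std::string describeVerbose(const std::optional<std::string> &name) {
  if (name.has_value())
    return name.value();
  return "<none>";
}

// Concise spelling: operator bool and operator*.
inline std::string describeConcise(const std::optional<std::string> &name) {
  if (name)
    return *name;
  return "<none>";
}
```

Both functions return the same string for every input; only the spelling differs.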
LLVM GN Syncbot 160c133be5 [gn build] Port 79298a5067 2020-08-15 16:24:37 +00:00
Matt Arsenault 04a288f0f0 GlobalISel: Remove unnecessary llvm:: 2020-08-15 12:12:50 -04:00
Matt Arsenault f0af434b79 AMDGPU: Remove register class params from flat memory patterns 2020-08-15 12:12:33 -04:00
Matt Arsenault a7455652c0 AMDGPU: Fix global atomic saddr operand class 2020-08-15 12:12:28 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00
Matt Arsenault e5077b5c2a AMDGPU: Fix matching wrong offsets for global atomic loads
These used signed offsets with a different size.
2020-08-15 12:12:17 -04:00
Matt Arsenault 8cb022982a AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
2020-08-15 12:12:01 -04:00
Matt Arsenault 47af1ac69a AMDGPU: Correct definitions for global saddr instructions
The VGPR component is a 32-bit offset, not 64-bits.

I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
2020-08-15 12:11:57 -04:00
Matt Arsenault 79298a5067 AMDGPU: Remove SIFixupVectorISel pass
This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or rely on DAG divergence information for the base
address.
2020-08-15 12:11:51 -04:00
Aditya Kumar 49a944af7f [NFC] Fix typo and variable names 2020-08-15 09:06:22 -07:00
Luofan Chen 266949b2bc [Attributor][NFC] Format code 2020-08-16 00:00:45 +08:00
Luofan Chen b7448a348b [Attributor][NFC] Use indexes instead of iterator
When adding elements while iterating, the iterator will become
invalid, which could cause errors. This fixes the issue by using
indexes instead of iterators.
2020-08-15 23:09:46 +08:00
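
The hazard fixed in b7448a348b is easy to reproduce with `std::vector`: `push_back` may reallocate and invalidate every iterator into the container, while an index stays meaningful. The sketch below uses a hypothetical worklist function, not the Attributor code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worklist: for every even, non-zero value, append half
// of it, including values appended during the walk.
inline std::vector<int> expandHalves(std::vector<int> work) {
  // Index-based loop: work.size() is re-read on every iteration, and
  // work[i] is looked up afresh, so reallocation is harmless.
  for (std::size_t i = 0; i < work.size(); ++i) {
    const int value = work[i]; // copy before push_back may reallocate
    if (value != 0 && value % 2 == 0)
      work.push_back(value / 2);
  }
  return work;
}
```

Starting from `{8}`, the loop appends 4, then 2, then 1, and stops because 1 is odd.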
Bernhard Manfred Gruber 345053390a Add support for C++20 concepts and decltype to modernize-use-trailing-return-type. 2020-08-15 10:40:22 -04:00
Cyndy Ishida 85d381eb02 [TextAPI] update DriverKit string value
String value differed from downstream, where upstream doesn't depend on
casing difference.
<rdar://problem/67106257>
2020-08-15 06:44:30 -07:00
Xing GUO 030df8242f [MachOYAML] Move EmitFunc to an inner scope. NFC. 2020-08-15 21:10:03 +08:00
Luofan Chen 87a85f3d57 [Attributor] Use internalized version of non-exact functions
This patch internalizes non-exact functions and replaces their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.

We can do this because some non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize and replace use of them (the
result of the other version of this function should be the same).
Note that not all functions can be internalized, e.g., function with
`linkonce` or `weak` linkage.

For now, when specified on the command line, we internalize all functions
that meet the requirements without calculating the cost of such
internalization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84167
2020-08-15 20:23:38 +08:00
Xing GUO 4a0b95dc5e [DWARFYAML] Simplify isEmpty(). NFC. 2020-08-15 20:10:29 +08:00
Dimitry Andric 3aecf4bdf3 On FreeBSD, add -pthread to ASan dynamic compile flags for tests
Otherwise, lots of these tests fail with a CHECK error similar to:

==12345==AddressSanitizer CHECK failed: compiler-rt/lib/asan/asan_posix.cpp:120 "((0)) == ((pthread_key_create(&tsd_key, destructor)))" (0x0, 0x4e)

This is because the default pthread stubs in FreeBSD's libc always
return failures (such as ENOSYS for pthread_key_create) in case the
pthread library is not linked in.

Reviewed By: arichardson

Differential Revision: https://reviews.llvm.org/D85082
2020-08-15 13:05:31 +02:00
Dávid Bolvanský f134fc4f1b Reland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)" 2020-08-15 12:14:57 +02:00
Mehdi Amini 25ee851746 Revert "Separate the Registration from Loading dialects in the Context"
This reverts commit 2056393387.

Build is broken on a few bots
2020-08-15 09:21:47 +00:00
Mehdi Amini 2056393387 Separate the Registration from Loading dialects in the Context
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser lazily loads Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects without loading them in the context.
- Passes are expected to declare the dialects they will create entities from (Operations, Attributes, or Types), and the PassManager loads Dialects into the Context when starting a pipeline.

This change simplifies the configuration of the registration: a compiler only needs to load the dialect for the IR it will emit, and the optimizer is self-contained and loads the required Dialects. For example, in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context; all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.

Differential Revision: https://reviews.llvm.org/D85622
2020-08-15 08:07:31 +00:00
Mehdi Amini ba92dadf05 Revert "Separate the Registration from Loading dialects in the Context"
This was landed by accident, will reland with the right comments
addressed from the reviews.
Also revert dependent build fixes.
2020-08-15 07:35:10 +00:00
Martin Storsjö 3e7403a134 Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
This reverts commit 6dbf0cfcf7.

That commit caused failed assertions, e.g. like this:

$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }

$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
2020-08-15 09:35:11 +03:00
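
The assertion quoted in 3e7403a134 is about a type mismatch, which follows from the two functions returning different things: `sprintf` returns an `int` count of characters written, while `strcpy` returns the destination pointer. The sketch below shows a source-level equivalent that keeps the `int` result. It is only an illustration under that reading; how the relanded commit handles the return value is not shown in this log, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Original form: returns the number of characters written.
inline int viaSprintf(char *dst, const char *str) {
  return std::sprintf(dst, "%s", str);
}

// Replacement form: strcpy returns dst (a char *), so the character
// count has to be produced separately to keep the same result type.
inline int viaStrcpy(char *dst, const char *str) {
  std::strcpy(dst, str);
  return static_cast<int>(std::strlen(str));
}
```

A caller that uses the result, as in `ptr += sprintf(ptr, "%s", "")`, needs the count rather than the pointer.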
Raphael Isemann 7208cb1ac4 [lldb] Remove XFAIL from now passing TestPtrRefs/TestPtrRefsObjC
8fcfe2862f and
0cceb54366 fixed those tests.
2020-08-15 08:14:44 +02:00