Commit Graph

132762 Commits

Author SHA1 Message Date
Simon Pilgrim e95d04f4f1 [X86][AVX] lowerV4X128Shuffle - attempt to widen to 2x256 to simplify shuffles
If we are lowering to X86ISD::SHUF128 we are going to lose track of individual 128-bit lanes that are UNDEF, so if we can widen these to guarantee that they are sequential with their neighbour we should. This helps with later shuffle combines.
2020-03-30 12:22:26 +01:00
Florian Hahn 9e81249d76 [Matrix] Rename emitChainedMatrixMultiply to emitMatrixMultiply (NFC).
The Chained in the name potentially leads to confusion. Also updated the
comment to drop the unnecessary mention of tile-sized.
2020-03-30 11:17:25 +01:00
Florian Hahn c3b03f3d0c [AMDGPU] Drop const for value that is copied (NFC).
This fixes

    warning: loop variable 'Def' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis]

llvm::Register just contains a single unsigned and should be copied.

Reviewers: rampitec

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D77011
2020-03-30 10:59:59 +01:00
Sam Parker 94b195ff12 [ARM][LowOverheadLoops] Add horizontal reduction support
Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D76708
2020-03-30 09:55:41 +01:00
Guillaume Chatelet b91535f6c7 [Alignment][NFC] Return Align for SelectionDAGNodes::getOriginalAlignment/getAlignment
Summary:
Also deprecate getOriginalAlignment, getAlignment will take much more time as it is pervasive through the codebase (including TableGened files).

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76933
2020-03-30 07:26:48 +00:00
David Green c9eaed5149 [ARM] MVE VMOV.i64
In the original batch of MVE VMOVimm code generation VMOV.i64 was left
out due to the way it was done downstream. It turns out that it's fairly
simple though. This adds the codegen for it, similar to NEON.

Bigendian is technically incorrect in this version, which John is fixing
in a Neon patch.
2020-03-30 07:44:23 +01:00
Max Kazantsev 4e0d9925d6 [NFC] Remove obsolete checks followed by fix of isGuaranteedToTransferExecutionToSuccessor
In past, isGuaranteedToTransferExecutionToSuccessor contained some weird logic
for volatile loads/stores that was ultimately removed by patch D65375. It's time to
remove a piece of dependent logic that used to be a workaround for the code which
is now deleted.

Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D76918
2020-03-30 12:24:41 +07:00
Jun Ma 31a1d85c53 [Coroutines 2/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76913
2020-03-30 09:53:09 +08:00
Jun Ma a94fa2c049 [Coroutines 1/2] Improve symmetric control transfer feature
Differential Revision: https://reviews.llvm.org/D76911
2020-03-30 09:53:09 +08:00
Benjamin Kramer 854f268ca6 [MC] Move deprecation infos from MCTargetDesc to MCInstrInfo
This allows emitting it only when the feature is used by a target.
Shrinks Release+Asserts clang by 900k.
2020-03-29 21:20:40 +02:00
Simon Pilgrim 9c8ec99c80 [X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.
2020-03-29 19:51:38 +01:00
Simon Pilgrim 8206c50cde [X86] Add isAnyZero shuffle mask helper 2020-03-29 19:51:37 +01:00
Nikita Popov 8253a86b65 [InstCombine] Erase old mul when creating umulo
As we don't return the result of replaceInstUsesWith(), we are
responsible for erasing the instruction.

There is a small subtlety here in that we need to do this after
the other uses of Builder, which uses the original multiply as
the insertion point.

NFC apart from worklist order changes.
2020-03-29 20:46:08 +02:00
Nikita Popov 53d209076a [InstCombine] Use replaceOperand() in demanded elements simplification
To make sure that dead operands get DCEd. This fixes the largest
source of leftover dead operands we see in tests.

NFC apart from worklist changes.
2020-03-29 20:43:19 +02:00
Nikita Popov 0c87140065 [InstCombine] Use replaceOperand() in assoc cast simplification
To make sure the old operands are DCEd.

NFC apart from worklist order.
2020-03-29 20:28:37 +02:00
Nikita Popov a9ddcd6411 [InstCombine] Erase old add when optimizing add overflow
We don't return the replaceInstUsesWith() result, so we're
responsible for cleaning up.

NFC apart from worklist order changes.
2020-03-29 20:20:14 +02:00
Uday Bondhugula c0955edfd6 Introduce support for lib function aligned_alloc in TLI / memory builtins
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062

This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (for eg. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76970
2020-03-29 23:36:24 +05:30
Matt Arsenault d15723ef06 AMDGPU/GlobalISel: Remove redundant virtual 2020-03-29 14:03:07 -04:00
Matt Arsenault ab7a41069e AMDGPU: Fix using wrong instruction for FP conversion
This was was never actually hit, but FTRUNC was clearly not the intent
here.
2020-03-29 14:03:07 -04:00
Matt Arsenault 97bbe7ad2a AMDGPU: Fix typo 2020-03-29 14:03:06 -04:00
Sanjay Patel fc3cc8a4b0 [VectorCombine] skip debug intrinsics first for efficiency 2020-03-29 13:58:04 -04:00
Nikita Popov 26fa33755f [InstCombine] Simplify select of cmpxchg transform
Rather than converting to a dummy select with equal true and false
ops, just directly return the resulting value.

As a side-effect, this fixes missing DCE of the previously replaced
operand.
2020-03-29 18:57:32 +02:00
Florian Hahn 99913ef3d1 [OpenMP] set_bits iterator yields unsigned elements, no reference (NFC).
BitVector::set_bits() returns an iterator range yielding unsinged
elements, which always will be copied while const & gives the impression
that there will be no copy. Newer version of clang complain:

    warning: loop variable 'SetBitsIt' is always a copy because the range of type 'iterator_range<llvm::BitVector::const_set_bits_iterator>' (aka 'iterator_range<const_set_bits_iterator_impl<llvm::BitVector> >') does not return a reference [-Wrange-loop-analysis]

Reviewers: jdoerfert, rnk

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D77010
2020-03-29 17:08:13 +01:00
Nikita Popov 28f67bd5c5 [InstCombine] Fix worklist management in varargs transform
Add a replaceUse() helper to mirror replaceOperand() for the
rare cases where we're working directly on uses.

NFC apart from worklist order changes.
2020-03-29 18:04:12 +02:00
Nikita Popov 6f07a9e80a [InstCombine] Erase original add when creating saddo
Usually when we replaceInstUsesWith() we also return the original
instruction, and InstCombine will take care of erasing it. Here
we don't do that, so we need to manually erase it.

NFC apart from worklist order changes.
2020-03-29 18:01:32 +02:00
Nikita Popov 1e363023b8 [InstCombine] Use replaceOperand() in a few more places
To make sure the old operands get DCEd.

NFC apart from worklist order changes.
2020-03-29 18:01:00 +02:00
Simon Pilgrim 7734e4b3a3 [X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half.

I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.
2020-03-29 16:41:59 +01:00
Simon Pilgrim da4c7db793 [X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC.
This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering which doesn't work at the byte level.
2020-03-29 16:41:58 +01:00
Simon Pilgrim 10439f9e32 [X86][AVX] Add X86ISD::VALIGN target shuffle decode support
Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.
2020-03-29 16:41:58 +01:00
Florian Hahn 49d00824bb [VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).
This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling it's
operands as VPValues and also towards breaking it up into a
VPInstruction.

Discussed as part of D74695.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76988
2020-03-29 13:47:28 +01:00
Simon Pilgrim a7115d51be [X86] X86CallFrameOptimization - generalize slow push code path
Replace the explicit isAtom() || isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push from memory case.

This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use.

Differential Revision: https://reviews.llvm.org/D76239
2020-03-29 11:01:59 +01:00
Richard Diamond 4bf015c035 [AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences.
Summary:
On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue.

In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64.

This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type.

Reviewers: hfinkel, jdoerfert

Reviewed By: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75471
2020-03-29 01:26:31 -05:00
Fangrui Song fc93787d7e [MC][PowerPC] Make .reloc support arbitrary relocation types
Generalizes ad7199f3e6 (R_PPC_NONE/R_PPC64_NONE).
2020-03-28 17:04:31 -07:00
Matt Arsenault 9564f46766 AMDGPU: Make use of default operands 2020-03-28 17:33:29 -04:00
Benjamin Kramer ba2e72c54e [MDBuilder] Don't use stable sort for sorting integers. 2020-03-28 21:19:46 +01:00
Nikita Popov 2215dcf1d7 [InstCombine] Remove unreachable blocks before DCE
Dropping unreachable code may reduce use counts on other instructions,
so it's better to do this earlier rather than later.

NFC-ish, may only impact worklist order.
2020-03-28 21:19:16 +01:00
Nikita Popov 97cc1275c7 [InstCombine] Merge two functions; NFC
Merge AddReachableCodeToWorklist() into prepareICWorklistFromFunction().
It's one logical step, and this makes it easier to move code.
2020-03-28 21:19:16 +01:00
Benjamin Kramer 2d24d74b85 [AMDGPU] Stabilize sort order
Found by the expensive checks in llvm::sort.
2020-03-28 20:20:14 +01:00
Yonghong Song ced0d1f42b [BPF] support 128bit int explicitly in layout spec
Currently, bpf does not specify 128bit alignment in its
layout spec. So for a structure like
  struct ipv6_key_t {
    unsigned pid;
    unsigned __int128 saddr;
    unsigned short lport;
  };
clang will generate IR type
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }
Additional padding is to ensure later IR->MIR can generate correct
stack layout with target layout spec.

But it is common practice for a tracing program to be
first compiled with target flag (e.g., x86_64 or aarch64) through
clang to generate IR and then go through llc to generate bpf
byte code. Tracing program often refers to kernel internal
data structures which needs to be compiled with non-bpf target.

But such a compilation model may cause a problem on aarch64.
The bcc issue https://github.com/iovisor/bcc/issues/2827
reported such a problem.

For the above structure, since aarch64 has "i128:128" in its
layout string, the generated IR will have
  %struct.ipv6_key_t = type { i32, i128, i16 }

Since bpf does not have "i128:128" in its spec string,
the selectionDAG assumes alignment 8 for i128 and
computes the stack storage size for the above is 32 bytes,
which leads incorrect code later.

The x86_64 does not have this issue as it does not have
"i128:128" in its layout spec as it does permits i128 to
be alignmented at 8 bytes at stack. Its IR type looks like
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }

The fix here is add i128 support in layout spec, the same as
aarch64. The only downside is we may have less optimal stack
allocation in certain cases since we require 16byte alignment
for i128 instead of 8. But this is probably fine as i128 is
not used widely and in most cases users should already
have proper alignment.

Differential Revision: https://reviews.llvm.org/D76587
2020-03-28 11:46:29 -07:00
Benjamin Kramer 4065e92195 Upgrade some instances of std::sort to llvm::sort. NFC. 2020-03-28 19:23:29 +01:00
Reid Kleckner e5bf5037d8 [CodeGen] Fix sinking local values in lpads with phis
There was already a test case for landingpads to handle this case, but I
had forgotten to consider PHI instructions preceding the EH_LABEL in the
landingpad.

PR45261
2020-03-28 11:10:33 -07:00
Nikita Popov 30d712103f [InstCombine] Use replaceOperand() API in GEP transforms
To make sure that replaced operands get DCEd. This drops one
iteration from gepphigep.ll, which is still not optimal.

This was the last test case performing more than 3 iterations.

NFC-ish, only worklist order should change.
2020-03-28 19:07:25 +01:00
Nikita Popov b1f78baeaa [InstCombine] Reduce code duplication in GEP of PHI transform; NFC
The `NewGEP->setOperand(DI, NewPN)` call was duplicated, and the
insertion of NewGEP is the same in both if/else, so we can extract it.
2020-03-28 19:07:25 +01:00
Alexandre Ganea 3ab3f3c5d5 After 09158252f7, fix build when -DLLVM_ENABLE_THREADS=OFF
Tested on Linux with Clang 9, and on Windows with Visual Studio 2019 16.5.1 with -DLLVM_ENABLE_THREADS=ON and OFF.
2020-03-28 13:54:58 -04:00
Nikita Popov 672e8bfbfc [InstCombine] Fix worklist management in foldXorOfICmps()
Because this code does not use the IC-aware replaceInstUsesWith()
helper, we need to manually push users to the worklist.

This is NFC-ish, in that it may only change worklist order.
2020-03-28 18:25:21 +01:00
Enna1 03bc311a16 [CorrelatedValuePropagation] Remove redundant if statement in processSelect()
This statement

    if (ReplaceWith == S) ReplaceWith = UndefValue::get(S->getType());

is introduced in https://reviews.llvm.org/rG35609d97ae89b8e13f40f4e6b9b056954f8baa83
to fix a case where unreachable code can cause select instruction
simplification to fail. In https://reviews.llvm.org/rGd10480657527ffb44ea213460fb3676a6b1300aa,
we begin to perform a depth-first walk of basic blocks. This means
we will not visit unreachable blocks. So we do not need this the
special check any more.

Differential Revision: https://reviews.llvm.org/D76753
2020-03-28 18:01:17 +01:00
Martin Storsjö e6112a56dd [AsmPrinter] Emit .weak directive for weak linkage on COFF for symbols without a comdat
MC already knows how to emulate the .weak directive (with its ELF
semantics; i.e., an undefined weak symbol resolves to 0, and a defined
weak symbol has lower link precedence than a strong symbol of the same
name) using COFF weak externals. Plumb this through the ASM printer too,
so that definitions marked with __attribute__((weak)) at the language
level (which gets translated to weak linkage at the IR level) have the
corresponding .weak directive emitted. Note that declarations marked
with __attribute__((weak)) at the language level (which translates to
extern_weak at the IR level) already have .weak directives emitted.

Weak*/linkonce* symbols without an associated comdat (in particular, ones
generated with __attribute__((weak)) in C/C++) were earlier emitted as
normal unique globals, as the comdat is required to provide the linkonce
semantics. This change makes sure they are emitted as .weak instead,
allowing other symbols to override them.

Rename the existing coff-weak.ll test to coff-linkonce.ll. I'm not
quite sure what that test covers, since the behavior being tested in it
(the emission of a one_only section) is just a result of passing
-function-sections to llc; the linkonce_odr makes no difference.

Add a new coff-weak.ll which tests the new directive emission.

Based on an previous patch by Shoaib Meenai.

Differential Revision: https://reviews.llvm.org/D44543
2020-03-28 18:48:58 +02:00
Florian Hahn 81f173ed0e [SCCP] Remove LatticeVal alias now that transition is done (NFC).
The LatticeVal alias was introduced to reduce the diff size for the
transition to ValueLatticeElement, which is done now.

This patch removes the unnecessary alias and updates some very verbose
type uses with auto.
2020-03-28 15:40:24 +00:00
Florian Hahn a44bf59c93 [SCCP] Remove unused toLatticeValue helper (NFC).
LatticeVal is an alias for ValueLatticeElement and the function is not
used any longer.
2020-03-28 15:40:24 +00:00
Michael Liao d2dd0fac48 Fix `-Wsign-compare` warning. NFC. 2020-03-28 10:20:27 -04:00
Uday Bondhugula 06066c4003 [NFC] Attributor comment updates / cast cleanup
Minor update/fixes to comments for the Attributor pass, and dyn_cast -> cast.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76972
2020-03-28 13:36:43 +05:30
Serge Pavlov f398739152 [FEnv] Constfold some unary constrained operations
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.

Differential Revision: https://reviews.llvm.org/D72930
2020-03-28 12:28:33 +07:00
Jonas Devlieghere 190df4a5bc Revert "[FileCollector] Add a method to add a whole directory and it contents."
This reverts commit 8913769e35 because the
unit test is failing on the Windows bot.
2020-03-27 19:21:48 -07:00
Jessica Paquette 98d05f88d5 [GlobalISel] Fix equality for copies from physregs in matchEqualDefs
When we see this:

```
%a = COPY $physreg
...
SOMETHING implicit-def $physreg
...
%b = COPY $physreg
```

The two copies are not equivalent, and so we shouldn't perform any folding
on them.

When we have two instructions which use a physical register check that they
define the same virtual register(s) as well.

e.g., if we run into this case

```
%a = COPY $physreg
...
%b = COPY %a
```

we can say that the two copies are the same, and can be folded.

Differential Revision: https://reviews.llvm.org/D76890
2020-03-27 17:52:21 -07:00
Jonas Devlieghere 8913769e35 [FileCollector] Add a method to add a whole directory and it contents.
Extend the FileCollector's API with addDirectory which adds a directory
and its contents to the VFS mapping.

Differential revision: https://reviews.llvm.org/D76671
2020-03-27 17:38:24 -07:00
Kamlesh Kumar aabc24acf0 [RISCV] Support llvm.thread.pointer
Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer)

Reviewed By: lenary, MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D76828
2020-03-27 17:30:12 -07:00
Nemanja Ivanovic 4821411347 [DAGCombine] Fix splitting indexed loads in ForwardStoreValueToDirectLoad()
In DAGCombiner::visitLOAD() we perform some checks before breaking up an indexed
load. However, we don't do the same checking in ForwardStoreValueToDirectLoad()
which can lead to failures later during combining
(see: https://bugs.llvm.org/show_bug.cgi?id=45301).

This patch just adds the same checks to this function as well.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45301

Differential revision: https://reviews.llvm.org/D76778
2020-03-27 18:03:47 -05:00
Francesco Petrogalli 4b3d94051c [llvm][Type] Return fixed size for scalar types. [NFC]
Summary:
It is safe to assume that the TypeSize associated to scalar types has
a fixed size.

This avoids an implicit cast of TypeSize to integer inside
`Type::getScalarSizeInBits()`, as such implicit cast is deprecated.

Reviewers: efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76892
2020-03-27 22:23:46 +00:00
Florian Hahn 9ce198d6ed [Darwin] Respect -fno-unroll-loops during LTO.
Currently -fno-unroll-loops is ignored when doing LTO on Darwin. This
patch adds a new -lto-no-unroll-loops option to the LTO code generator
and forwards it to the linker if -fno-unroll-loops is passed.

Reviewers: thegameg, steven_wu

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D76916
2020-03-27 22:19:03 +00:00
Jonas Devlieghere 3ef33e69de [VirtualFileSystem] Support directory entries in the YAMLVFSWriter
The current implementation of the JSONWriter does not support writing
out directory entries. Earlier today I added a unit test to illustrate
the problem. When an entry is added to the YAMLVFSWriter and the path is
a directory, it will incorrectly emit the directory as a file, and any
files inside that directory will not be found by the VFS.

It's possible to partially work around the issue by only adding "leaf
nodes" (files) to the YAMLVFSWriter. However, this doesn't work for
representing empty directories. This is a problem for clients of the VFS
that want to iterate over a directory. The directory not being there is
not the same as the directory being empty.

This is not just a hypothetical problem. The FileCollector for example
does not differentiate between file and directory paths. I temporarily
worked around the issue for LLDB by ignoring directories, but I suspect
this will prove problematic sooner rather than later.

This patch fixes the issue by extending the JSONWriter to support
writing out directory entries. We store whether an entry should be
emitted as a file or directory.

Differential revision: https://reviews.llvm.org/D76670
2020-03-27 15:16:52 -07:00
Sanjay Patel 0f56bbc1a5 [InstCombine] reduce FP-casted and bitcasted signbit check
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/bVyrko
http://volta.cs.utah.edu:8080/z/Vxpz9q
2020-03-27 17:33:59 -04:00
Francesco Petrogalli c66d1f38f6 [llvm][Support] Add isZero method for TypeSize. [NFC]
Summary:
The method is used where TypeSize is implicitly cast to integer for
being checked against 0.

Reviewers: sdesmalen, efriedma

Reviewed By: sdesmalen, efriedma

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76748
2020-03-27 21:03:44 +00:00
Matt Arsenault a8cc9047de CodeGen: Add -denormal-fp-math-f32 flag
Make the set of FP related attributes and command flags closer.
2020-03-27 14:00:39 -07:00
Jay Foad a6dfd827e5 [AMDGPU] Fix getEUsPerCU for gfx10 in CU mode
Summary:
"Per CU" is a bit simplistic for gfx10, but I couldn't think of a better
name.

Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76861
2020-03-27 20:36:49 +00:00
Fangrui Song 152d14da64 [MC][X86] Make .reloc support arbitrary relocation types
Generalizes D62014 (R_386_NONE/R_X86_64_NONE).

Unlike ARM (D76746) and AArch64 (D76754), we cannot delete FK_NONE from
getFixupKindSize because FK_NONE is still used by R_386_TLS_DESC_CALL/R_X86_64_TLSDESC_CALL.
2020-03-27 13:33:15 -07:00
diggerlin 9c20f09985 [AIX] Address comment https://reviews.llvm.org/D76162#inline-701237
SUMMARY:

Address clang format issue:

"clang format this block, I don't think the spaces are aligned correctly."

Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76162
2020-03-27 16:21:53 -04:00
Matt Arsenault 348735b723 AMDGPU: Stop setting attributes based on TargetOptions
Having arbitrary passes looking at the TargetOptions is pretty
messy. This was also disregarding if a function already had an
explicit attribute setting on it. opt/llc now add the attributes to
functions that don't specify the attribute. clang and lld do not call
the function to do this, which they maybe should.

This was also treating unsafe-fp-math as implying the others, and
setting the other attributes based on it. This is not done anywhere
else, and I'm not sure is correct based on the current description of
the option bit.

Effectively reverts 1d8cf2be89
2020-03-27 13:13:43 -07:00
Matt Arsenault 0ab5b5b858 Fix denormal-fp-math flag and attribute interaction
Make these behave the same way unsafe-fp-math and co. The command line
flag should add the attribute to functions that do not already have
it, and leave existing attributes. The attribute is the actual
implementation, but the flag is useful in some testing situations.

AMDGPU has a variety of tests with denormals enabled/disabled that
would require a painful level of test duplication without a flag. This
doesn't expose setting the separate input/output modes, or add a flag
for the f32 version yet.

Tests will be included in future patch.
2020-03-27 12:48:58 -07:00
Fangrui Song 34d77516b8 [MC][AArch64] Make .reloc support arbitrary relocation types
Depends on D76746. Generalizes D61973.

Differential Revision: https://reviews.llvm.org/D76754
2020-03-27 12:30:52 -07:00
Fangrui Song c389526171 [MC][ARM] Make .reloc support arbitrary relocation types
Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types.

A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind
is used to represent the relocation type whose number is V-FirstLiteralRelocationKind.

This is useful for linker tests. Without the feature the assembler
cannot produce certain relocation records (e.g.  R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0)
This helps move forward D75349 and D76575.

Differential Revision: https://reviews.llvm.org/D76746
2020-03-27 12:29:49 -07:00
Lang Hames cb84e4827e [ORC] Introduce JITSymbolFlags::HasMaterializeSideEffectsOnly flag.
This flag can be used to mark a symbol as existing only for the purpose of
enabling materialization. Such a symbol can be looked up to trigger
materialization with the lookup returning only once materialization is
complete. Symbols with this flag will never resolve however (to avoid
permanently polluting the symbol table), and should only be looked up using
the SymbolLookupFlags::WeaklyReferencedSymbol flag. The primary use case for
this flag is initialization symbols.
2020-03-27 11:02:54 -07:00
Lang Hames d38d06e649 [ORC] Don't create MaterializingInfo entries unnecessarily. 2020-03-27 11:02:54 -07:00
Craig Topper cdd1cd7120 [X86] Don't form masked instructions if the operation has an additional user.
This will cause the operation to be repeated in both a mask and another masked
or unmasked form. This can a wasted of execution resources.

Differential Revision: https://reviews.llvm.org/D60940
2020-03-27 10:44:22 -07:00
Simon Pilgrim 950ea61653 [X86] Remove orphan LowerSTRICT_FSETCC declaration. NFCI.
LowerSETCC handles strict cases as well, we don't have a separate function.
2020-03-27 17:03:19 +00:00
jasonliu d60d7d69de [llvm-objdump][XCOFF][AIX] Implement -r option
Summary:
Implement several XCOFF hooks to get '-r' option working for llvm-objdump -r.

Reviewer: DiggerLin, hubert.reinterpretcast, jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D75131
2020-03-27 16:05:42 +00:00
Guillaume Chatelet 74eac9031a [Alignment][NFC] MachineMemOperand::getAlign/getBaseAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76925
2020-03-27 15:49:13 +00:00
Sam Parker d7084fa34a [ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros
Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non zeros in the
false lanes.

Differential Revision: https://reviews.llvm.org/D76766
2020-03-27 15:26:13 +00:00
Alexandre Ganea 09158252f7 [ThinLTO] Allow usage of all hardware threads in the system
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.

One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.

When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).

Differential Revision: https://reviews.llvm.org/D75153
2020-03-27 10:20:58 -04:00
Sam Parker 0e6aa08381 [ARM][MVE] Add DoubleWidthResult flag
Add a flag for those instructions which read from the top/bottom
halves of their inputs and produce a vector of results with double
width elements.

Differential Revision: https://reviews.llvm.org/D76762
2020-03-27 13:44:04 +00:00
Sjoerd Meijer 401a324c51 [LV] Refactor widenIntOrFpInduction. NFC.
This untangles the logic in widenIntOrFpInduction in order to make more
explicit and visible how exactly the induction variable is lowered.

Differential Revision: https://reviews.llvm.org/D76686
2020-03-27 12:58:50 +00:00
Guillaume Chatelet e2ef6127d9 [Alignment] Fix overaligning bug
Summary:
This was discovered while converting to Align type.

See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76914
2020-03-27 12:57:50 +00:00
Simon Pilgrim d6ddabd7ef Revert rG6ff1ea3244c543ad24fc99c7f4979db2f2078593 "Fix "use of uninitialized variable" static analyzer warning. NFCI."
@dblaikie noticed that this may interfere with msan analysis
2020-03-27 11:44:03 +00:00
Jonas Paulsson 35173dddd1 [SystemZ] Fix typos in comments. 2020-03-27 12:31:48 +01:00
David Green 8689f98e9b [ARM] Fix MVE VCMPr f16 pattern
This patterns seemed to be using the f32 instruction, not f16. Fix it to
use the correct one.

Differential Revision: https://reviews.llvm.org/D76841
2020-03-27 11:18:24 +00:00
Georgii Rymar 30c1f9a558 [llvm-readobj] - Fix a crash when DT_STRTAB is broken.
We might have a crash scenario when we have an invalid DT_STRTAB value
that is larger than the file size. I've added a test case to demonstrate.

Differential revision: https://reviews.llvm.org/D76706
2020-03-27 13:18:08 +03:00
Guillaume Chatelet a98662f4c1 [Alignment][NFC] Update MachineMemOperand implementation to use MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76625
2020-03-27 08:06:10 +00:00
Fangrui Song 6728a9ae19 [MCInstPrinter] Add parameter `Address` to printCustomAliasOperand. NFC
Follow-up of D72172 and llvmorg-11-init-6896-gb3cc5dcef0f.
2020-03-27 00:38:20 -07:00
Johannes Doerfert 095cecbe0d [OpenMP] `omp begin/end declare variant` - part 1, parsing
This is the first part extracted from D71179 and cleaned up.

This patch provides parsing support for `omp begin/end declare variant`,
as defined in OpenMP technical report 8 (TR8) [0].

A major purpose of this patch is to provide proper math.h/cmath support
for OpenMP target offloading. See PR42061, PR42798, PR42799. The current
code was developed with this feature in mind, see [1].

[0] https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf
[1] https://reviews.llvm.org/D61399#change-496lQkg0mhRN

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D74941
2020-03-27 02:30:58 -05:00
Fangrui Song b3cc5dcef0 [MCInstPrinter] Add parameter `Address` to MCInstPrinter::printAliasInstr. NFC
Follow-up of D72172.
2020-03-27 00:03:32 -07:00
Shengchen Kan 1fb4f99a21 [X86][MC] Fix the bug for prefix padding support
Summary:
There is a tiny logic error of D75300, making branch is not
correctly aligned with option -x86-pad-max-prefix-size

Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76285
2020-03-27 14:16:09 +08:00
Juneyoung Lee 1bcc500b48 [DAGCombine] Add basic optimizations for FREEZE in SelDag
Summary: This patch is the first effort to adding basic optimizations for FREEZE in SelDag.

Reviewers: spatel, lebedev.ri

Reviewed By: spatel

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76707
2020-03-27 12:20:39 +09:00
Zakk Chen 64fe841856 Fix typo, targetFeature should be lowercase.
this fixing also enable llc -mattr=+cpuhelp

Reviewers: ziangwan, kongyi

Reviewed By: kongyi

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76757
2020-03-26 19:40:04 -07:00
Kai Wang 1a6b7318dd [NFC] Clang format for the ELF header and ARM build attributes.
Differential Revision: https://reviews.llvm.org/D76819
2020-03-27 09:53:12 +08:00
Leonard Chan 5d929e6646 Move setBugReportMsg() out from under a conditional
Fixes a build break with LLVM_ENABLE_BACKTRACES=OFF.

Differential Revision: https://reviews.llvm.org/D76893
2020-03-26 16:39:03 -07:00
Dan Gohman d865437d9c [WebAssembly] Fix the order of destructors in the LowerGlobalDtors pass.
Fix the LowerGlobalDtors pass to run destructors in the same order as the
regular LLVM destructor lowering -- in reverse order. Adjacent
destructors with the same associated object are grouped, but destructors
are not reordered based on associated objects.

Differential Revision: https://reviews.llvm.org/D70685
2020-03-26 16:19:02 -07:00
Stanislav Mekhanoshin 4c4b71843b [AMDGPU] Propagate amdgpu-waves-per-eu to callees
Differential Revision: https://reviews.llvm.org/D76868
2020-03-26 14:43:44 -07:00
Craig Topper 9f7d4150b9 [X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelecitonDAG.
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.

This pass adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.

An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.

Differential Revision: https://reviews.llvm.org/D76649
2020-03-26 14:10:20 -07:00
Simon Pilgrim ad36491ebb [X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets
Extends rG9d1721ce3926 to support AVX2+ targets.
2020-03-26 20:46:24 +00:00
Jay Foad 0fe096c4e9 [AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup
Summary: I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76860
2020-03-26 20:21:04 +00:00
Jay Foad bb9c4fd7ea [AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup.
Summary:
These methods were identical. I chose to remove getMaxWavesPerCU because
I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76859
2020-03-26 20:21:04 +00:00
Derek Schuff e110897e28 [WEbAssembly] Clear frame base vreg in explicit-locals when stack pointer is dead
Having an alloca in a function causes the stack pointer to be generated in the
prolog, but if it's unused other than for debug info, explicit-locals will drop
it and not allocate a local. In this case we need to reset the FrameBaseVreg.

Differential Revision: https://reviews.llvm.org/D76784
2020-03-26 13:07:32 -07:00
Simon Pilgrim 39a52a19ed [X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns.
We can improve computeKnownBits results by avoiding excess bitcasts.

For this pattern we were doing:

  (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK))))

By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly we can help computeKnownBits see that the mask is clearing the upper bits and allows shuffle combining to peek through later on.

This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.
2020-03-26 19:59:57 +00:00
diggerlin fdfe411e7c [AIX] discard the label in the csect of function description and use qualname for linkage
SUMMARY:

SUMMARY
for a source file  "test.c"

void foo() {};

llc will generate assembly code as (assembly patch)
     .globl  foo
     .globl  .foo
     .csect foo[DS]
foo:

        .long   .foo
        .long   TOC[TC0]
        .long   0

   and symbol table as (xcoff object file)
   [4]     m   0x00000004     .data     1  unamex                    foo
   [5]     a4  0x0000000c       0    0     SD       DS    0    0
   [6]     m   0x00000004     .data     1  extern                    foo
   [7]     a4  0x00000004       0    0     LD       DS    0    0

   After first patch, the assembly will be as

        .globl  foo[DS]                 # -- Begin function foo
        .globl  .foo
        .align  2
        .csect foo[DS]
        .long   .foo
        .long   TOC[TC0]
        .long   0

    and symbol table will as
   [6]     m   0x00000004     .data     1  extern                    foo
   [7]     a4  0x00000004       0    0     DS      DS    0    0
Change the code for the assembly path and xcoff objectfile patch for llc.

Reviewers: Jason Liu
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76162
2020-03-26 15:46:52 -04:00
Scott Linder bd12ecb88f [AMDGPU] Fix PC register mapping in wave32 mode
Summary:
The PC_32 DWARF register is for a 32-bit process address space which we
don't implement in AMDGCN; another way of putting this is that the size
of the PC register is not a function of the wavefront size. If we ever
implement a 32-bit process address space we will need to add two more
DwarfFlavours i.e. we will need to represent the product of (wave32,
wave64) x (64-bit address space, 32-bit address space).

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76732
2020-03-26 14:43:25 -04:00
David Blaikie 9002db05a2 Roll otherwise unused subexpressions into an assertion 2020-03-26 11:32:33 -07:00
Guillaume Chatelet b727aabcb8 [Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Reviewed By: courbet

Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76613
2020-03-26 18:15:53 +00:00
Jonathan Roelofs 7a89a5d81b [InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors
Fixes https://bugs.llvm.org/show_bug.cgi?id=43665
2020-03-26 12:09:36 -06:00
Jay Foad 0602c20b1b [AMDGPU] Make use of divideCeil. NFC. 2020-03-26 16:11:35 +00:00
Jay Foad 596bed3fd3 [AMDGPU] Remove unused methods. NFC. 2020-03-26 16:11:35 +00:00
Justin Hibbits 459e8e9488 [PowerPC]: Don't allow r0 as a target for LD_GOT_TPREL_L/32
Summary:
The linker is free to relax this (relocation R_PPC_GOT_TPREL16) against
R_PPC_TLS, if it sees fit (initial exec to local exec).  If r0 is used,
this can generate execution-invalid code (converts to 'addi %rX, %r0,
FOO, which translates in PPC-lingo to li %rX, FOO).  Forbid this
instead.

This fixes static binaries using locales on FreeBSD/powerpc
(tested on FreeBSD/powerpcspe).

Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76662
2020-03-26 10:59:28 -05:00
Simon Pilgrim 9d1721ce39 [X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets
As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles.

The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern.

We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.
2020-03-26 15:47:43 +00:00
Fangrui Song 3eef47407b [PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
0: bl .-4
4: bl .+0
8: bl .+4

// llvm-objdump -d output (after) ; GNU objdump -d
0: bl 0xfffffffc / bl 0xfffffffffffffffc
4: bl 0x4
8: bl 0xc
```

Many Operand's are not annotated as OPERAND_PCREL.
They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches.

Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test
address space wraparound for powerpc32 and powerpc64.

Reviewed By: sfertile, jhenderson

Differential Revision: https://reviews.llvm.org/D76591
2020-03-26 08:32:29 -07:00
Fangrui Song 87de9a0786 [X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
400000: e8 0b 00 00 00   callq 11
400005: e8 0b 00 00 00   callq 11

// llvm-objdump -d output (after)
400000: e8 0b 00 00 00  callq 0x400010
400005: e8 0b 00 00 00  callq 0x400015

// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
400000: e8 0b 00 00 00  callq 400010
400005: e8 0b 00 00 00  callq 400015
```

In llvm-objdump, we pass the address of the next MCInst. Ideally we
should just thread the address of the current address, unfortunately we
cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter
requires MCInstrInfo and MCContext) to get the length of the MCInst.

MCInstPrinter::printInst has other callers (e.g llvm-mc -filetype=asm, llvm-mca) which set Address to 0.
They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D76580
2020-03-26 08:28:59 -07:00
Fangrui Song 5fad05e80d [MCInstPrinter] Pass `Address` parameter to MCOI::OPERAND_PCREL typed operands. NFC
Follow-up of D72172 and D72180

This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.

Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.

```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc:     20000: bl .+4
x86:     20000: callq 0

// Ideal output
aarch64: 20000: bl 0x20000
ppc:     20000: bl 0x20004
x86:     20000: callq 0x20005

// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc:     20000: bl 0x20004
x86:     20000: callq 20005
```

In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):

```
   case 12:
     // CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
-    printPCRelImm(MI, 0, O);
+    printPCRelImm(MI, Address, 0, O);
     return;
```

Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D76574
2020-03-26 08:21:15 -07:00
Dominik Montada 9fedb6900d [GlobalISel] add helper function to create arbitrary libcalls
Summary:
The existing helper function can only create a libcall to functions available in
RTLIB. Add a helper function that can create a libcall to a given function name
using the provided calling convention.

Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders

Reviewed By: arsenm

Subscribers: wdng, hiraditya, volkan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76845
2020-03-26 16:11:13 +01:00
Qiu Chaofan 172456c775 [Legalizer] Fix some flags miss in vector results
In some scalarize/split result methods (unary, binary, ...), flags in
SDNode were not passed down, which may lead to unexpected results in
unsafe float-point optimization. This patch fixes them. (maybe not
complete)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76832
2020-03-26 22:01:19 +08:00
Simon Pilgrim e30d29ebc1 [X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())
As long we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.
2020-03-26 11:57:45 +00:00
Jonas Paulsson 8bf9e317e4 [SystemZ] Bugfix in tieOpsIfNeeded()
This function did a check which was broken to see if an opcode requires op0
and op1 to be tied. By chance this is NFC.

Review: Ulrich Weigand
2020-03-26 12:22:14 +01:00
gbreynoo a945037e8f Tools emit the bug report URL on crash
When Clang crashes a useful message is output:

"PLEASE submit a bug report to https://bugs.llvm.org/ and include the
crash backtrace, preprocessed source, and associated run script."

A similar message is now output for all tools.

Differential Revision: https://reviews.llvm.org/D74324
2020-03-26 10:26:59 +00:00
Kang Zhang 4673699a47 [PowerPC] Remove the repeated definition for some InstAlias for mtspr/mfspr
Summary:
Below InstAlias have been redefined, this patch is to remove the repeated
definition.
mtdec/mfdec mtsdr1/mfsdr1 mtsrr0/mfsrr0 mtsrr1/mfsrr1 mtasr

Reviewed By: nemanjai, steven.zhang

Differential Revision: https://reviews.llvm.org/D75821
2020-03-26 09:58:30 +00:00
Cullen Rhodes 9086db707d [AArch64][SVE] Implement structured store intrinsics
Summary:
This patch adds initial support for the following intrinsics:

    * llvm.aarch64.sve.st2
    * llvm.aarch64.sve.st3
    * llvm.aarch64.sve.st4

For storing two, three and four vectors worth of data. Basic codegen for
reg+immediate forms are implemented. Reg+reg addressing modes will be
addressed in a later patch.

These intrinsics are intended for use in the Arm C Language Extension
(ACLE).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75947
2020-03-26 09:34:51 +00:00
Ties Stuij 71ae267d1f [PATCH] [ARM] ARMv8.6-a command-line + BFloat16 Asm Support
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html

In detail this patch

- march options for armv8.6-a
- BFloat16 assembly

This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.

Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson

Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson

Reviewed By: SjoerdMeijer

Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D76062
2020-03-26 09:17:20 +00:00
David Green 37b9cc8f29 [ARM] Sink splats to vector float instructions
Some MVE floating point instructions have gpr register variants that take
the scalar gpr value and splat them to all lanes. In order to accept
them in loops, the shuffle_vector and insert need to be sunk down into
the loop, next to the instruction so that ISel can see the whole
pattern.

This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for
mul are slightly more constrained as there are no fms variants taking
register arguments.

Differential Revision: https://reviews.llvm.org/D76023
2020-03-26 09:02:18 +00:00
Fangrui Song 4c52d51e78 [InstCombine] Fix a code-sinking bug after D73832/f1a9efabcb9b
- UserParent = PN->getIncomingBlock(*I->use_begin());
+ UserParent = PN->getIncomingBlock(*SingleUse);

The first use of I may be droppable (llvm.assume).

When compiling llvm/lib/IR/AutoUpgrade.cpp with a bootstrapped clang
with ThinLTO with minimized bitcode files, I see such a case in
the function _ZN4llvm20UpgradeIntrinsicCallEPNS_8CallInstEPNS_8FunctionE

  clang -c -fthinlto-index=AutoUpgrade.o.thinlto.bc AutoUpgrade.bc -O3

Unfortunately it is really difficult to get a minimized reproduce.
2020-03-25 22:50:53 -07:00
John McCall 9514c048d8 Use optimal layout and preserve alloca alignment in coroutine frames.
Previously, we would ignore alloca alignment when building the frame
and just use the natural alignment of the allocated type.  If an alloca
is over-aligned for its IR type, this could lead to a frame entry with
inadequate alignment for the downstream uses of the alloca.

Since highly-aligned fields also tend to produce poor layouts under a
naive layout algorithm, I've also switched coroutine frames to use the
new optimal struct layout algorithm.

In order to communicate the frame size and alignment to later passes,
I needed to set align+dereferenceable attributes on the frame-pointer
parameter of the resume function.  This is clearly the right thing to
do, but the align attribute currently seems to result in assumptions
being added during inlining that the optimizer cannot easily remove.
2020-03-26 00:51:09 -04:00
QingShan Zhang 1ef7bf4121 [PowerPC] Improve the way legalize mul for v8i16 and add pattern to match mul + add
We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0)
if altivec enabled. Now, it is set as custom and expand it later, which is not
the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C)

Reviewed By: Nemanjai

Differential Revision: https://reviews.llvm.org/D76751
2020-03-26 04:46:49 +00:00
Stanislav Mekhanoshin e06d707aa2 [AMDGPU] Fixed function traversal in attribute propagation
AMDGPUPropagateAttributes pass was skipping some of the functions
when cloning. Functions were added to root set and then skipped
on the next interation because they are already in the root set,
while were meant to be processed with different features.

Differential Revision: https://reviews.llvm.org/D76815
2020-03-25 18:47:09 -07:00
Stanislav Mekhanoshin 6e00e3fcb0 [AMDGPU] Preserve original symbol during attribute propagation
AMDGPUPropagateAttributes can swap names while cloning a function.
Only do it if original symbol was not externally visible.

Differential Revision: https://reviews.llvm.org/D76789
2020-03-25 15:26:30 -07:00
Tyker f1a9efabcb Ignore/Drop droppable uses for code-sinking in InstCombine
Summary:
This patch allows code-sinking in InstCombine to be performed when instruction have uses in llvm.assume.

Use are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use.
for now uses are considered droppable if they are in an llvm.assume.

Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73832
2020-03-25 20:42:52 +01:00
Alina Sbirlea 3abcbf9903 [CFG/BasicBlock] Rename succ_const to const_succ. [NFC]
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.

Reviewers: nicholas, dblaikie, nlewycky

Subscribers: hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75952
2020-03-25 12:40:55 -07:00
Heejin Ahn f93426c5b9 [WebAssembly] Move event section before global section
Summary:
https://github.com/WebAssembly/exception-handling/issues/98

Also this moves many parts of code to make code align with the section
order, even if they don't affect the output.

Reviewers: tlively

Subscribers: dschuff, sbc100, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76752
2020-03-25 11:49:03 -07:00
Nico Weber d7888149aa Suppress a few -Wunreachable-code warnings.
No behavior change. Also fix a comment to say match reality.
2020-03-25 13:55:42 -04:00
Simon Pilgrim c6e5531f9b [X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns
Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443.

We could probably extend this further to handle non-VLX truncation cases.
2020-03-25 17:41:51 +00:00
Gil Rapaport 078c863305 [LV] Replace stored value with a VPValue (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as
the stored value. The recipe is generated with a VPValue wrapping the stored
value of the scalar store. This reduces ingredient def-use usage by ILV as a
step towards full VPlan-based def-use relations.

Differential Revision: https://reviews.llvm.org/D76373
2020-03-25 19:36:55 +02:00
Tyker d72c586aeb [NFC] Rename function to match Coding Convention and fix typo in KnowledgeRetention 2020-03-25 18:31:13 +01:00
Mikhail Maltsev bb4da94e5b [ARM,CDE] Implement predicated Q-register CDE intrinsics
Summary:
This patch implements the following CDE intrinsics:

  T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p);
  T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p);
  T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p);

  T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p);
  T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p);
  T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p);

The intrinsics are not part of the released ACLE spec, but internally at
Arm we have reached consensus to add them to the next ACLE release.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: simon_tatham

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76610
2020-03-25 17:08:19 +00:00
Yvan Roux bd069ad39c [ARM] Move ConstantIsland and LowOverheadLoops Passes.
Move ARM ConstantIsland and LowOverheadLopps passes later in the pipeline
such that they will be run after the upcoming Machine Outlining pass.

Differential Revision: https://reviews.llvm.org/D76065
2020-03-25 16:49:21 +01:00
cdevadas ce984129ea [AMDGPU] Add SIPreEmitPeephole pass.
This pass can handle all the optimization
opportunities found just before code emission.
Presently it includes the handling of vcc branch
optimization that was handled earlier in SIInsertSkips.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D76712
2020-03-25 15:35:35 +00:00
Jonas Paulsson f09b891d4a [SystemZ] Improve foldMemoryOperandImpl()
A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
A logical compare can use CLFHSI/CLGHSI.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76055
2020-03-25 16:21:08 +01:00
Sean Fertile 3282d875d6 [PowerPC][AIX] ByVal formal arguments in a single register.
Adds support for passing ByVal formal arguments as long as they fit in a
single register.

Differential Revision: https://reviews.llvm.org/D76401
2020-03-25 11:09:40 -04:00
Kerry McLaughlin 05606329e2 [AArch64][SVE] Add SVE intrinsics for masked loads & stores
Summary:
Implements the following intrinsics for contiguous loads & stores:
  - @llvm.aarch64.sve.ld1
  - @llvm.aarch64.sve.st1

Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin

Reviewed By: cameron.mcinally

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76688
2020-03-25 11:48:40 +00:00
Sam Parker e87250202d [ARM][MVE] Add HorizontalReduction flag
Add a target flag for instructions that reduce into one, or more,
scalar reg(s), including variants of:
- VADDV
- VABAV
- VMINV/VMAXV
- VMLADAV

Differential Revision: https://reviews.llvm.org/D76683
2020-03-25 11:12:03 +00:00
Kazushi (Jam) Marukawa 28a42dd1b9 [VE] Change name of enum to CondCode
Summary: Change enum name for condition codes from CondCodes to CondCode.

Reviewers: arsenm, simoll, k-ishizaka

Reviewed By: arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm, #ve

Differential Revision: https://reviews.llvm.org/D76747
2020-03-25 09:20:05 +01:00
Juneyoung Lee 453eac3f77 Minor fixes to a comment in CodeGenPrepare 2020-03-25 16:34:43 +09:00
Adrian Prantl ed8ad6ec15 Add an -object-path-prefix option to dsymutil
to remap object file paths (but no source paths) before
processing. This is meant to be used for Clang objects where the
module cache location was remapped using ``-fdebug-prefix-map``; to
help dsymutil find the Clang module cache.

<rdar://problem/55685132>

Differential Revision: https://reviews.llvm.org/D76391
2020-03-24 17:13:42 -07:00
Matt Arsenault 39c55cef21 GlobalISel: Introduce bitcast legalize action
For some operations, the type is unimportant and only the number of
bits matters. For example I don't want to treat <4 x s8> as a legal
type, but I also don't want to decompose loads of this into smaller
pieces to get legal register types.

On AMDGPU in SelectionDAG, we legalize a number of operations (most
notably load and store) by coercing all types to vectors of i32. For
GlobalISel, I'm trying very hard to avoid doing this for every type,
but I don't think this strategy can be completely avoided. I'm trying
to avoid bitcasts for any legitimately legal type we can operate on,
since the intervening bitcasts have proven to be a hassle.

For loads, I think I can get away without ever casting the result
type, and handling any arbitrary bitwidth during selection (I will
eventually want new tablegen support to help with this, rather than
having to add every possible type as legal). The unmerge required to
do anything with the value should expand to the expected shifts. This
is trickier for stores, since it would now require handling a wide
array of truncates during selection which I don't want.

Future potentially interesting case are for vector indexing, where
sub-dword type should be indexed in s32 pieces.
2020-03-24 19:33:33 -04:00
Nikita Popov ec184dd548 [LVI] Convert some checks to assertions; NFC
solveBlockValue() should only be called if the value isn't cached
yet. Similarly, it does not make sense to "solve" a constant.
2020-03-24 23:11:13 +01:00
Amara Emerson 472d282046 [AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin.
On Darwin these need to be selected into a function call for the TLS
address lookup. As a result, they can't be moved below a physreg write,
which happens in call sequences. In the long term, we should have some
mechanism in the localizer to prevent localizing into target-specific
atomic instruction sequences.

rdar://60056248

Differential Revision: https://reviews.llvm.org/D76652
2020-03-24 13:35:50 -07:00
Johannes Doerfert 5699d08b79 [Attributor] Use knowledge retained in llvm.assume (operand bundles)
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.

[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D74888
2020-03-24 15:33:40 -05:00
Craig Topper e8d67ada2d [X86] Disable autoupgrade support for avx512.mask.broadcasti32x2.* and avx512.mask.broadcastf32x2.*.
These intrinsics take a v4i32/v4f32 input and are supposed to
broadcast elements 0 and 1. Instead the autoupgrade code was
broadcasting elements 0, 1, 2, and 3.

I could fix the autoupgrade, but since its been broken for years
it seemed better just to steer anyone still trying to use it away
completely.
2020-03-24 12:35:24 -07:00