Commit Graph

129445 Commits

Author SHA1 Message Date
Anna Welker 7cd1cfdd6b [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.

Differential Revision: https://reviews.llvm.org/D71610
2019-12-18 09:14:39 +00:00
Wang, Pengfei 8cc0b58673 [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic.
Summary: Add calculation for elements in structures in getting uniform
base for the Gather/Scatter intrinsic.

Reviewers: craig.topper, c-rhodes, RKSimon

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71442
2019-12-18 12:24:58 +08:00
Wang, Pengfei 1949235d13 [X86] Add strict fma support
Summary: Add strict fma support

Reviewers: craig.topper, RKSimon, LiuChen3

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71604
2019-12-18 11:44:00 +08:00
Nemanja Ivanovic a5da8d90da [PowerPC] Add missing legalization for vector BSWAP
We somehow missed doing this when we were working on Power9 exploitation.
This just adds the missing legalization and cost for producing the vector
intrinsics.

Differential revision: https://reviews.llvm.org/D70436
2019-12-17 19:07:34 -06:00
Craig Topper 004fdbe041 [X86] Manually format some setOperationAction calls to line up arguments to improve readability. NFC 2019-12-17 16:11:31 -08:00
Xiaoqing Wu a17619e0b0 [AArch64][GlobalISel]: Fix a crash in GlobalIsel in dealing with 16bit uadd.with.overflow.
Summary: AArch64 doesn't support uadd.with.overflow.i16 natively. This change adds a legalization rule to convert the 32bit add result to 16bit. This should fix PR43981.

Reviewers: arsenm, qcolombet, paquette, aemerson

Reviewed By: paquette

Subscribers: wdng, rovka, kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71587
2019-12-17 16:05:00 -08:00
Craig Topper c36773c78e [FPEnv][LegalizeTypes] Make ScalarizeVecOp_STRICT_FP_ROUND do its own replacements and return SDValue()
The caller will assert for nodes with more than 2 results unless
we return a null SDValue.

I tried to test this by copying an AArch64 test for ScalarizeVecOp_FP_ROUND.
While it did hit the assert and this commited fixed that. It also
hit a later problem that couldn't be fixed without adding strict
FP support to AArch64.
2019-12-17 15:17:43 -08:00
Stanislav Mekhanoshin b8ac5894a1 [AMDGPU] Fixed cost model for packed 16 bit ops
Differential Revision: https://reviews.llvm.org/D71622
2019-12-17 15:14:17 -08:00
Thomas Lively f1b351e14a [WebAssembly] Implement SIMD {i8x16,i16x8}.avgr_u instructions
Summary:
These instructions were added to the spec proposal in
https://github.com/WebAssembly/simd/pull/126. Their semantics are
equivalent to `(a + b + 1) / 2`. The opcode for the experimental
i32x4.dot_i16x8_s is also bumped due to a collision with the
i8x16.avgr_u opcode.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71628
2019-12-17 15:05:50 -08:00
Mitch Phillips f827aff859 Revert "[ MC ] Match labels to existing fragments even when switching sections."
This reverts commit 4272372c57.

Caused an MSan buildbot failure. More information available in the patch
that introduced the bug: https://reviews.llvm.org/D71368
2019-12-17 15:04:26 -08:00
Craig Topper 84d8fa30f9 [FPEnv][LegalizeTypes][LegalizeDAG][AArch64] Few fixes/improvements for legalizing fp<->int conversion nodes.
This started with adding a test to support get code coverage on
ScalarizeVecOp_UnaryOp_StrictFP by copying an existing AArch64 test
and using constrained sitofp/uitofp intrinsics.

This found 3 separate issues:
-ScalarizeVecOp_UnaryOp_StrictFP needs to do its own replacement
 because the caller can't handle replacing multiple results.
-Missing integer promotion support for sitofp/uitofp
-Chain result not always assigned in ExpandLegalINT_TO_FP.

Committing them together so I can add the test case.
2019-12-17 14:37:00 -08:00
Whitney Tsang 36bdc3dc35 [LoopFusion] Move instructions from FC0.Latch to FC1.Latch.
Summary:This PR move instructions from FC0.Latch bottom up to the
beginning of FC1.Latch as long as they are proven safe.

To illustrate why this is beneficial, let's consider the following
example:
Before Fusion:
header1:
  br header2
header2:
  br header2, latch1
latch1:
  br header1, preheader3
preheader3:
  br header3
header3:
  br header4
header4:
  br header4, latch3
latch3:
  br header3, exit3

After Fusion (before this PR):
header1:
  br header2
header2:
  br header2, latch1
latch1:
  br header3
header3:
  br header4
header4:
  br header4, latch3
latch3:
  br header1, exit3

Note that preheader3 is removed during fusion before this PR.
Notice that we cannot fuse loop2 with loop4 as there exists block latch1
in between.
This PR move instructions from latch1 to beginning of latch3, and remove
block latch1. LoopFusion is now able to fuse loop nest recursively.

After Fusion (after this PR):
header1:
  br header2
header2:
  br header3
header3:
  br header4
header4:
  br header2, latch3
latch3:
  br header1, exit3

Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: kbarton, Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71165
2019-12-17 22:10:23 +00:00
David Tenty 84161f18cc [AIX] Avoid unset csect assert for functions defined after their use in TOC
Summary:
If a function is defined after it appears in a TOC expression, we may
try to access an unset containing csect when returning a symbol for the
expression.

Reviewers: Xiangling_L, DiggerLin, jasonliu, hubert.reinterpretcast

Reviewed By: hubert.reinterpretcast

Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71125
2019-12-17 16:59:22 -05:00
Tom Stellard c3bc805f4f AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct
Summary:
Modify CombineInfo to only store information about a single instruction.
This is a little easier to work with and removes a lot of duplicate
initialization code.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm, nhaehnle

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71045
2019-12-17 13:43:10 -08:00
Mark de Wever 1a8ff89653 [IR] Use a reference in a range-based for
This avoids unneeded copies when using a range-based for loops.

This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D70870
2019-12-17 21:57:58 +01:00
Sourabh Singh Tomar 399273e5eb Recommit "[DebugInfo] Refactored macro related generation,
added a test case for macinfo.dwo emission."

This was reverted in caa4120906,
since it was causing an assertion failure on Windows bots.
This revision is revised to fix that.

Original commit message -

[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission.

Reviewers: dblaikie, aprantl, jini.susan.george

Tags: #debug-info #llvm

Differential Revision: https://reviews.llvm.org/D71008
2019-12-18 02:12:59 +05:30
Stefan Stipanovic fff8ec9813 [Attributor] H2S fix.
Summary: Fixing issues that were noticed in D71521

Reviewers: jdoerfert, lebedev.ri, uenoku

Subscribers:

Differential Revision: https://reviews.llvm.org/D71564
2019-12-17 20:41:09 +01:00
Jay Foad 0412f518dc [AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtr
Summary:
The typo has been present since memOpsHaveSameBasePtr was introduced in
r313208.

It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than
it was supposed to.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71616
2019-12-17 18:54:27 +00:00
Sanjay Patel 6a77e36975 [SDAG] adjust isNegatibleForFree calculation to avoid crashing
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets to show the problem more
generally.

We check the number of uses as a predicate for whether some value is free to negate,
but that use count can change as we rewrite the expression in getNegatedExpression().
So something that was marked free to negate during the cost evaluation phase becomes
not free to negate during the rewrite phase (or the inverse - something that was not
free becomes free). This can lead to a crash/assert because we expect that everything
in an expression that is negatible to be handled in the corresponding code within
getNegatedExpression().

This patch adds a hack to work-around the case where we probably no longer detect
that either multiply operand of an FMA isNegatibleForFree which is assumed to be
true when we started rewriting the expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-17 13:49:15 -05:00
Sanjay Patel 5b0251da1c Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()"
This reverts commit 36b1232ec5.
Need to adjust commit message - that was a leftover from the earlier version.
2019-12-17 13:47:59 -05:00
Sanjay Patel 36b1232ec5 [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets to show the problem more
generally.

We check the number of uses as a predicate for whether some value is free to negate,
but that use count can change as we rewrite the expression in getNegatedExpression().
So something that was marked free to negate during the cost evaluation phase becomes
not free to negate during the rewrite phase (or the inverse - something that was not
free becomes free). This can lead to a crash/assert because we expect that everything
in an expression that is negatible to be handled in the corresponding code within
getNegatedExpression().

This patch adds a hack to work-around the case where we probably no longer detect
that either multiply operand of an FMA isNegatibleForFree which is assumed to be
true when we started rewriting the expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-17 13:46:06 -05:00
Zakk Chen 2c8e22d25c [RISCV] Add subtargets initialized with target feature
expected failed test (RV32IF-ILP32F) will be fixed in a subsequent patch.

Reviewers: efriedma, lenary, asb

Reviewed By: efriedma, lenary

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70116
2019-12-17 09:34:01 -08:00
Amaury Séchet ff6567cc77 [DAGCombiner] Add node back in the worklist in topological order in CommitTargetLoweringOpt
Summary:
Right now, DAGCombiner process the nodes in an iplementation defined order. This tends to be fragile as optimisation may or may not kick in depending on the traversal order.

This is part of a larger effort to get the DAGCombiner to process its node in topological order.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70921
2019-12-17 18:26:16 +01:00
Ulrich Weigand d1c0f14be8 [SystemZ][FPEnv] Back-end support for STRICT_[SU]INT_TO_FP
As of b1d8576 there is middle-end support for STRICT_[SU]INT_TO_FP,
so this patch adds SystemZ back-end support as well.

The patch is SystemZ target specific except for adding SD patterns
strict_[su]int_to_fp and any_[su]int_to_fp to TargetSelectionDAG.td
as usual.
2019-12-17 18:24:05 +01:00
Piotr Sobczak 65f94b3380 [InstCombine][AMDGPU] Trim more components of *buffer_load
Summary:
Add trimming of unused components of s_buffer_load.

Extend trimming of *buffer_load to also include
unused components at the beginning of vectors and update offset.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70315
2019-12-17 17:50:07 +01:00
Michael Trent 4272372c57 [ MC ] Match labels to existing fragments even when switching sections.
Summary:
This commit builds upon Derek Schuff's 2014 commit for attaching labels to
existing fragments ( Diff Revision: http://reviews.llvm.org/D5915 )

When temporary labels appear ahead of a fragment, MCObjectStreamer will
track the temporary label symbol in a "Pending Labels" list. Labels are
associated with fragments when a real fragment arrives; otherwise, an empty
data fragment will be created if the streamer's section changes or if the
stream finishes.

This commit moves the "Pending Labels" list into each MCStream, so that
this label-fragment matching process is resilient to section changes. If
the streamer emits a label in a new section, switches to another section to
do other work, then switches back to the first section and emits a
fragment, that initial label will be associated with this new fragment.
Labels will only receive empty data fragments in the case where no other
fragment exists for that section.

The downstream effects of this can be seen in Mach-O relocations. The
previous approach could produce local section relocations and external
symbol relocations for the same data in an object file, and this mix of
relocation types resulted in problems in the ld64 Mach-O linker. This
commit ensures relocations triggered by temporary labels are consistent.

Reviewers: pete, ab, dschuff

Reviewed By: pete, dschuff

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71368
2019-12-17 08:49:25 -08:00
Mitch Phillips 2423774cc2 Revert "Honor -fuse-init-array when os is not specified on x86"
This reverts commit aa5ee8f244.

This change broke the sanitizer buildbots. See comments at the patchset
(https://reviews.llvm.org/D71360) for more information.
2019-12-17 07:36:59 -08:00
Kevin P. Neal b1d8576b0a This adds constrained intrinsics for the signed and unsigned conversions
of integers to floating point.

This includes some of Craig Topper's changes for promotion support from
D71130.

Differential Revision: https://reviews.llvm.org/D69275
2019-12-17 10:06:51 -05:00
alex-t e7f585ed61 PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand
Differential Revision: https://reviews.llvm.org/D71132
2019-12-17 15:20:43 +03:00
James Henderson 5666b70fd0 [DebugInfo] Only print a single blank line after an empty line table
Commit 84a9756 added an extra blank line at the end of any line table.
However, a blank line is also printed after the line table header, which
meant that two blank lines in a row were being printed after a header,
if there were no rows. This patch defers the post-header blank line
printing until it has been determined that there are rows to print.

Reviewed by: dblaikie

Differential Revision: https://reviews.llvm.org/D71540
2019-12-17 12:04:09 +00:00
Luís Marques e332a09619 [RISCV][NFC] Trivial cleanup
Fix a typo. Remove two seemingly out-of-date TODO comments.
2019-12-17 11:44:35 +00:00
Kristof Beyls 870f39d310 Fix assertion failure in getMemOperandWithOffsetWidth
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.

Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.

The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.

This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.

Differential Revision: https://reviews.llvm.org/D71359
2019-12-17 10:56:09 +00:00
Guillaume Chatelet 531c1161b9 Resubmit "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove"
Summary:
This is a resubmit of D71473.

This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: aaron.ballman, courbet

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71547
2019-12-17 10:07:46 +01:00
Russell Gallop 531c71118f Revert "[Support] Fix time trace multi threaded support with LLVM_ENABLE_THREADS=OFF"
This reverts commit 2bbcf156ac.

This was failing on systems which use __thread such as
http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/30851
2019-12-17 08:59:05 +00:00
Russell Gallop 2bbcf156ac [Support] Fix time trace multi threaded support with LLVM_ENABLE_THREADS=OFF
Following on from 8ddcd1dc26, which added the support. As pointed out
on D71059 this does not build on some systems with LLVM_ENABLE_THREADS=OFF.

Differential Revision: https://reviews.llvm.org/D71548
2019-12-17 08:42:56 +00:00
Raphael Isemann ccfab8e459 [ObjC][DWARF] Emit DW_AT_APPLE_objc_direct for methods marked as __attribute__((objc_direct))
Summary:
With DWARF5 it is no longer possible to distinguish normal methods and methods with `__attribute__((objc_direct))` by just looking at the debug information
as they are both now children of the of the DW_TAG_structure_type that defines them (before only the `__attribute__((objc_direct))` methods were children).

This means that in LLDB we are no longer able to create a correct Clang AST of a module by just looking at the debug information. Instead we would
need to call the Objective-C runtime to see which of the methods have a `__attribute__((objc_direct))` and then add the attribute to our own Clang AST
depending on what the runtime returns. This would mean that we either let the module AST be dependent on the Objective-C runtime (which doesn't
seem right) or we retroactively add the missing attribute to the imported AST in our expressions.

A third option is to annotate methods with `__attribute__((objc_direct))` as `DW_AT_APPLE_objc_direct` which is what this patch implements. This way
LLDB doesn't have to call the runtime for any `__attribute__((objc_direct))` method and the AST in our module will already be correct when we create it.

Reviewers: aprantl, SouraVX

Reviewed By: aprantl

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D71201
2019-12-17 09:40:36 +01:00
Craig Topper 13ce7c1291 [LegalizeTypes] Pre-size the SmallVectors in ScalarizeVecRes_StrictFPOp and SplitVecRes_StrictFPOp so we don't have to call push_back. NFCI
This avoids grow checking/handling in each iteration of the loop.
2019-12-16 23:42:13 -08:00
Craig Topper c738ebc1f5 [LegalizeTypes] Remove ScalarizeVecRes_STRICT_FP_ROUND in favor of just using ScalarizeVecRes_StrictFPOp. NFCI
It looks like ScalarizeVecRes_StrictFPOp can handle a variable
number of arguments with scalar and vector types so it should
be sufficient.
2019-12-16 23:42:13 -08:00
Craig Topper c4d2bb1ede [LegalizeTypes] Remove the call to SplitVecRes_UnaryOp from SplitVecRes_StrictFPOp. NFCI
It doesn't seem to do anything that SplitVecRes_StrictFPOp can't
do. SplitVecRes_StrictFPOp already handles nodes with a variable
number of arguments and a mix of scalar and vector arguments.
2019-12-16 23:42:13 -08:00
Fangrui Song 97182013c4 [MC] Delete redundant alignment update from MCELFStreamer::EmitCommonSymbol. NFC
EmitValueToAlignment() updates the maximum alignment.
2019-12-16 23:08:32 -08:00
Whitney Tsang ec4749e3b8 Revert "[LoopUtils] Updated deleteDeadLoop() to handle loop nest."
This reverts commit cd09fee3d6.
This reverts commit c066ff11d8.
2019-12-17 03:51:41 +00:00
Johannes Doerfert 0bc3336ac1 [Attributor][NFC] Clang format the Attributor
The Attributor is always kept formatted so diffs are cleaner.

Sometime we get out of sync for various reasons so we need to format the
file once in a while.
2019-12-16 21:03:18 -06:00
Craig Topper 4e48513b47 [SelectionDAG] Add the fpexcept flag to the SelectionDAG dumping output so we can better see when its not propagating.
We're currently losing this flag in type legalization and probably
other places when we expand strict fp nodes. This will make
reading logs easier.
2019-12-16 18:05:11 -08:00
Whitney Tsang c066ff11d8 [LoopUtils] Updated deleteDeadLoop() to handle loop nest.
Reviewer: kariddi, sanjoy, reames, Meinersbur, bmahjour, etiotto,
kbarton
Reviewed By: Meinersbur
Subscribers: mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D70939
2019-12-17 01:06:14 +00:00
Puyan Lotfi 204dfabfe6 [NFC][llvm][MIRVRegNamerUtils] Moving some switch cases and altering comments. 2019-12-16 18:50:26 -05:00
Puyan Lotfi f63b64c0c3 [llvm][MIRVRegNamerUtils] Adding hashing on CImm / FPImm MachineOperands.
This patch makes it so that cases where multiple instructions that
differ only in their ConstantInt or ConstantFP MachineOperand values no
longer collide. For instance:

%0:_(s1) = G_CONSTANT i1 true
%1:_(s1) = G_CONSTANT i1 false
%2:_(s32) = G_FCONSTANT float 1.0
%3:_(s32) = G_FCONSTANT float 0.0

Prior to this patch the first two instructions would collide together.
Also, the last two G_FCONSTANT instructions would also collide. Now they
will no longer collide.

Differential Revision: https://reviews.llvm.org/D71558
2019-12-16 18:25:04 -05:00
Kamlesh Kumar aa5ee8f244 Honor -fuse-init-array when os is not specified on x86
Currently -fuse-init-array option is not effective when target triple
does not specify os, on x86,x86_64.
i.e.

// -fuse-init-array is not honored.
$ clang -target i386 -fuse-init-array test.c -S

// -fuse-init-array is honored.
$ clang -target i386-linux -fuse-init-array test.c -S

This patch fixes first case.
And does cleanup.

Reviewers: rnk, craig.topper, fhahn, echristo

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D71360
2019-12-16 15:21:23 -08:00
Ana Pazos d7af86bdd0 [RISCV] Added isCompressibleInst() to estimate size in getInstSizeInBytes()
Summary:
Modified compression emitter tablegen backend to emit isCompressibleInst()
check which in turn is used by getInstSizeInBytes() to better estimate
instruction size. Note the generation of compressed instructions in RISC-V
happens late in the assembler therefore instruction size estimate might be off
if computed before.

Reviewers: lenary, asb, luismarques, lewis-revill

Reviewed By: asb

Subscribers: sameer.abuasal, lewis-revill, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68290
2019-12-16 15:15:10 -08:00
Fangrui Song 002adabb3a [AArch64][SVE] Change pattern generation code to fix -Wimplicit-fallthrough after D71483 2019-12-16 15:09:05 -08:00
Danilo Carvalho Grael f933878991 [AArch64][SVE] Add patterns for logical immediate operations.
Summary:
Add pattern matching for the following SVE logical vector and immediate instructions:
- and/bic, orr/orn, eor/eon.

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71483
2019-12-16 16:18:34 -05:00
Kit Barton ff07fc66d9 [LoopFusion] Restrict loop fusion to rotated loops.
Summary:
This patch restricts loop fusion to only consider rotated loops as valid candidates.
This simplifies the analysis and transformation and aligns with other loop optimizations.

Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel

Reviewed By: Meinersbur

Subscribers: ormris, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71025
2019-12-16 15:17:29 -05:00
Craig Topper 02f644c59a [InstCombine] Teach removeBitcastsFromLoadStoreOnMinMax not to change the size of a store.
We can change the type as long as we don't change the size.

Fixes PR44306

Differential Revision: https://reviews.llvm.org/D71532
2019-12-16 12:12:54 -08:00
Thomas Lively 3a93756dfb [WebAssembly] Replace SIMD int min/max builtins with patterns
Summary:
The instructions were originally implemented via builtins and
intrinsics so users would have to explicitly opt-in to using
them. This was useful while were validating whether these instructions
should have been merged into the spec proposal. Now that they have
been, we can use normal codegen patterns, so the intrinsics and
builtins are no longer useful.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71500
2019-12-16 11:48:49 -08:00
Jonas Paulsson 49f55dda01 [SystemZ] Improve verification of MachineOperands.
Now that the machine verifier will check for cases of register/immediate
MachineOperands and their correspondence to the MC instruction descriptor,
this patch adds the operand types to the descriptors where they were
previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled
to get a known type, except for G_... (global isel) instructions.

Review: Ulrich Weigand
https://reviews.llvm.org/D71494
2019-12-16 09:51:54 -08:00
Teresa Johnson 878ab6df03 [TLI] Support for per-Function TLI that overrides available libfuncs
Summary:

Follow-on to D66428 and D71193, to build the TLI per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

With D71193, the -fno-builtin* flags are converted to function
attributes, so we can now set this information per-function on the TLI.

In this patch, the TLI constructor is changed to take a Function, which
can be used to override the available builtins. The TLI is augmented
with an array that can be used to specify which builtins are not
available for the corresponding function. The available function checks
are changed to consult this override before checking the underlying
module level baseline TLII. New code is added to set this override
array based on the attributes.

I also removed the code that sets availability in the TLII in clang from
the options, which is no longer needed.

I removed a per-Triple caching of TLII objects in the analysis object,
as it is based on the Module's Triple which is the same for all
functions in any case. Is there a case where we would be compiling
multiple Modules with different Triples in one compilation?

Finally, I have changed the legacy analysis wrapper to create and use
the new PM analysis class (TargetLibraryAnalysis) in getTLI. This is
consistent with the behavior of getTTI for the legacy
TargetTransformInfo analysis. This change means that getTLI now creates
a new TLI on each call (although that should be very cheap as we cache
the module level TLII, and computing the per-function
attribute based availability should also be reasonably efficient).
I measured the compile time for a large C++ file with tens of thousands
of functions and as expected there was no increase.

Reviewers: chandlerc, hfinkel, gchatelet

Subscribers: mehdi_amini, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67923
2019-12-16 09:19:30 -08:00
Miloš Stojanović d7efa6b198 [mips] Add an assert in getTargetStreamer()
Check if the TargetStreamer can be accessed.

Differential Revision: https://reviews.llvm.org/D71477
2019-12-16 17:04:55 +01:00
Guillaume Chatelet 4658da10e4 Revert "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove"
This reverts commit 181ab91efc.
2019-12-16 15:19:49 +01:00
David Tellenbach df0cc105fa Reland [AArch64][MachineOutliner] Return address signing for outlined functions
Summary:
Reland after fixing a bug that allowed outlining of SP modifying instructions
that invalidated return address signing.

During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.

This patch introduces the following changes:

1. If not all functions that potentially participate in function outlining agree
   on their return address signing scope and their return address signing key,
   outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
   on their support for v8.3A features, outlining is disabled for these
   functions.
3. If an outlining candidate would outline instructions that modify sp in a way
   that invalidates return address signing, outlining is disabled for that
   particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
   support for v8.3 features, the outlined function behaves as if it had the
   same scope and key attributes and as if it would provide the same v8.3A
   support as the original functions.

Reviewers: ostannard, paquette

Reviewed By: ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70635
2019-12-16 14:40:45 +01:00
Guillaume Chatelet 181ab91efc [Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove
Summary:
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71473
2019-12-16 13:35:55 +01:00
Andrzej Warzynski c41d2b5ab2 [AArch64][SVE2] Add intrinsics for binary narrowing operations
Summary:
The following intrinsics for binary narrowing add and sub operations are
added:
  * @llvm.aarch64.sve.addhnb
  * @llvm.aarch64.sve.addhnt
  * @llvm.aarch64.sve.raddhnb
  * @llvm.aarch64.sve.raddhnt
  * @llvm.aarch64.sve.subhnb
  * @llvm.aarch64.sve.subhnt
  * @llvm.aarch64.sve.rsubhnb
  * @llvm.aarch64.sve.rsubhnt

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71424
2019-12-16 12:22:56 +00:00
Kristof Beyls 7f4f07ddf3 [AArch64] Enable emission of stack maps for non-Mach-O binaries on AArch64.
The emission of stack maps in AArch64 binaries has been disabled for all
binary formats except Mach-O since rL206610, probably mistakenly, as far
as I can tell. This patch reverts this to its intended state.

Differential Revision: https://reviews.llvm.org/D70069

Patch by Loic Ottet.
2019-12-16 12:02:47 +00:00
Andrzej Warzynski 7e20c3a71d [Aarch64][SVE] Add intrinsics for scatter stores
Summary:
This patch adds the following SVE intrinsics for scatter stores:
* 64-bit offsets:
  * @llvm.aarch64.sve.st1.scatter (unscaled)
  * @llvm.aarch64.sve.st1.scatter.index (scaled)
* 32-bit unscaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended-offset)
* 32-bit scaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset)
* vector base + immediate:
  * @llvm.aarch64.sve.st1.scatter.imm

Reviewers: rengolin, efriedma, sdesmalen

Reviewed By: efriedma, sdesmalen

Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71074
2019-12-16 11:52:53 +00:00
Jay Foad f8495017f0 Fix whitespace. 2019-12-16 10:42:34 +00:00
Bjorn Pettersson e5f07080b8 [BasicBlockUtils] Fix dbg.value elimination problem in MergeBlockIntoPredecessor
Summary:
In commit d60f34c20a (llvm-svn 317128,
PR35113) MergeBlockIntoPredecessor was changed into
discarding some dbg.value intrinsics referring to
PHI values, post-splice due to loop rotation.

That elimination of dbg.value intrinsics did not
consider which dbg.value to keep depending on the
context (e.g. if the variable is changing its value
several times inside the basic block).

In the past that hasn't been such a big problem since
CodeGenPrepare::placeDbgValues has moved the dbg.value
to be next to the PHI node anyway. But after commit
00e238896c CodeGenPrepare isn't doing that
any longer, so we need to be more careful when avoiding
duplicate dbg.value intrinsics in MergeBlockIntoPredecessor.

This patch replaces the code that tried to avoid duplicate
dbg.values by using the RemoveRedundantDbgInstrs helper.

Reviewers: aprantl, jmorse, vsk

Reviewed By: aprantl, vsk

Subscribers: jholewinski, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71480
2019-12-16 11:41:21 +01:00
Bjorn Pettersson 1c49553c19 [BasicBlockUtils] Add utility to remove redundant dbg.value instrs
Summary:
Add a RemoveRedundantDbgInstrs to BasicBlockUtils with the
goal to remove redundant dbg intrinsics from a basic block.

This can be useful after various transforms, as it might
be simpler to do a filtering of dbg intrinsics after the
transform than during the transform.
One primary use case would be to replace a too aggressive
removal done by MergeBlockIntoPredecessor, seen at loop
rotate (not done in this patch).

The elimination algorithm currently focuses on dbg.value
intrinsics and is doing two iterations over the BB.

First we iterate backward starting at the last instruction
in the BB. Whenever a consecutive sequence of dbg.value
instructions are found we keep the last dbg.value for
each variable found (variable fragments are identified
using the  {DILocalVariable, FragmentInfo, inlinedAt}
triple as given by the DebugVariable helper class).

Next we iterate forward starting at the first instruction
in the BB. Whenever we find a dbg.value describing a
DebugVariable (identified by {DILocalVariable, inlinedAt})
we save the {DIValue, DIExpression} that describes that
variables value. But if the variable already was mapped
to the same {DIValue, DIExpression} pair we instead drop
the second dbg.value.

To ease the process of making lit tests for this utility a
new pass is introduced called RedundantDbgInstElimination.
It can be executed by opt using -redundant-dbg-inst-elim.

Reviewers: aprantl, jmorse, vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71478
2019-12-16 11:41:21 +01:00
Jay Foad 4f17b1784e Fix for AMDGPU MUL_I24 known bits calculation
Summary:
At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".

In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.

Reviewers: foad, arsenm

Reviewed By: arsenm

Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Patch by Eugene Kuznetsov.

Differential Revision: https://reviews.llvm.org/D70367
2019-12-16 10:25:57 +00:00
Valentin Churavy 5c29e8c65f [CodegenPrepare] Guard against degenerate branches
Summary:
Guard against a potential crash observed in https://github.com/JuliaLang/julia/issues/32994#issuecomment-524249628
If two branches are collapsed we can encounter a degenerate conditional branch `TBB==FBB`.
The subsequent code assumes that they differ, so we exit out early.

Reviewers: ributzka, spatel

Subscribers: loladiro, dexonsmith, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66657
2019-12-16 04:23:32 -05:00
Sjoerd Meijer 049f9672d8 [ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC.
In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have
quite a few helper functions all looking at the opcodes of MVE instructions.
This moves all these utility functions to ARMBaseInstrInfo.

Diferential Revision: https://reviews.llvm.org/D71426
2019-12-16 09:13:59 +00:00
Lang Hames 67a1b7f053 [Orc][LLJIT] Automatically use JITLink for LLJIT on supported platforms.
JITLink (which underlies ObjectLinkingLayer) is a replacement for RuntimeDyld.
It supports the native code model, and linker plugins that enable a wider range
of features than RuntimeDyld.

Currently only enabled for MachO/x86-64 and MachO/arm64. New architectures will
be added as JITLink support for them is developed.
2019-12-15 21:57:10 -08:00
Fangrui Song d25db94fa7 [MC] Delete STT_SECTION special cases from MCSymbolELF::setType and setBinding
The special cases added by rL293936 were no longer needed after rL296180
disallowed redefinition of section symbols.
2019-12-15 20:39:25 -08:00
Jim Lin 7e0fd77645 [PowerPC] Fix %llvm.ppc.altivec.vc* lowering
Summary:
r372285 changed LLVM to use a `TargetConstant` for parameters of intrinsics that are required to be immediates.

Since that commit, use of `%llvm.ppc.altivec.vc{fsx,fux,tsxs,tuxs}` intrinsics has not worked, and resulted in a `LLVM ERROR: Cannot select: intrinsic %llvm.ppc.altivec.vc*` error. The intrinsics' TableGen definitions matched on `imm` instead of `timm`.

This commit updates those definitions to use `timm`.

Fixes: https://llvm.org/PR44239

Reviewers: hfinkel, nemanjai, #powerpc, Jim

Reviewed By: Jim

Subscribers: qiucf, wuzish, Jim, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Patched by vddvss (Colin Samples).

Differential Revision: https://reviews.llvm.org/D71138
2019-12-16 10:21:55 +08:00
Lang Hames c0143f37da [ORC] Make ObjectLinkingLayer own its jitlink::MemoryManager.
This relieves ObjectLinkingLayer clients of the responsibility of holding the
memory manager. This makes it easier to select between RTDyldObjectLinkingLayer
(which already owned its memory manager factory) and ObjectLinkingLayer at
runtime as clients aren't required to hold a jitlink::MemoryManager field just
in case ObjectLinkingLayer is selected.
2019-12-15 17:35:52 -08:00
Fangrui Song 1ea5ce6335 [MC] Assume CommentStream is non-null in MCDisassembler::tryAdding*
AArch64/ARM/X86 call the two functions. CommentStream is always
initialized.
2019-12-15 16:21:06 -08:00
Fangrui Song 2b0256e49b [MC] Ignore VK_WEAKREF in MCValue::getAccessVariant
MCSymbolRefExpr::getVariantKindForName does not return VK_WEAKREF, so this code path is not exercised. Moreoever, .weakref is probably a feature that nobody uses.
2019-12-15 16:05:46 -08:00
Fangrui Song fdb408f348 [MC] Delete unused MCAsmInfoELF::UsesNonexecutableStackSection after EM_WEBASSEMBLY was removed in D48744
This removes remnant of D15969 which hasn't been removed by D48744.
2019-12-15 15:43:30 -08:00
Sanjay Patel 6080387f13 [InstSimplify] fold splat of inserted constant to vector constant
shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>

This is another missing shuffle fold pattern uncovered by the
shuffle correctness fix from D70246.

The problem was visible in the post-commit thread example, but
we managed to overcome the limitation for that particular case
with D71220.

This is something like the inverse of the previous fix - there
we didn't demand the inserted scalar, and here we are only
demanding an inserted scalar.

Differential Revision: https://reviews.llvm.org/D71488
2019-12-15 09:32:03 -05:00
Sanjay Patel 2afe864118 [DAG] Add SimplifyDemandedBits support for BSWAP
This exposes a shortcoming for AArch64, and that is tracked by PR40881:
https://bugs.llvm.org/show_bug.cgi?id=40881

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D58017
2019-12-15 08:52:34 -05:00
Craig Topper 1dc0c8af5e [LegalizeTypes] Teach BitcastToInt_ATOMIC_SWAP to only create FP16_TO_FP when called from PromoteFloatResult.
There's also a call from SoftenFloatResult that should not be promoted.

The change test case would fail with the new RUN line prior to
this change.
2019-12-14 15:05:32 -08:00
Craig Topper 95ce8f9498 [LegalizeTypes] In PromoteFloatOp_SETCC, don't both querying for transforming the result type.
The result type is already legal, is doesnt' need to be
transformed.
2019-12-14 15:05:32 -08:00
Logan Chien 061a94e4e2 Revert "AArch64: Fix frame record chain"
Breaks aosp-O3-polly-before-vectorizer-unprofitable with the following
error message:

void llvm::emitFrameOffset(llvm::MachineBasicBlock &,
MachineBasicBlock::iterator, const llvm::DebugLoc &, unsigned int,
unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo *,
MachineInstr::MIFlag, bool, bool, bool *): Assertion `(DestReg !=
AArch64::SP || Bytes % 16 == 0) && "SP increment/decrement not 16-byte
aligned"' failed.

This reverts commit d4e10e6adb.
2019-12-14 13:58:40 -08:00
Logan Chien d4e10e6adb AArch64: Fix frame record chain
The commit r369122 may keep LR and FP register (aka. frame record) in
the middle of a frame, thus we must add the offsets to ensure the FP
register always points to innermost frame record on the stack.

According to AAPCS64[1], a conforming code shall construct a linked list
of stack frames that can be traversed with frame records.  This commit
is also essential to frame-pointer-based stack unwinder (e.g.  the stack
unwinder in linx-perf-tools.)

[1] https://github.com/ARM-software/software-standards/blob/master/abi/aapcs64/aapcs64.rst#the-frame-pointer

Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64/framelayout-frame-record.ll
Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64

Differential Revision: https://reviews.llvm.org/D70800
2019-12-14 10:23:20 -08:00
Puyan Lotfi 816985c120 [NFC][llvm][MIRVRegNamerUtils] Refactoring GetHashableMO into switch-statement.
This refactors the if-statements handling the hashing of various
MachineOperand types into a switch-statement. The purpose is to cover
all the basis for all MachineOperand types while being very deliberate
about which MachineOperand types we are not handling and why (better
added comments). This patch is a NFC redo of https://reviews.llvm.org/D71396.
Much of the changes present in D71396 will come in smaller follow-up patches
that will add support for hashing the MachineOperand types that aren't
covered piece-meal with tests for each new case.
2019-12-14 02:31:07 -05:00
Johannes Doerfert 139c9ef45a [Attributor] Annotate call sites of declarations with a callback
Even if a declaration is called, if there is a callback we might need
the information during CG-SCC traversal (D70767).
2019-12-13 23:51:59 -06:00
Johannes Doerfert 3d347e2835 [Attributor][NFC] Simplify debug printing for abstract attributes
This also fixes a type in the debug printing of AANoAlias.
2019-12-13 23:51:59 -06:00
Johannes Doerfert 5d34602da4 [Attributor] Only replace instruction operands
This was part of D70767. When we replace the value of (call/invoke)
instructions we do not want to disturb the old call graph so we will
only replace instruction uses until we get rid of the old PM.

Accepted as part of D70767.
2019-12-13 22:16:38 -06:00
Fangrui Song a0aa58dad5 [AArch64] Save FP for leaf functions when disabling frame pointer elimination
The change allows clang -mno-omit-leaf-frame-pointer to disable frame
pointer elimination. This behavior matches X86 and Mips, and also GCC
AArch64.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71168
2019-12-13 18:48:58 -08:00
Sean Fertile 93faa237da [PowerPC] Add Support for indirect calls on AIX.
Extends the desciptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.

In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:

struct FunctionDescriptor {
  void *EntryPoint;
  void *TOCAnchor;
  void *EnvironmentPointer;
};

An indirect call has several steps of loading the the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:

1) Save the caller's TOC pointer into the TOC save slot in the linkage
   area, and then load the callee's TOC pointer into the TOC register
   (GPR 2 on AIX).

2) Load the function descriptor's entry point into the count register.

3) Load the environment pointer into the environment pointer register
   (GPR 11 on AIX).

4) Perform the call by branching on count register.

5) Restore the caller's TOC pointer after returning from the indirect call.

A couple important caveats to the above:

- There is no way to directly load a value from memory into the count register.
  Instead we populate the count register by loading the entry point address into
  a gpr and then moving the gpr to the count register.

- The TOC restore has to come immediately after the branch on count register
  instruction (i.e., the 1st instruction executed after we return from the
  call). This is an implementation limitation. We could, in theory, schedule
  the restore elsewhere as long as no uses of the TOC pointer fall in between
  the call and the restore; however, to keep it simple, we insert a pseudo
  instruction that represents both the indirect branch instruction and the
  load instruction that restores the caller's TOC from the linkage area. As
  they flow through the compiler as a single pseudo instruction, nothing can be
  inserted between them and the caller's TOC is then valid at any use.

Differtential Revision: https://reviews.llvm.org/D70724
2019-12-13 20:07:00 -05:00
Fangrui Song 40c288b75c [Mips] Fix gcc -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71028 2019-12-13 16:41:08 -08:00
Roman Tereshin 8731799fc6 [Legalizer] Making artifact combining order-independent
Legalization algorithm is complicated by two facts:
1) While regular instructions should be possible to legalize in
   an isolated, per-instruction, context-free manner, legalization
   artifacts can only be eliminated in pairs, which could be deeply, and
   ultimately arbitrary nested: { [ () ] }, where which paranthesis kind
   depicts an artifact kind, like extend, unmerge, etc. Such structure
   can only be fully eliminated by simple local combines if they are
   attempted in a particular order (inside out), or alternatively by
   repeated scans each eliminating only one innermost pair, resulting in
   O(n^2) complexity.
2) Some artifacts might in fact be regular instructions that could (and
   sometimes should) be legalized by the target-specific rules. Which
   means failure to eliminate all artifacts on the first iteration is
   not a failure, they need to be tried as instructions, which may
   produce more artifacts, including the ones that are in fact regular
   instructions, resulting in a non-constant number of iterations
   required to finish the process.

I trust the recently introduced termination condition (no new artifacts
were created during as-a-regular-instruction-retrial of artifacts not
eliminated on the previous iteration) to be efficient in providing
termination, but only performing the legalization in full if and only if
at each step such chains of artifacts are successfully eliminated in
full as well.

Which is currently not guaranteed, as the artifact combines are applied
only once and in an arbitrary order that has to do with the order of
creation or insertion of artifacts into their worklist, which is a no
particular order.

In this patch I make a small change to the artifact combiner, making it
to re-insert into the worklist immediate (modulo a look-through copies)
artifact users of each vreg that changes its definition due to an
artifact combine.

Here the first scan through the artifacts worklist, while not
being done in any guaranteed order, only needs to find the innermost
pair(s) of artifacts that could be immediately combined out. After that
the process follows def-use chains, making them shorter at each step, thus
combining everything that can be combined in O(n) time.

Reviewers: volkan, aditya_nandakumar, qcolombet, paquette, aemerson, dsanders

Reviewed By: aditya_nandakumar, paquette

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71448
2019-12-13 15:45:18 -08:00
Roman Tereshin 18bf9670aa [Legalizer] Refactoring out legalizeMachineFunction
and introducing new unittests/CodeGen/GlobalISel/LegalizerTest.cpp
relying on it to unit test the entire legalizer algorithm (including the
top-level main loop).

See also https://reviews.llvm.org/D71448
2019-12-13 15:45:18 -08:00
Roman Tereshin 8207c81597 [Legalizer] More detailed debugging printing in main loop 2019-12-13 15:45:18 -08:00
Alex Richardson 11448eeb72 [NFC] Use SelectionDAG::getMemBasePlusOffset() instead of getNode(ISD::ADD)
Summary:
To find potential opportunities to use getMemBasePlusOffset() I looked at
all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr
in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert
the files in the individual backends too.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel

Reviewed By: spatel

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71207
2019-12-13 21:40:03 +00:00
Alex Richardson fc83f53a86 [NFC] Implement SelectionDAG::getObjectPtrOffset() using getMemBasePlusOffset()
Summary:
This change is preparatory work to use this helper functions in more places.
In order to make this change, getMemBasePlusOffset() has been extended to
also take a SDNodeFlags parameter.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel

Reviewed By: spatel

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71206
2019-12-13 21:40:03 +00:00
Alex Richardson ea8888d1af [NFC] Add a SDValue overload for SelectionDAG::getMemBasePlusOffset()
Summary:
This change is preparatory work to use this helper functions in more places.
Currently the function only allows integer constants offsets, but there
are cases where we can use an existing SDValue parameter.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel, craig.topper

Reviewed By: spatel, craig.topper

Subscribers: craig.topper, merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71205
2019-12-13 21:40:03 +00:00
Alex Richardson d9bb70acd7 [NFC] Change SelectionDAG::getMemBasePlusOffset() to use int64_t
Summary:
This change is preparatory work to use this helper functions in more places.
Currently the function only allows positive offsets, but there are cases
where we want to subtract an offset from an existing pointer.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel

Reviewed By: spatel

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71204
2019-12-13 21:40:03 +00:00
Sam Elliott a0f43b0043 [RISCV] Move DebugLoc Copy into CompressInstEmitter
Summary:
This copy ensures that debug location information is kept for
compressed instructions. There are places where both compressInstruction and
uncompressInstruction are called that were not doing this copy, discarding some
debug info.

This change merely moves the copy into the generated file, so you cannot forget
to copy the location over when compressing or uncompressing.

Reviewers: asb, luismarques

Reviewed By: luismarques

Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67493
2019-12-13 20:01:04 +00:00
Francesco Petrogalli 19f73f0d1b Revert "[VectorUtils] Introduce the Vector Function Database (VFDatabase)."
This reverts commit 0be81968a2.

The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
2019-12-13 19:42:04 +00:00
Fangrui Song 193da743db [profile] Fix a crash when -fprofile-remapping-file= triggers an error
Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D71485
2019-12-13 11:38:20 -08:00
Sanjay Patel 2f0c7fd2db [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc (2nd try)
The initial attempt (rG89633320) botched the logic by reversing
the source/dest types. Added x86 tests for additional coverage.
The vector tests show a potential improvement (fold vector load
instead of broadcasting), but that's a known/existing problem.

This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-13 14:03:54 -05:00
Hiroshi Yamauchi ed50e6060b [PGO][PGSO] Enable size optimizations in code gen / target passes for cold code.
Summary: Split off of D67120.

Reviewers: davidxl

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71288
2019-12-13 11:01:19 -08:00
Momchil Velikov 8e8e3181aa [ARM] Fix in ICE when retrieving the number of micro-ops for vlldm/vlstm
The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for
`VLLDM` and `VLSTM`, which are currently defined with itineraries having a
dynamic count of micro-ops.

Assuming an optimistic case in which these instruction do not actually perform
loads or stores, and with the idea that Armv8-m cores are supposed to use the
new style scheduling models, this patch just sets the itinerary for those two
instructions to `NoItinerary`.

Differential Revision: https://reviews.llvm.org/D71266
2019-12-13 18:19:40 +00:00
Momchil Velikov d53e61863d [AArch64] Emit PAC/BTI .note.gnu.property flags
This patch make LLVM emit the processor specific program property types
defined in AArch64 ELF spec
https://developer.arm.com/docs/ihi0056/f/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q2-documentation

A file containing no functions gets both property flags.  Otherwise, a property
is set iff all the functions in the file have the corresponding attribute.

Patch by Daniel Kiss and Momchil Velikov.

Differential Revision: https://reviews.llvm.org/D71019
2019-12-13 17:38:20 +00:00
Fangrui Song f99eedeb72 [MC][PowerPC] Fix a crash when redefining a symbol after .set
Fix PR44284. This is probably not valid assembly but we should not crash.

Reviewed By: luporl, #powerpc, steven.zhang

Differential Revision: https://reviews.llvm.org/D71443
2019-12-13 09:31:54 -08:00
Fangrui Song f16377f11c [ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71062 2019-12-13 09:26:26 -08:00
Sam Parker 84593f058b [ARM][MVE] Make VPT invalid for tail predication
We've been marking VPT incompatible instructions as invalid for tail
predication too, though this may not strictly be true. VPT are
incompatible and, unless its the first predicate def in a loop,
they shouldn't be compatible for tail predication either.

Differential Revision: https://reviews.llvm.org/D71410
2019-12-13 15:01:08 +00:00
Nicola Zaghen 97572775d2 Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.

This fixes the buildbot failures.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-13 14:30:21 +00:00
Mikhail Maltsev 99581fd4c8 [ARM][MVE] Add vector reduction intrinsics with two vector operands
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH

Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.

Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71062
2019-12-13 13:17:29 +00:00
Simon Tatham 25305a9311 [ARM][MVE] Add intrinsics for more immediate shifts.
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.

`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.

I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.

(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71458
2019-12-13 13:07:39 +00:00
John Brawn 01ba201abc [ARM] Add custom strict fp conversion lowering when non-strict is custom
We have custom lowering for operations converting to/from floating-point types
when we don't have hardware support for those types, and this doesn't interact
well with the target-independent legalization of the strict versions of these
operations. Fix this by adding similar custom lowering of the strict versions.

This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics
test, with the remaining failures due to poor instruction selection.

Differential Revision: https://reviews.llvm.org/D71127
2019-12-13 13:00:00 +00:00
Tim Renouf fce1a6f584 Revert "AMDGPU: Try to commute sub of boolean ext"
This reverts commit 69fcfb7d35.

As shown in the test I attached to this commit, the change I reverted
causes a problem with "zext(cc1) - zext(cc2)". It commuted
the operands to the sub and used different logic to select the addc/subc
instruction:
   sub zext (setcc), x => addcarry 0, x, setcc
   sub sext (setcc), x => subcarry 0, x, setcc

... but that is bogus. I believe it is not possible to fold those commuted
patterns into any form of addcarry or subcarry. It may have worked as
intended before "AMDGPU: Change boolean content type to 0 or 1" because
the setcc was considered to be -1 rather than 1.

Differential Revision: https://reviews.llvm.org/D70978

Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43
2019-12-13 12:49:06 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sjoerd Meijer e91420e17d Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC."
This reverts commit 9468e3334b.

There's a test that doesn't like this change. The RDA analysis
gets invalided by changes in the block, which is not taken into
account. Revert while I work on a fix for this.
2019-12-13 11:56:44 +00:00
Mark Murray 228c74076d [ARM][MVE][Intrinsics] Add *_x() variants of my *_m() intrinsics.
Summary:
Better use of multiclass is used, and this helped find some existing
bugs in the predicated VMULL* intrinsics, which are now fixed.

The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.

To assist in testing the multiclassing , and to thwart further gremlins,
the unit tests are improved in scope.

The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.

Reviewers: dmgreen, miyuki, ostannard, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71421
2019-12-13 11:51:23 +00:00
Kerry McLaughlin 4194ca8e5a Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.

Original commit message:

Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
2019-12-13 10:08:20 +00:00
David Stenberg 5c7cc6f83d [LiveDebugValues] Omit entry values for DBG_VALUEs with pre-existing expressions
Summary:
This is a quickfix for PR44275. An assertion that checks that the
DIExpression is valid failed due to attempting to create an entry value
for an indirect parameter. This started appearing after D69028, as the
indirect parameter started being represented using an DW_OP_deref,
rather than with the DBG_VALUE's second operand, meaning that the
isIndirectDebugValue() check in LiveDebugValues did not exclude such
parameters. A DIExpression that has an entry value operation can
currently not have any other operation, leading to the failed isValid()
check.

This patch simply makes us stop considering emitting entry values
for such parameters. To support such cases I think we at least need
to do the following changes:

 * In DIExpression::isValid(): Remove the limitation that a
   DW_OP_LLVM_entry_value operation can be the only operation in a
   DIExpression.

 * In LiveDebugValues::emitEntryValues(): Create an entry value of size
   1, so that it only wraps the register operand, and not the whole
   pre-existing expression (the DW_OP_deref).

 * In LiveDebugValues::removeEntryValue(): Check that the new debug
   value has the same debug expression as the original, rather than
   checking that the debug expression is empty.

 * In DwarfExpression::addMachineRegExpression(): Modify the logic so
   that a DW_OP_reg* expression is emitted for the entry value.
   That is how GCC emits entry values for indirect parameters. That will
   currently not happen to due the DW_OP_deref causing the
   !HasComplexExpression to fail. The LocationKind needs to be changed
   also, rather than always emitting a DW_OP_stack_value for entry values.

There are probably more things I have missed, but that could hopefully
be a good starting point for emitting such entry values.

Reviewers: djtodoro, aprantl, jmorse, vsk

Reviewed By: aprantl, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D71416
2019-12-13 10:49:46 +01:00
Georgii Rymar 86e652f828 [yaml2obj] - Add a way to override sh_flags section field.
Currently we have the `Flags` property that allows to
set flags for a section. The problem is that it does not
allow us to set an arbitrary value, because of bit fields
validation under the hood. An arbitrary values can be used
to test specific broken cases.

We probably do not want to relax the validation, so this
patch adds a `ShSize` property that allows to
override the `sh_size`. It is inline with others `Sh*` properties
we have already.

Differential revision: https://reviews.llvm.org/D71411
2019-12-13 11:54:37 +03:00
Craig Topper 5c80a4f454 [LegalizeTypes] Remove unnecessary if before calling ReplaceValueWith on the chain in SoftenFloatRes_LOAD.
I believe this is a leftover from when fp128 was softened to fp128
on X86-64. In that case type legalization must have been able to
create a load that was the same as N which would make this
replacement fail or assert. Since we no longer do that, this
check should be unneeded.
2019-12-13 00:14:41 -08:00
Nikita Popov 21fbd5587c Reapply [LVI] Normalize pointer behavior
This is a rebase of the change over D70376, which fixes an LVI cache
invalidation issue that also affected this patch.

-----

Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).

Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2019-12-13 08:59:58 +01:00
Rui Ueyama 69da7e29de Revert an accidental commit af5ca40b47 2019-12-13 15:17:40 +09:00
Rui Ueyama af5ca40b47 temporary 2019-12-13 14:35:03 +09:00
Nate Voorhies bc16666de4 [NFC][AArch64] Fix typo.
Summary: Coaleascer should be coalescer.

Reviewers: qcolombet, Jim

Reviewed By: Jim

Subscribers: Jim, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70731
2019-12-13 10:25:36 +08:00
Eric Christopher a8154e5e0c Temporarily revert "NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List"
as it was causing bot and build failures.

This reverts commit 8e04896288.
2019-12-12 17:55:41 -08:00
David Blaikie 8e04896288 NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List
Move these data structures closer together so their emission code can
eventually share more of its implementation.
2019-12-12 16:53:59 -08:00
David Blaikie 20e06a28da NFC: DebugInfo: Refactor debug_loc/loclist emission into a common function
(except for v4 loclists, which are sufficiently different to not fit
well in this generic implementation)

In subsequent patches I intend to refactor the DebugLoc and ranges data
structures to be more similar so I can common more of the implementation
here.
2019-12-12 16:39:12 -08:00
Evgenii Stepanov dabd2622a8 hwasan: add tag_offset DWARF attribute to optimized debug info
Summary:
Support alloca-referencing dbg.value in hwasan instrumentation.
Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in
loclist format.

Reviewers: pcc

Subscribers: srhines, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70753
2019-12-12 16:18:54 -08:00
Danilo Carvalho Grael 6bed43f3c4 [AArch64][SVE] Add integer arithmetic with immediate instructions.
Summary:
Add pattern matching for the following instructions:
- add, sub, subr, sqadd, sqsub, uqadd, uqsub

This patch required complex patterns to match the immediate with optinal left shift.

I re-used the Select function from the other SVE repo to implement the complext pattern.

I plan on doing another patch to also match constant vector of the same immediate.

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71370
2019-12-12 17:28:46 -05:00
Johannes Doerfert 6abd01e462 [Attributor][FIX] Do treat byval arguments special
When we reason about the pointer argument that is byval we actually
reason about a local copy of the value passed at the call site. This was
not the case before and we wrongly introduced attributes based on the
surrounding function.

AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and
AANoCaptureCallSiteArgument are made aware of byval now. The code
to skip "subsuming positions" for reasoning follows a common pattern and
we should refactor it. A TODO was added.

Discovered by @efriedma as part of D69748.
2019-12-12 16:04:21 -06:00
Denis Bakhvalov 7081c92241 [NFC][InstSimplify] Refactoring ThreadCmpOverSelect function
Removed code duplication in ThreadCmpOverSelect and broke it
into several smaller functions for reusing them.

Differential Revision: https://reviews.llvm.org/D71158
2019-12-12 22:45:58 +01:00
Sanjay Patel 9432937190 Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc"
This reverts commit 8963332c33.
There was a logic bug typo in this code, but it wasn't visible in the asm for the tests.
2019-12-12 16:24:40 -05:00
Sanjay Patel 8963332c33 [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc
This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-12 15:44:13 -05:00
Teresa Johnson c8e0bb3b2c [LTO] Support for embedding bitcode section during LTO
Summary:
This adds support for embedding bitcode in a binary during LTO. The libLTO gains supports the `-lto-embed-bitcode` flag. The option allows users of the LTO library to embed a bitcode section. For example, LLD can pass the option via `ld.lld -mllvm=-lto-embed-bitcode`.

This feature allows doing something comparable to `clang -c -fembed-bitcode`, but on the (LTO) linker level. Having bitcode alongside native code has many use-cases. To give an example, the MacOS linker can create a `-bitcode_bundle` section containing bitcode. Also, having this feature built into LLVM is an alternative to 3rd party tools such as [[ https://github.com/travitch/whole-program-llvm | wllvm ]] or [[ https://github.com/SRI-CSL/gllvm | gllvm ]]. As with these tools, this feature simplifies creating "whole-program" llvm bitcode files, but in contrast to wllvm/gllvm it does not rely on a specific llvm frontend/driver.

Patch by Josef Eisl <josef.eisl@oracle.com>

Reviewers: #llvm, #clang, rsmith, pcc, alexshap, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mehdi_amini, inglorion, hiraditya, aheejin, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits, #llvm, #clang

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68213
2019-12-12 12:34:19 -08:00
Kit Barton 61368c8e98 Rename LoopInfo::isRotated() to LoopInfo::isRotatedForm().
This patch renames the LoopInfo::isRotated() method to LoopInfo::isRotatedForm()
to make it clear that the method checks whether the loop is in rotated form, not
whether the loop has been rotated by the LoopRotation pass.
2019-12-12 14:22:36 -05:00
Jonas Paulsson 61f5ba5c32 [SystemZ] Implement the packed stack layout
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.

Review: Ulrich Weigand
https://reviews.llvm.org/D70821
2019-12-12 10:26:03 -08:00
Sanjay Patel b39009bf1d [DAGCombiner] improve readability
This is not quite NFC because I changed the SDLoc to use the more
standard 'N' (the starting node for the fold).

This transform is a special-case of a more general fold that we
do in IR, but it seems like the general fold is needed here too
to avoid a potential regression seen in D58017.

https://rise4fun.com/Alive/3jZm
2019-12-12 13:16:50 -05:00
Florian Hahn bd12a322d7 [BasicAA] Use GEP as context for computeKnownBits in aliasGEP.
In order to use assumptions, computeKnownBits needs a context
instruction. We can use the GEP, if it is an instruction. We already
pass the assumption cache, but it cannot be used without a context
instruction.

Reviewers: anemet, asbirlea, hfinkel, spatel

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D71264
2019-12-12 17:18:04 +00:00
Michael Liao 11b2b2f4b1 [amdgpu] Fix `-Wenum-compare` warning. NFC. 2019-12-12 11:44:16 -05:00
Florian Hahn 526244b187 [Matrix] Add first set of matrix intrinsics and initial lowering pass.
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html

The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.

Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.

For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.

This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
 * Shape propagation to eliminate the embedding/splitting for each
   intrinsic.
 * Fused & tiled lowering of multiply and other operations.
 * Optimization remarks highlighting matrix expressions and costs.
 * Generate loops for operations on large matrixes.
 * More general block processing for operation on large vectors,
   exploiting shape information.

We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
  (1) become unwieldy for larger matrixes (even for 16x16 matrixes,
      the resulting shufflevector masks would be huge),
  (2) risk instcombine making small changes, causing us to fail to
      detect the transpose, preventing better lowerings

For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70456
2019-12-12 15:42:18 +00:00
Sjoerd Meijer 9468e3334b [ARM][MVE] findVCMPToFoldIntoVPS. NFC.
This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can
reimplement findVCMPToFoldIntoVPS with just a few calls to RDA.

Differential Revision: https://reviews.llvm.org/D71330
2019-12-12 15:41:20 +00:00
Guillaume Chatelet dbc5acf8ce [Alignment][NFC] Adding Align compatible methods to IntrinsicInst/IRBuilder
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71420
2019-12-12 16:22:15 +01:00
Tom Stellard bf13a71095 AMDGPU/SILoadStoreOptimizer: Simplify function
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71044
2019-12-12 06:53:03 -08:00
Sam Parker 1274ac3dc2 [ARM][MVE] Sink vector shift operand
Recommit e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00
James Henderson 84a9756a72 [llvm-dwarfdump] Add blank line after printing line table
This helps delineate it in the output from later tables or other output.

Reviewed by: JDevlieghere

Differential Revision: https://reviews.llvm.org/D71344
2019-12-12 14:06:10 +00:00
Hideto Ueno 4ecf25545c [Attributor][NFC] Fix comments and unnecessary comma 2019-12-12 13:42:40 +00:00
Hideto Ueno 827bade262 [Attributor] [NFC] Use `checkForAllUses` helpr in `AAHeapToStackImpl::updateImpl`
Summary: Remove `Worklist` iteration and make use `checkForAllUses`. There is no test chage.

Reviewers: sstefan1, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71352
2019-12-12 13:27:53 +00:00
Hideto Ueno 63599bd072 [Attributor][NFC] Refactoring `AANoFreeArgument::updateImpl`
Summary: Refactoring `AANoFreeArgument::updateImpl`. There is no test change.

Reviewers: sstefan1, jdoerfert

Reviewed By: sstefan1

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71349
2019-12-12 13:27:53 +00:00
stozer e39e2b4a79 [DebugInfo] Prevent invalid fragments at ISel from dropping debug info
During SelectionDAG, if a value which is associated with a DBG_VALUE
needs to be split across multiple registers, the DBG_VALUE will be split
into a set of fragment expressions to recreate the original value.

If one or more of these fragments cannot be created, they would
previously be silently dropped, causing the old debug value to live past
its expiry date. This patch fixes this issue by keeping invalid
fragments while setting their value as Undef.

Differential revision: https://reviews.llvm.org/D70248
2019-12-12 12:28:39 +00:00
Russell Gallop f70f180148 [Support] Try to fix bot failure after 8ddcd1dc26
http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/41755
2019-12-12 12:20:11 +00:00
Russell Gallop 8ddcd1dc26 [Support] Extend TimeProfiler to support multiple threads
This makes TimeTraceProfilerInstance thread local. Added
timeTraceProfilerFinishThread() which moves the thread local instance to
a global vector of instances. timeTraceProfilerWrite() then writes
recorded data from all instances.

Threads are identified based on their thread ids. Totals are reported
with artificial thread ids higher than the real ones.

Replaced raw pointer for TimeTraceProfilerInstance with unique_ptr.

Differential Revision: https://reviews.llvm.org/D71059
2019-12-12 12:01:44 +00:00
Mirko Brkusanin d7357c52a4 [Mips] Add support for min/max/umin/umax atomics
In order to properly implement these atomic we need one register more than other
binary atomics. It is used for storing result from comparing values in addition
to the one that is used for actual result of operation.

https://reviews.llvm.org/D71028
2019-12-12 11:32:37 +01:00
Nicola Zaghen f798eb21ec Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same."
This reverts commit 5f6208778f.

This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll
const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.
2019-12-12 10:29:54 +00:00
Nicola Zaghen 5f6208778f [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-12 10:07:01 +00:00
Cullen Rhodes bbd16b6876 [AArch64][SVE] Remove nxv1f32 and nxv1f64 as legal types
Summary: Also cleans up ZPR register class definition.

Reviewers: sdesmalen, cameron.mcinally, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71351
2019-12-12 09:49:22 +00:00
Puyan Lotfi 756db63af9 [NFC][llvm][MIRVRegNamerUtils] Moving methods around. Making some private.
Making all externally unused methods private in MIRVRegNamerUtils.h.
Moving or deleting a couple other methods around.
2019-12-12 03:32:53 -05:00
Alexey Lapshin 71aaebc824 [DWARF5][DWARFVerifier] Check that Skeleton compilation unit does not have children.
That patch adds checking into DWARFVerifier that the Skeleton
compilation unit does not have children.

Differential Revision: https://reviews.llvm.org/D71244
2019-12-12 10:59:10 +03:00
Sam Parker f8ff3bf55b Revert "[ARM][MVE] Sink vector shift operand"
This reverts commit e0b966643f.

Instruction selection is failing with expensive checks.
2019-12-12 07:52:57 +00:00
Sam Parker e0b966643f [ARM][MVE] Sink vector shift operand
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 07:35:21 +00:00
Wenlei He d275a06487 [AutoFDO] Statistic for context sensitive profile guided inlining
Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70584
2019-12-11 21:37:21 -08:00
Puyan Lotfi f5b7a46837 [llvm][MIRVRegNamerUtils] Adding hashing on memoperands.
No more hash collisions for memoperands. Now the MIRCanonicalization
pass shouldn't hit hash collisions when dealing with nearly identical
memory accessing instructions when their memoperands are in fact different.

Differential Revision: https://reviews.llvm.org/D71328
2019-12-11 22:11:49 -05:00
Cameron McInally 7aa5c16088 [AArch64][SVE] Add patterns for scalable vselect
This patch matches scalable vector selects to predicated move instructions.

Differential Revision: https://reviews.llvm.org/D71298
2019-12-11 20:15:44 -06:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Reid Kleckner 85ba5f637a Rename TTI::getIntImmCost for instructions and intrinsics
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.

Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.

Split off from D71320

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D71381
2019-12-11 18:00:20 -08:00
Vedant Kumar 56232f950d Revert "[DWARF] Allow cross-CU references of subprogram definitions"
This reverts commit 30038da15b. It causes
the stage2 thinLTO bot to fail with:

Assertion failed: (CU.getDIE(CalleeSP) && "Expected declaration subprogram DIE for callee")

rdar://57840415
2019-12-11 15:55:48 -08:00
Sanjay Patel cdf5cfea8e Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()"
This reverts commit d1f0bdf2d2.
The patch can cause infinite loops in DAGCombiner.
2019-12-11 16:56:58 -05:00
Craig Topper 4b452952fe [LegalizeTypes] In SoftenFloatRes_FP_EXTEND, move the check for input already being promoted above the check for fp16 converting to something other than fp32.
The fp16 to larger than fp32 inserts an extend that need to
re-legalized if fp16 is promoted. But if we check for fp16
promotion first, then we can avoid emiting the fp_extend all
together.
2019-12-11 12:48:08 -08:00
Johannes Doerfert d23c61490c [OpenMP] Introduce the OpenMP-IR-Builder
This is the initial patch for the OpenMP-IR-Builder, as discussed on the
mailing list ([1] and later) and at the US Dev Meeting'19.

The design is similar to D61953 but:
  - in a non-WIP status, with proper documentation and working.
  - using a OpenMPKinds.def file to manage lists of directives, runtime
    functions, types, ..., similar to the current Clang implementation.
  - restricted to handle only (simple) barriers, to implement most
    `#pragma omp barrier` directives and most implicit barriers.
  - properly hooked into Clang to be used if possible (D69922).
  - compatible with the remaining code generation.

Parts have been extracted into D69853.

The plan is to have multiple people working on moving logic from Clang
here once the initial scaffolding (=this patch) landed.

[1] http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim

Subscribers: mgorny, hiraditya, bollu, guansong, jfb, cfe-commits, llvm-commits, penzn, ppenzin

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69785
2019-12-11 14:38:49 -06:00
Sam Clegg 881d877846 [WebAssembly] Add new `export_name` clang attribute for controlling wasm export names
This is equivalent to the existing `import_name` and `import_module`
attributes which control the import names in the final wasm binary
produced by lld.

This maps the existing

This attribute currently requires a string rather than using the
symbol name for a couple of reasons:

1. Avoid confusion with static and dynamic linking which is
   based on symbol name.  Exporting a function from a wasm module using
   this directive is orthogonal to both static and dynamic linking.
2. Avoids name mangling.

Differential Revision: https://reviews.llvm.org/D70520
2019-12-11 11:54:57 -08:00
Nikita Popov 8db5143b1a [InstCombine] Optimize overflow check base on uadd.with.overflow result
Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.

This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.

We can run into this situation if you have both a uadd.with.overflow
and a manual add + overflow check in the same function (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.

The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.

This does not yet handle the negated version of this pattern.

Differential Revision: https://reviews.llvm.org/D58644
2019-12-11 20:52:04 +01:00
Danila Kutenin 19e83a9b4c [ValueTracking] Pointer is known nonnull after load/store
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG.
The patch allows to use more nonnull constraints. Also, it found
one more optimization in some PowerPC test. This is my first llvm
review, I am free to any comments.

Differential Revision: https://reviews.llvm.org/D71177
2019-12-11 20:32:29 +01:00
Nikita Popov b361d3bbcd [MergeFuncs] Remove incorrect attribute copying
Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was
originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1.
However, the attribute copying was done in the wrong place (in general
call replacement, not thunk generation) and a proper fix was
implemented in D12581.

Previously this code was just unnecessary but harmless (because
FunctionComparator ensured that the attributes of the two functions
are exactly the same), but since byval was changed to accept a type
this copying is actively wrong and may result in malformed IR.

Differential Revision: https://reviews.llvm.org/D71173
2019-12-11 20:09:54 +01:00
Fangrui Song 25e21a09b3 Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D65958 and D70450 2019-12-11 11:04:03 -08:00
Andrzej Warzynski a75463c471 Add intrinsics for unary narrowing operations
Summary:
The following intrinsics for unary narrowing operations are added:
 * @llvm.aarch64.sve.sqxtnb
 * @llvm.aarch64.sve.uqxtnb
 * @llvm.aarch64.sve.sqxtunb
 * @llvm.aarch64.sve.sqxtnt
 * @llvm.aarch64.sve.uqxtnt
 * @llvm.aarch64.sve.sqxtunt

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71270
2019-12-11 18:55:51 +00:00
Florian Hahn 2675a3c880 [AArch64] Be more careful to skip debug operands in LdSt Optimizier.
This fixes crashes with $noreg operands.
2019-12-11 18:47:45 +00:00
Sanjay Patel d1f0bdf2d2 [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.

We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().

This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-11 13:30:39 -05:00
Bardia Mahjour 916d37a2bc [DA] Improve dump to show source and sink of the dependence
Summary:
The current da printer shows the dependence without indicating
which instructions are being considered as the src vs dst. It
also silently ignores call instructions, despite the fact that
they create confused dependence edges to other memory
instructions. This patch addresses these two issues plus a
couple of minor non-functional improvements.

Authored By: bmahjour

Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc

Reviewed By: dmgreen, fhahn

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71088
2019-12-11 11:48:16 -05:00
Florian Hahn 4fe92abceb [AArch64] Skip debug ops with regsOverlap in AArch64 LD/ST opt.
This fixes a crash when debug instructions are in between 2 stores.
2019-12-11 16:26:31 +00:00
Craig Topper 3adc819b7a [X86] Erase dead LEA instruction after converting it to MOV in FixupLEAPass::processInstrForSlow3OpLEA. 2019-12-11 07:51:23 -08:00
Ulrich Weigand ac473394ff [SystemZ] Fix 128-bit strict FMA expansion pre-z14
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.

This worked correctly for regular FMA, but was implemented
incorrectly for the strict version.  This was not noticed
because we did not have test coverage for this case.

This patch fixes that incorrect expansion and adds the
missing test cases.
2019-12-11 16:32:08 +01:00
Kit Barton 942c9946cc [Loop] Add isRotated method to Loop class.
Summary:
This patch adds a method to determine if a loop is in rotated form (the latch is
an exiting block). It also modifies the getLoopGuardBranch method to use this
new method. This method can also be used in Loopfusion. Once this patch lands I
will make the corresponding changes there.

Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel

Reviewed By: Meinersbur

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65958
2019-12-11 09:43:10 -05:00
Russell Gallop df494f7512 [Support] Add TimeTraceScope constructor without detail arg
This simplifies code where no extra details are required
Also don't write out detail when it is empty.

Differential Revision: https://reviews.llvm.org/D71347
2019-12-11 14:32:21 +00:00
Matt Arsenault 49d731b5e0 Verifier: Check frame-pointer attribute values
There are a few places that check specific string attributes have
particular values, and assert if they are something else. The verifier
should catch these kinds of cases.
2019-12-11 19:53:49 +05:30
Kerry McLaughlin c0a3ab3655 Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
This reverts commit 3f5bf35f86 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
2019-12-11 13:58:39 +00:00
Florian Hahn 17554b8961 [AArch64] Teach Load/Store optimizier to rename store operands for pairing.
In some cases, we can rename a store operand, in order to enable pairing
of stores.  For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.

First, we check if we can rename the given register:

1. The first store register must be killed at the store, which means we
   do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
   stored register and check all uses in between are renamable. Along
   they way, we collect the minimal register classes of the uses for
   overlapping (sub/super)registers.

Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be

1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.

We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.

This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:

Metric: aarch64-ldst-opt.NumPairCreated

Program                                        base     patch    diff
 test-suite...nch/fourinarow/fourinarow.test     2.00    39.00   1850.0%
 test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test    46.00    80.00   73.9%
 test-suite...chmarks/Olden/power/power.test    70.00    96.00   37.1%
 test-suite...cations/hexxagon/hexxagon.test    29.00    39.00   34.5%
 test-suite...nchmarks/McCat/05-eks/eks.test   100.00   132.00   32.0%
 test-suite.../Trimaran/enc-rc4/enc-rc4.test    46.00    59.00   28.3%
 test-suite...T2006/473.astar/473.astar.test   160.00   200.00   25.0%
 test-suite.../Trimaran/enc-md5/enc-md5.test     8.00    10.00   25.0%
 test-suite...telecomm-gsm/telecomm-gsm.test   113.00   139.00   23.0%
 test-suite...ediabench/gsm/toast/toast.test   113.00   139.00   23.0%
 test-suite...Source/Benchmarks/sim/sim.test    91.00   111.00   22.0%
 test-suite...C/CFP2000/179.art/179.art.test    41.00    49.00   19.5%
 test-suite...peg2/mpeg2dec/mpeg2decode.test   245.00   279.00   13.9%
 test-suite...marks/Olden/health/health.test    16.00    18.00   12.5%
 test-suite...ks/Prolangs-C/cdecl/cdecl.test    90.00   101.00   12.2%
 test-suite...fice-ispell/office-ispell.test    91.00   100.00    9.9%
 test-suite...oxyApps-C/miniGMG/miniGMG.test   430.00   465.00    8.1%
 test-suite...lowfish/security-blowfish.test    39.00    42.00    7.7%
 test-suite.../Applications/spiff/spiff.test    42.00    45.00    7.1%
 test-suite...arks/mafft/pairlocalalign.test   2473.00  2646.00   7.0%
 test-suite.../VersaBench/ecbdes/ecbdes.test    29.00    31.00    6.9%
 test-suite...nch/beamformer/beamformer.test   220.00   235.00    6.8%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00  2252.00   6.7%
 test-suite...ve-susan/automotive-susan.test   109.00   116.00    6.4%
 test-suite...s-C/unix-smail/unix-smail.test    65.00    69.00    6.2%
 test-suite...CI_Purple/SMG2000/smg2000.test   1194.00  1265.00   5.9%
 test-suite.../Benchmarks/nbench/nbench.test   472.00   500.00    5.9%
 test-suite...oxyApps-C/miniAMR/miniAMR.test   248.00   262.00    5.6%
 test-suite...quoia/CrystalMk/CrystalMk.test    18.00    19.00    5.6%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test   7331.00  7710.00   5.2%
 test-suite.../Benchmarks/Bullet/bullet.test   5651.00  5938.00   5.1%
 test-suite...ternal/HMMER/hmmcalibrate.test   750.00   788.00    5.1%
 test-suite...T2006/456.hmmer/456.hmmer.test   764.00   802.00    5.0%
 test-suite...ications/JM/ldecod/ldecod.test   1028.00  1079.00   5.0%
 test-suite...CFP2006/444.namd/444.namd.test   1368.00  1434.00   4.8%
 test-suite...marks/7zip/7zip-benchmark.test   4471.00  4685.00   4.8%
 test-suite...6/464.h264ref/464.h264ref.test   3122.00  3271.00   4.8%
 test-suite...pplications/oggenc/oggenc.test   1497.00  1565.00   4.5%
 test-suite...T2000/300.twolf/300.twolf.test   742.00   774.00    4.3%
 test-suite.../Prolangs-C/loader/loader.test    24.00    25.00    4.2%
 test-suite...0.perlbench/400.perlbench.test   1983.00  2058.00   3.8%
 test-suite...ications/JM/lencod/lencod.test   4612.00  4785.00   3.8%
 test-suite...yApps-C++/PENNANT/PENNANT.test   995.00   1032.00   3.7%
 test-suite...arks/VersaBench/dbms/dbms.test    54.00    56.00    3.7%

Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D70450
2019-12-11 13:50:11 +00:00
Guillaume Chatelet 0a0d54b357 [Alignment][NFC] Introduce Align in IRBuilder
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71343
2019-12-11 14:41:23 +01:00
James Henderson 2f8155023a [DebugInfo] Fix printing of DW_LNS_set_isa
The Isa register is a uint8_t, but at least on Windows this is
internally an unsigned char, which meant that prior to this patch it got
formatted as an ASCII character, rather than a decimal number. This
patch fixes this by casting it to a uint64_t before printing. I did it
this way instead of using a uint8_t formatter because a) it is simpler,
and b) it allows us to change the internal type of Isa in the future
without this code breaking.

I also took the opportunity to test the printing of the other standard
opcodes.

Reviewed by: probinson

Differential Revision: https://reviews.llvm.org/D71274
2019-12-11 13:38:41 +00:00
Guillaume Chatelet 3491109587 Rollback assumeAligned in MemorySanitizer
Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing).

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71332
2019-12-11 14:25:21 +01:00
Andrzej Warzynski 65651f197a [AArch64][SVE] Add DAG combine rules for gather loads and sext/zext
Summary:
These changes allow us to support sign-extending gather loads with the
exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*).

Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential revision: https://reviews.llvm.org/D70812
2019-12-11 12:56:18 +00:00
Oliver Stannard 6ae3d310bd Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions"
This reverts commit cec2d5c174.

Reverting because this is still creating outlined functions with return
address signing instructions with mismatches SP values. For example:

  int *volatile v;

  void foo(int x) {
    int a[x];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

  void bar(int x) {
    int a[x];
    v = 0;
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

This generates these two outlined functions, both of which modify SP
between the paciasp and retaa instructions:

  $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf
  ...
  OUTLINED_FUNCTION_0:                    // @OUTLINED_FUNCTION_0
          .cfi_sections .debug_frame
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          mov     w8, w0
          lsl     x8, x8, #2
          add     x8, x8, #15             // =15
          mov     x9, sp
          and     x8, x8, #0x7fffffff0
          sub     x8, x9, x8
          mov     x29, sp
          mov     sp, x8
          adrp    x9, v
          retaa
  ...
  OUTLINED_FUNCTION_1:                    // @OUTLINED_FUNCTION_1
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          mov     sp, x29
          retaa
2019-12-11 12:06:20 +00:00
Simon Tatham 1fed9a0c0c [TableGen] Add bang-operators !getop and !setop.
Summary:
These allow you to get and set the operator of a dag node, without
affecting its list of arguments.

`!getop` is slightly fiddly because in many contexts you need its
return value to have a static type more specific than 'any record'. It
works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so
I made `!getop` take an optional type suffix itself, so that can be
written as the shorter `!getop<BaseClass>(...)`.

Reviewers: hfinkel, nhaehnle

Reviewed By: nhaehnle

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71191
2019-12-11 12:05:22 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
Simon Tatham bd0f271c9e [ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen, MarkMurrayARM

Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-11 10:10:09 +00:00
Sam Parker ee7579409b [ARM][TypePromotion] Enable by default
Enable the TypePromotion pass my default (again).

This patch was originally committed in 393dacacf7.
This patch was reverted in a38396939c.

Differential Revision: https://reviews.llvm.org/D70998
2019-12-11 10:00:16 +00:00
QingShan Zhang eba7cbd3d0 [NFC][PowerPC] Remove the dead conditions in the if(cond) 2019-12-11 09:57:06 +00:00
shkzhang 1408e7e175 [PowerPC] [CodeGen] Use MachineBranchProbabilityInfo in EarlyIfPredicator to avoid the potential bug
Summary:
In the function `EarlyIfPredicator::shouldConvertIf()`, we call
`TII->isProfitableToIfCvt()` with `BranchProbability::getUnknown()`, it may
cause the potential assertion error for those hook which use `BranchProbability`
in `isProfitableToIfCvt()`, for example `SystemZ`.
`SystemZ` use `Probability < BranchProbability(1, 8))` in the function
`SystemZInstrInfo::isProfitableToIfCvt()`, if we call this function with
`BranchProbability::getUnknown()`, it will cause assertion error.

This patch is to fix the potential bug.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D71273
2019-12-11 04:46:00 -05:00
Florian Hahn 11f311875f [LiveRegUnits] Add phys_regs_and_masks iterator range (NFC).
This iterator range just includes physical registers and register masks,
which are interesting when dealing with register liveness.

Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D70562
2019-12-11 09:34:42 +00:00
Guillaume Chatelet 8a7c52bc22 [Alignment][NFC] Introduce Align in SROA
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71277
2019-12-11 09:34:38 +01:00
QingShan Zhang f99297176c [PowerPC] Exploitate the Vector Integer Average Instructions
PowerPC has instruction to do the semantics of this piece of code:

vector int foo(vector int m, vector int n) {
  return (m + n + 1) >> 1;
}
This patch is adding the match rule to select it.

Differential Revision: https://reviews.llvm.org/D71002
2019-12-11 07:25:57 +00:00
Craig Topper d4345636e6 [LegalizeTypes] Remove manual worklist management from SoftenFloatRes_FP_EXTEND.
I think this is no longer needed. The system should take care
of legalizing any new nodes that are added. I think this might
have been needed prior to r371709 or r307053.
2019-12-10 22:33:31 -08:00
Nico Weber caa4120906 Revert "[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission."
This reverts commit 307f60a1a3.

DebugInfo/X86/debug-macinfo-split-dwarf.ll fails on Windows:

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "c:\src\llvm-project\out\gn\bin\llc.exe" "-mtriple=x86_64-pc-windows-gnu" "-O0" "-split-dwarf-file=foo.dwo" "-filetype=obj"
Assertion failed: Section && "Cannot switch to a null section!", file ../../llvm/lib/MC/MCStreamer.cpp, line 1103
Stack dump:
0.	Program arguments: c:\src\llvm-project\out\gn\bin\llc.exe -mtriple=x86_64-pc-windows-gnu -O0 -split-dwarf-file=foo.dwo -filetype=obj
2019-12-10 21:32:30 -05:00
Craig Topper 935d41e4bd [X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256
This is an improvement to 88dacbd436
2019-12-10 17:29:02 -08:00