Commit Graph

Matt Arsenault 71131db689 AMDGPU: Improve <2 x i24> arguments and return value handling
This was asserting for GlobalISel. For SelectionDAG, this was being
passed on the stack. Instead, scalarize it as if it were a 32-bit
vector.
2020-09-16 11:21:56 -04:00
Sebastian Neubauer 833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Simon Pilgrim cd46151202 [X86] Assert that we've found a terminator instruction. NFCI.
Fixes a clang static analyzer null dereference warning.
2020-09-16 16:17:49 +01:00
Jay Foad 90777e2924 [AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
scheduler treat it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.

Differential Revision: https://reviews.llvm.org/D87446
2020-09-16 16:10:47 +01:00
Simon Pilgrim aa4b0b755a [X86][SSE] Move VZEXT_MOVL(INSERT_SUBVECTOR(UNDEF,X,0)) handling into combineTargetShuffle.
Now that we're getting better at combining shuffles of different vector widths, this can now be performed as part of the standard target shuffle combines and isn't required for cleanup.

Exposed a minor issue in combineX86ShufflesRecursively where we failed to check if a shuffle's src ops were simple types.
2020-09-16 16:08:31 +01:00
Dangeti Tharun kumar 01e2b394ee [Partial Inliner] Compute intrinsic cost through TTI
https://bugs.llvm.org/show_bug.cgi?id=45932

assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") getting triggered in computeBBInlineCost.

Intrinsics like "assume" are considered regular function calls while computing costs.
This patch enables computeBBInlineCost to query TTI for the intrinsic call cost.
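As a source-level illustration (an assumed example, not taken from the patch), a region whose only extra content is an assume should not look expensive to the cost model, because llvm.assume generates no code:

  int callee(int x) {
    __builtin_assume(x > 0); // lowers to llvm.assume, which is free at codegen
    return x + 1;
  }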

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87132
2020-09-16 15:12:31 +01:00
Sanjay Patel 24238f09ed [SLP] fix formatting; NFC
Also move variable declarations closer to usage and add code comments.
2020-09-16 08:50:27 -04:00
Sam Parker 3ce9ec0cfa [ARM] Reorder some logic
Re-order some checks in ValidateMVEInst.
2020-09-16 13:39:22 +01:00
Sanjay Patel 6a23668e78 [SLP] remove uses of 'auto' that obscure functionality; NFC 2020-09-16 08:26:21 -04:00
Sanjay Patel 0cee1bf5d1 [SLP] remove redundant size check; NFC
We bail out on small array size anyway.
2020-09-16 08:11:19 -04:00
Sanjay Patel bbad998bab [SLP] move loop index variable declaration to its use; NFC 2020-09-16 07:59:31 -04:00
Sanjay Patel 158989184e [SLP] change poorly named variable; NFC
'V' shadows a function argument.
2020-09-16 07:59:31 -04:00
Sam Parker 1c421046d7 [RDA] Fix getUniqueReachingDef for self loops
We've fixed the case where this could return an instruction after the
given instruction, but this also means we could falsely return a
'unique' def when there could be one coming from the backedge of a
loop.

Differential Revision: https://reviews.llvm.org/D87751
2020-09-16 12:44:23 +01:00
Sam Parker a63b2a4614 [ARM] Fix tail predication predicate tracking
Clear the CurrentPredicate when we find an instruction which would
completely overwrite the VPR. This fix essentially means we're back
to not really being able to handle VPT instructions when tail
predicating.

Differential Revision: https://reviews.llvm.org/D87610
2020-09-16 11:59:29 +01:00
Sam Parker 86172ce378 [ARM] Add more validForTailPredication
Modify the unit test to inspect all MVE instructions and mark the
load/store/move of vpr/p0 as valid, as well as the remaining scalar
shifts.

Differential Revision: https://reviews.llvm.org/D87753
2020-09-16 11:51:50 +01:00
Simon Pilgrim 3f682611ab [DAG] Remove getOperand() call. NFCI. 2020-09-16 11:18:58 +01:00
Alok Kumar Sharma 159abe09d2 [DebugInfo][flang] DISubrange support for fortran assumed size array
This is needed to support Fortran assumed-size arrays, which can have a
missing upperBound/count, contrary to the current DISubrange support.
Example:
subroutine sub (array1, array2)
  integer :: array1 (*)
  integer :: array2 (4:9, 10:*)

  array1(7:8) = 9
  array2(5, 10) = 10
end subroutine
Now the validation check is relaxed for Fortran.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D87500
2020-09-16 14:15:53 +05:30
Sam Tebbs ef0b9f3307 [ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT
This patch combines a VCMP followed by a VPST into a VPT, which has the
same semantics as the combination of the former two.
2020-09-16 09:27:10 +01:00
Yvan Roux 070b96962f [ARM][MachineOutliner] Add calls handling.
Handles calls inside outlined regions by saving and restoring the link
register.

Differential Revision: https://reviews.llvm.org/D87136
2020-09-16 09:54:26 +02:00
Craig Topper 41f4cd60d5 [X86] Don't scalarize gather/scatters with non-power of 2 element counts. Widen instead.
We can pad the mask with zeros in order to widen. We already do
this for power-of-2 types that are smaller than a legal type.
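A scalar model of the idea (an illustrative sketch only; names and types are assumptions):

  #include <array>
  // Widen a 3-element masked gather to 4 elements by padding the mask with a
  // zero; the padded lane is never active, so no extra memory is touched.
  std::array<int, 4> gather3_as_4(const int *base, std::array<int, 3> idx,
                                  std::array<bool, 3> mask) {
    std::array<bool, 4> wideMask{mask[0], mask[1], mask[2], false};
    std::array<int, 4> wideIdx{idx[0], idx[1], idx[2], 0};
    std::array<int, 4> result{}; // zero passthru
    for (int lane = 0; lane < 4; ++lane)
      if (wideMask[lane])
        result[lane] = base[wideIdx[lane]];
    return result;
  }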
2020-09-15 23:22:53 -07:00
Craig Topper 2ce1a697f0 [X86] Always use 16-bit displacement in 16-bit mode when there is no base or index register.
Previously we only did this if the immediate fit in 16 bits, but
the GNU assembler seems to just truncate.

Fixes PR46952
2020-09-15 19:31:48 -07:00
Krzysztof Parzyszek 5f4abb7fab [Hexagon] Replace incorrect pattern for vpackl HWI32 -> HVi8
V6_vdealb4w is not correct for pairs, use V6_vpackeh/V6_vpackeb instead.
2020-09-15 20:34:50 -05:00
Arthur Eubanks ba12e77ec1 [NewPM] Port strip* passes to NPM
strip-nondebug and strip-debug-declare have no existing associated tests

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87639
2020-09-15 18:25:12 -07:00
Arthur Eubanks f7aa1563eb [LowerSwitch][NewPM] Port lowerswitch to NPM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87726
2020-09-15 18:18:31 -07:00
Wenlei He 2c391a5a14 [LICM] Make Loop ICM profile aware again
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.

Internally at Facebook this change results in a ~7% win in a CPU related metric in one of our big services by preventing hoisting cold code into a hot pre-header like the added test case demonstrates.
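A source-level sketch of the situation (an assumed example, not the added test case):

  // The division is loop-invariant, but the branch guarding it is cold.
  // Hoisting it into the hot pre-header would execute it on every entry to
  // the loop, even though the cold branch almost never takes it.
  void update(long *out, long a, long b, long n, bool rare) {
    for (long i = 0; i < n; ++i) {   // hot loop
      if (rare)                      // profile: ~never taken
        out[i] = a / b;              // keep this in the cold block
      else
        out[i] = a + b;
    }
  }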

Testing:
ninja check

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87551
2020-09-15 17:21:58 -07:00
Jessica Paquette ffe9986de4 [AArch64][GlobalISel] Refactor + improve CMN, ADDS, and ADD emit functions
These functions were extremely similar:

- `emitADD`
- `emitADDS`
- `emitCMN`

Refactor them a little, introducing a more generic `emitInstr` function to
do most of the work.

Also add support for the immediate + shifted register addressing modes in each
of them.

Update select-uaddo.mir to show that selecting ADDS now supports folding
immediates + shifts. (I don't think this can impact CMN, because the CMN checks
require a G_SUB with a non-constant on the RHS.)

This is around a 0.02% code size improvement on CTMark at -O3.

Differential Revision: https://reviews.llvm.org/D87529
2020-09-15 17:18:05 -07:00
Arthur Eubanks 91332c4dbb [CGSCC][NewPM] Fix adding mutually recursive new functions
When adding a new function via addNewFunctionIntoRefSCC(), it creates a
new node and immediately populates the edges. Since populateSlow() calls
G->get() on all referenced functions, it will create a node (but not
populate it) for functions that haven't yet been added. If we add two
mutually recursive functions, the assert that the node should never have
been created will fire when the second function is added. So here we
remove that assert since the node may have already been created (but not
yet populated).
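For illustration (a hypothetical pair, not from the bug report), think of a pass inserting two mutually recursive functions: adding the first one already creates an unpopulated node for the second, so the old assert fired when the second was added.

  int g(int n);
  int f(int n) { return n ? g(n - 1) : 0; }
  int g(int n) { return n ? f(n - 1) : 1; }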

createNode() is only called from addNewFunctionInto{,Ref}SCC().

https://bugs.llvm.org/show_bug.cgi?id=47502

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87623
2020-09-15 16:44:08 -07:00
Volkan Keles 79378b1b75 GlobalISel: Fix a failing combiner test
test/CodeGen/AArch64/GlobalISel/combine-trunc.mir was failing
due to the different order for evaluating function arguments.
This patch updates the related code to fix the issue.
2020-09-15 16:40:38 -07:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄 .

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
Aditya Nandakumar 97203cfd6b [GISel] Add new GISel combiners for G_MUL
https://reviews.llvm.org/D87668

Patch adds two new GICombinerRules, one for G_MUL(X, 1) and another for G_MUL(X, -1).
G_MUL(X, 1) is an identity combine, and G_MUL(X, -1) gets replaced with G_SUB(0, X).
Patch additionally adds new combiner tests for the AArch64 target to test these
new combiner rules, as well as updating AMDGPU GISel tests.
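In source-level terms the two rules are just the usual integer identities (illustrative only):

  int times_one(int x)       { return x * 1; }  // G_MUL(X, 1)  ==> X
  int times_minus_one(int x) { return x * -1; } // G_MUL(X, -1) ==> G_SUB(0, X)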

Patch by mkitzan
2020-09-15 16:08:47 -07:00
Mircea Trofin 61fc10d6a5 [ThinLTO] add post-thinlto-merge option to -lto-embed-bitcode
This will embed bitcode after (Thin)LTO merge, but before optimizations.
In the case the thinlto backend is called from clang, the .llvmcmd
section is also produced. Doing so in the case where the caller is the
linker doesn't yet have a motivation, and would require plumbing through
command line args.

Differential Revision: https://reviews.llvm.org/D87636
2020-09-15 15:56:11 -07:00
Volkan Keles a4e35cc2ec GlobalISel: Add combines for G_TRUNC
https://reviews.llvm.org/D87050
2020-09-15 15:50:34 -07:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic and a lot of special handling
around it. Declare it just like any other intrinsic, but do not
define the rtn instructions for it.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Xun Li 7b4cc0961b [TSAN] Handle musttail call properly in EscapeEnumerator (and TSAN)
Call instructions with the musttail tag must be optimized as tail calls; otherwise this could lead to incorrect program behavior.
When TSAN instruments functions, it broke this contract by adding a call to the tsan exit function in between the musttail call and the return instruction, and also inserted exception handling code.
This happened through EscapeEnumerator, which adds exception handling code and returns ret instructions as the place to insert instrumentation calls.
This becomes especially problematic for coroutines, because coroutines rely on tail calls to do symmetric transfers properly.
To fix this, this patch moves the insertion point for instrumentation calls to before the musttail call for ret instructions that follow musttail calls, and also does not insert exception handling code for musttail calls.
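A source-level sketch of the constraint, using clang's musttail attribute (the placement comment reflects the fix as described above; the function names are made up):

  int callee(int);
  int caller(int x) {
    // Instrumentation such as the tsan exit call must be emitted here, before
    // the musttail call; nothing may sit between the call and the return.
    __attribute__((musttail)) return callee(x);
  }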

Differential Revision: https://reviews.llvm.org/D87620
2020-09-15 15:20:05 -07:00
Huihui Zhang 3b7f5166bd [SLPVectorizer][SVE] Skip scalable-vector instructions before vectorizeSimpleInstructions.
For scalable types, the aggregated size is unknown at compile time.
Skip instructions with scalable types to ensure the list of instructions
for vectorizeSimpleInstructions does not contain any scalable-vector instructions.
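A sketch of the kind of guard this implies (assumed form and names, not the exact code from the patch):

  // Skip instructions whose type is a scalable vector; their aggregated size
  // is unknown at compile time.
  if (isa<ScalableVectorType>(Inst->getType()))
    continue;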

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87550
2020-09-15 13:10:15 -07:00
Matt Arsenault 7d6ca2ec57 InferAddressSpaces: Fix assert with unreachable code
Invalid IR in unreachable code is technically valid IR. In this case,
the address space of the value was never inferred, and we tried to
rewrite it with an invalid address space value which would assert.
2020-09-15 15:48:43 -04:00
Muhammad Asif Manzoor d417488ef5 [AArch64][SVE] Add lowering for llvm fsqrt
Add the functionality to lower fsqrt for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D87707
2020-09-15 15:26:17 -04:00
Albion Fung 05aa997d51 [PowerPC] Implement __int128 vector divide operations
This patch implements __int128 vector divide operations for ISA3.1.
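A minimal usage sketch (the compiler flags and the selected instruction are assumptions):

  // e.g. clang -mcpu=pwr10 -O2
  #include <altivec.h>
  vector unsigned __int128 div_u128(vector unsigned __int128 a,
                                    vector unsigned __int128 b) {
    return a / b; // expected to select the ISA 3.1 unsigned quadword vector divide
  }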

Differential Revision: https://reviews.llvm.org/D85453
2020-09-15 15:19:35 -04:00
Florian Hahn 3d42d54955 [ConstraintElimination] Add constraint elimination pass.
This patch is a first draft of a new pass that adds a more flexible way
to eliminate compares based on more complex constraints collected from
dominating conditions.

In particular, it aims at simplifying conditions of the forms below
using a forward propagation approach, rather than instcombine-style
ad-hoc backwards walking of def-use chains.

    if (x < y)
      if (y < z)
        if (x < z) <- simplify

or

    if (x + 2 < y)
        if (x + 1 < y) <- simplify assuming no wraps

The general approach is to collect conditions and blocks, sort them by
dominance and then iterate over the sorted list. Each condition is turned
into a linear inequality and added to a system containing the linear
inequalities that hold on entry to the block. For blocks, we check each
compare against the system and see if it is implied by the constraints
in the system.
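For the first example above, the arithmetic looks roughly like this (using the usual encoding x < y  ==>  x - y <= -1 for integers):

    x < y   becomes   x - y <= -1
    y < z   becomes   y - z <= -1
    adding both rows gives  x - z <= -2,  which implies  x - z <= -1,  i.e.  x < z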

We also keep a stack of processed conditions and remove conditions from
the stack and the constraint system once they go out-of-scope (= do not
dominate the current block any longer).

Currently there are still at least the following areas for improvement:

* Currently large unsigned constants cannot be added to the system
  (coefficients must be represented as integers)
* The way constraints are managed currently is not very optimized.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D84547
2020-09-15 19:31:11 +01:00
Craig Topper 05134877e6 [X86] Use Align in reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore. Correct pointer info.
If we offset the pointer, we also need to offset the pointer info.

Differential Revision: https://reviews.llvm.org/D87593
2020-09-15 11:22:02 -07:00
Arthur Eubanks 3f69b2140f [NewPM][opt] Fix -globals-aa not being recognized as alias analysis in NPM
Was missing MODULE_ALIAS_ANALYSIS; previously only FUNCTION_ALIAS_ANALYSIS was taken into account.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87664
2020-09-15 11:18:19 -07:00
Fangrui Song 03f1516d60 [MemoryBuffer] Revert unintended MemoryBuffer change from D86996
Fixes SupportsTest MemoryBufferTest.mmapVolatileNoNull
2020-09-15 10:21:05 -07:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Petr Hosek 9c73e55510 Revert "[DebugInfo] Remove dots from getFilenameByIndex return value"
This is failing on Windows bots due to path separator normalization.

This reverts commit 042c235068.
2020-09-15 10:06:47 -07:00
Fangrui Song 4452cc4086 [VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan
Similar to the tsan suppression in
`Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN),
the D81766 optimization should be suppressed under tsan due to potential
spurious data race reports:

  struct A {
    int i;
    const short s; // the load cannot be vectorized because
    int modify;    // it overlaps with bytes being concurrently modified
    long pad1, pad2;
  };
  // __tsan_read16 does not know that some bytes are undef and accessing is safe

Similarly, under asan, users can mark memory regions with
`__asan_poison_memory_region`. A widened load can lead to a spurious
use-after-poison error. hwasan/memtag should be similarly suppressed.
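A minimal illustration of the asan case (an assumed example, not from the patch):

  #include <sanitizer/asan_interface.h>
  int first_byte(char *buf /* at least 16 bytes */) {
    __asan_poison_memory_region(buf + 4, 12); // only buf[0..3] may be accessed
    return buf[0]; // widening this to a 16-byte load would report use-after-poison
  }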

`mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so
we need to exclude memtag in `vectorizeLoadInsert`.

Note, memtag suppression can be relaxed if the load is aligned to its
granule (usually 16), but that is out of scope for this patch.

Reviewed By: spatel, vitalybuka

Differential Revision: https://reviews.llvm.org/D87538
2020-09-15 09:47:21 -07:00
Jonas Devlieghere 127faae752 [lldb] Add -l/--language option to script command
Make it possible to run the script command with a different language
than currently selected.

  $ ./bin/lldb -l python
  (lldb) script -l lua
  >>> io.stdout:write("Hello, World!\n")
  Hello, World!

When passing the language option and a raw command, you need to separate
the flag from the script code with --.

  $ ./bin/lldb -l python
  (lldb) script -l lua -- io.stdout:write("Hello, World!\n")
  Hello, World!

Differential revision: https://reviews.llvm.org/D86996
2020-09-15 09:40:17 -07:00
Simon Pilgrim a43e68b58b [X86][AVX] lowerShuffleWithSHUFPS - handle missed canonicalization cases.
PR47534 exposes a case where calling lowerShuffleWithSHUFPS directly from a derived repeated mask (found by is128BitLaneRepeatedShuffleMask) results in us using a non-canonicalized mask.

The missed canonicalization in this case is trivial - just commute the mask so we have more (swapped) LHS than RHS references so lowerShuffleWithSHUFPS can handle it.
2020-09-15 17:31:08 +01:00
Guozhi Wei 243ffd0cad [MachineBasicBlock] Fix a typo in function copySuccessor
The condition used to decide whether we need to copy the probability should be reversed.

Differential Revision: https://reviews.llvm.org/D87417
2020-09-15 09:18:18 -07:00
Simon Pilgrim 2b42d53e5e SLPVectorizer.h - remove unnecessary AliasAnalysis.h include. NFCI.
Forward declare AAResults instead of the (old) AliasAnalysis type.

Remove includes from SLPVectorizer.cpp that are already included in SLPVectorizer.h.
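The usual pattern this follows (a sketch; only the AAResults name is taken from the message):

  // In the header, a forward declaration is enough because the interface only
  // uses pointers/references to AAResults:
  namespace llvm {
  class AAResults; // instead of #include "llvm/Analysis/AliasAnalysis.h"
  } // namespace llvm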
2020-09-15 16:24:05 +01:00
Sanjay Patel 8985755762 [InstSimplify] add limit folds for fmin/fmax
If the constant operand is the opposite of the min/max value,
then the result must be the other value.

This is based on the similar codegen transform proposed in:
D87571
2020-09-15 10:58:44 -04:00