Commit Graph

145800 Commits

Author SHA1 Message Date
Qiu Chaofan 033c9c2552 [PowerPC] Fix use check of swap-reduction
This will fix swap-reduction in DAGISel for cases where COPY_TO_REGCLASS
has multiple uses.
2021-04-07 15:55:52 +08:00
Max Kazantsev fee330824a [SCEV] Fix false-positive recognition of simple recurrences. PR49856
A value from reachable block may come to a Phi node as its input from
unreachable block. This may confuse matchSimpleRecurrence, which
has no access to DomTree and can falsely recognize something as a recurrence
because of this effect, as the attached test shows.

Patch `ae7b1e` deals with half of this problem, but it only accounts for
the case when an unreachable instruction comes to Phi as an input.

This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).

Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
2021-04-07 13:55:17 +07:00
Petr Hosek a547b4e26b Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)"
This reverts commit 31d219d299 which
causes an infinite loop when compiling the XRay runtime.
2021-04-06 22:30:28 -07:00
Jonas Devlieghere 162c2759b6 [dsymutil] Stop emulating dsymutil-classic CIE caching behavior
Stop emulating dsymutil-classic which only cached the last used CIE for
reuse.
2021-04-06 20:15:41 -07:00
Jonas Devlieghere 233c24330b [dsymutil] Don't keep old abbreviations
Don't keep the old abbreviations around. This code existed for
compatibility with dsymutil-classic.
2021-04-06 19:50:17 -07:00
Jonas Devlieghere 5d07dc8977 [dsymutil] Don't emit .debug_pubnames and .debug_pubtypes
Consider the .debug_pubnames and .debug_pubtypes their own kind of
accelerator and stop emitting them together with the Apple-style
accelerator tables. The only reason we were still emitting both was for
(byte-for-byte) compatibility with dsymutil-classic.

 - This patch adds a new accelerator table kind "Pub" which can be
   specified with --accelerator=Pub.
 - This patch removes the ability to emit both pubnames/types and apple
   style accelerator tables. I don't think anyone is relying on that but
   it's worth pointing out.
 - This patch removes the --minimize option and makes this behavior the
   default. Specifying the flag will result in a warning but won't abort
   the program.

Differential revision: https://reviews.llvm.org/D99907
2021-04-06 19:01:45 -07:00
Yevgeny Rouban b5c63e30ca [NewPM] Set verify-cfg-preserved=1 by default for debug builds 2021-04-07 08:34:30 +07:00
Craig Topper 01a23dccb1 [RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer. 2021-04-06 16:48:40 -07:00
Nicolás Alvarez a1aada75f5 [docs] Fix doxygen comments wrongly attached to the llvm namespace
Looking at the Doxygen-generated documentation for the llvm namespace
currently shows all sorts of random comments from different parts of the
codebase. These are mostly caused by:

- File doc comments that aren't marked with \file, so they're attached to
  the next declaration, which is usually "namespace llvm {".
- Class doc comments placed before the namespace rather than before the
  class.
- Code comments before the namespace that (in my opinion) shouldn't be
  extracted by doxygen at all.

This commit fixes these comments. The generated doxygen documentation now
has proper docs for several classes and files, and the docs for the llvm
and llvm::detail namespaces are now empty.

Reviewed By: thakis, mizvekov

Differential Revision: https://reviews.llvm.org/D96736
2021-04-07 01:20:18 +02:00
Craig Topper 2641c1f15e [RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal.
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
2021-04-06 15:00:33 -07:00
Sidharth Baveja d81d9e8b86 [SplitEdge] Update SplitCriticalEdge to return a nullptr only when the edge is not critical
Summary:
The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in
cases where the edge is critical. SplitEdge uses SplitCriticalEdge assuming it
can always split all critical edges, which is an incorrect assumption.

The three cases where the function SplitCriticalEdge will return a nullptr are:
1. DestBB is an exception block
2. Options.IgnoreUnreachableDests is set to true and
isa<UnreachableInst>(DestBB->getFirstNonPHIOrDbgOrLifetime()) is true
3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true)
and it cannot be maintained for a loop due to indirect branches

Each of these situations is handled as follows:
1. The function ehAwareSplitEdge, originally in
llvm/lib/Transforms/Coroutines/CoroFrame.cpp, was modified to handle the case
when DestBB is an exception block. This function is called directly in SplitEdge;
SplitEdge does not call SplitCriticalEdge in this case.
2. Options.IgnoreUnreachableDests is set to false by default, so this situation
does not apply.
3. Return a nullptr in this situation, since SplitCriticalEdge also returned
nullptr. Nothing more can be done in this case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D94619
2021-04-06 21:24:40 +00:00
Philip Reames 4bf8985f4f Replace calls to IntrinsicInst::Create with CallInst::Create [nfc]
There is no IntrinsicInst::Create.  These calls bind to the method in the super type.  Be explicit about which method is being called.
2021-04-06 13:23:58 -07:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Artem Belevich d0615a93bb [NVPTX] Handle bitcast and ASC(101) when trying to avoid argument copy.
This allows us to skip the copy in a few more cases.

Differential Revision: https://reviews.llvm.org/D99979
2021-04-06 13:06:00 -07:00
Philip Reames 9ef6aa020b Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Luís Marques 0c3bc1f3a4 [ASan][RISCV] Fix RISC-V memory mapping
Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and
D87581). This should be an improvement both in terms of first principles
soundness and observed test failures --- test failures would occur
non-deterministically depending on the ASLR random offset.

On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as
`PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor yields the
not-very-round number 0x1555556000. That address had to be further
rounded to ensure page alignment after the shadow scale shifting is applied.
Still, that value explains why the mapping table may look less regular than
expected.
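As a worked check of the arithmetic above, the sketch below recomputes the value at compile time. It assumes Sv39 with a 256 GiB user address space (TASK_SIZE = 1 << 38) and 4 KiB pages; neither figure is stated in the commit, and `pageAlign` is an illustrative helper that mirrors the round-up behavior of the kernel's PAGE_ALIGN macro, not kernel code.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages.
constexpr uint64_t kPageSize = 4096;

// Round up to the next page boundary (same effect as PAGE_ALIGN).
constexpr uint64_t pageAlign(uint64_t addr) {
  return (addr + kPageSize - 1) & ~(kPageSize - 1);
}

// Assumption: Sv39 user address space of 2^38 bytes (256 GiB).
constexpr uint64_t kTaskSize = uint64_t{1} << 38;

// TASK_UNMAPPED_BASE = PAGE_ALIGN(TASK_SIZE / 3)
constexpr uint64_t kTaskUnmappedBase = pageAlign(kTaskSize / 3);

// Integer division leaves a value that is not page aligned...
static_assert(kTaskSize / 3 == 0x1555555555, "TASK_SIZE / 3");
// ...so rounding up produces the value quoted in the commit message.
static_assert(kTaskUnmappedBase == 0x1555556000, "TASK_UNMAPPED_BASE");
```

Under these assumptions the divisor of 3 is what leaves the base off any power-of-two boundary larger than a page, which is why the shadow mapping needed the extra rounding.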

Further cleanups:
- Moved the mapping table comment, to ensure that the two Linux/AArch64
tables stayed together;
- Removed mention of Sv48. Neither the original mapping nor this one is
compatible with an actual Linux Sv48 address space (mainline Linux still
operates Sv48 in Sv39 mode). A future patch can improve this;
- Removed the additional comments, for consistency.

Differential Revision: https://reviews.llvm.org/D97646
2021-04-06 20:46:17 +01:00
Amy Kwan bd6033eca7 [PowerPC] Materialize 34-bit constants with pli directly
Previously, 34-bit constants were materialized in selectI64Imm(), and we relied
on td pattern matching to produce a pli instead. This is problematic because
there is no guarantee that the 34-bit constant will reach the td pattern
selection for pli. Other transformations (such as complex bit permutations)
may also produce and use the 34-bit constant materialized through
selectI64Imm().

This patch instead produces pli on Power10 directly whenever the constant fits
within 34 bits.

Differential Revision: https://reviews.llvm.org/D99906
2021-04-06 13:38:11 -05:00
Fangrui Song a7ef45bc5c [NewPM] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off builds after D91327 2021-04-06 11:30:20 -07:00
Philip Reames fb41cae039 More precisely type code used for gc.relocate assertions [nfc] 2021-04-06 11:27:36 -07:00
Philip Reames a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Arthur Eubanks 4e83e59eb8 [GVN] Add missing ICF update
performScalarPREInsertion() inserts instructions into blocks that we
need to tell ImplicitControlFlowTracking about, otherwise the ICF cache
may be invalid.

Fixes PR49193.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99909
2021-04-06 10:13:42 -07:00
Florian Hahn 4059c1c32d [SimplifyInst] Use correct type for GEPs with vector indices.
The current code does not properly handle vector indices unless they are
the first index.

At the moment LangRef gives the impression that the vector index must be
the one and only index (https://llvm.org/docs/LangRef.html#getelementptr-instruction).

But vector indices can appear at any position and according to the
verifier there may be multiple vector indices. If that's the case, the
number of elements must match.

This patch updates SimplifyGEPInst to properly handle those additional
cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99961
2021-04-06 17:56:10 +01:00
Craig Topper 3ae03f67fe [RISCV] Add helper function to share some of the code for isel of vector load/store intrinsics.
Many of the operands are handled the same or in the same order
for all these intrinsics. Factor out the code for selecting and
pushing them into the Operands vector.

Differential Revision: https://reviews.llvm.org/D99923
2021-04-06 09:54:24 -07:00
Jay Foad 8f798566a3 [AMDGPU] SIFoldOperands: use isUseMIInFoldList. NFC. 2021-04-06 17:53:48 +01:00
Paul Robinson 04b3c8c52c Pass -fcrash-diagnostics-dir along to LLVM
This allows frontend and backend diagnostic files to all go into the
same place.  Have it control the Windows (mini-)dump location.

Differential Revision: https://reviews.llvm.org/D99199
2021-04-06 09:30:52 -07:00
Victor Huang f98567b3fe [AIX][TLS] Add support for TLS variables to XCOFF object writer
This patch adds support for TLS variables to the XCOFF object writer:
- Add TData and TBSS sections
- Add CsectGroups for the mapping classes XCOFF::XMC_TL and XCOFF::XMC_UL
- Add XMC_UL in the enum entry of CsectStorageMapping class to print the string
  while reading the symbol properties for TLS variables
- Fix the starting address of TData and TBSS sections

Reviewed by: hubert.reinterpretcast, DiggerLin

Differential Revision: https://reviews.llvm.org/D98946
2021-04-06 10:46:07 -05:00
Simon Pilgrim 53283cc2f1 [X86][SSE] canonicalizeShuffleWithBinOps - add MOVSD/MOVSS handling. 2021-04-06 16:42:18 +01:00
Philip Reames 21d4839948 Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc]
These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.
2021-04-06 08:33:15 -07:00
Konstantin Zhuravlyov 844012940e AMDGPU: Add isBranch=1 to SOPP branch instructions
Differential Revision: https://reviews.llvm.org/D99955
2021-04-06 10:59:30 -04:00
Philip Reames 52ecd94cfb Remove last remnants of PR49607 migration [NFC]
The key change (4f5e92c) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout.  Time to remove the code added to make reverting easy.
2021-04-06 07:56:55 -07:00
Jan Svoboda fb6a5237aa Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction"
This reverts commit 167ea67d

This causes a bunch of build failures:
* http://lab.llvm.org:8011/#/builders/121/builds/6287
* http://green.lab.llvm.org/green/job/clang-stage1-RA/19915
2021-04-06 16:33:28 +02:00
Benjamin Kramer ce4acb01b3 Avoid unused variable warning in Release builds 2021-04-06 16:25:19 +02:00
Jay Foad efc7bf27f5 [AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad 005dcd196e [AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad ce9cca6c3a [AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask
This follows the pattern of the other tryFold* functions. NFC.
2021-04-06 15:23:58 +01:00
Jay Foad cf4f5292f6 [AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef
We are in SSA so getVRegDef is equivalent but simpler. NFC.
2021-04-06 15:23:58 +01:00
Jay Foad e9608a84d8 [AMDGPU][SDag] Add IMG init also for image_gather4 instructions
This fixes an oversight in D99747 which moved the IMG init code from
SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the
hasPostISelHook flag on gather4 instructions.

Differential Revision: https://reviews.llvm.org/D99953
2021-04-06 14:47:20 +01:00
Kerry McLaughlin 7344f3d39a [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
is an input to the horizontal reduction, e.g.:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.
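Why reassociation needs fast math, and why an in-order reduction still matches the scalar loop, can be sketched with a scalar C++ model. This is an illustration of the semantics only, not the vectorizer's code: the function names are hypothetical, and the chunk width of 8 mirrors the `<8 x float>` example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop: sum += a[i], strictly left to right.
float scalarSum(const std::vector<float>& a) {
  float sum = 0.0f;
  for (float x : a) sum += x;
  return sum;
}

// Models an ordered llvm.vector.reduce.fadd: the start value (%phi) is
// added first, then each lane in order, with no reassociation.
float orderedReduce(float start, const float* lanes, std::size_t n) {
  float acc = start;
  for (std::size_t i = 0; i < n; ++i) acc += lanes[i];
  return acc;
}

// In-loop ordered reduction with VF = 8: each iteration's result feeds the
// next as the start value, so the rounding sequence equals scalarSum.
float inOrderVectorSum(const std::vector<float>& a) {
  constexpr std::size_t VF = 8;
  float phi = 0.0f;
  std::size_t i = 0;
  for (; i + VF <= a.size(); i += VF) phi = orderedReduce(phi, &a[i], VF);
  return orderedReduce(phi, a.data() + i, a.size() - i);  // scalar remainder
}

// Reassociated form (what fast math permits): one partial sum per lane,
// combined after the loop.
float reassociatedSum(const std::vector<float>& a) {
  float partial[8] = {};
  for (std::size_t i = 0; i < a.size(); ++i) partial[i % 8] += a[i];
  float sum = 0.0f;
  for (float p : partial) sum += p;
  return sum;
}
```

For example, with `a[0] = 1e20f`, `a[1] = 1.0f`, `a[8] = -1e20f` and zeros elsewhere (9 elements), the scalar and in-order versions both return 0 because the 1.0 is absorbed by 1e20, while the per-lane partial sums cancel the large values first and return 1.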

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
Simon Pilgrim 1dcb5b5e89 [X86] Improve optimizeCompareInstr for signed comparisons after ANDN instructions
Extend D94856 to handle 'andn' instructions as well
2021-04-06 14:16:16 +01:00
Roman Lebedev 31d219d299
[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)
https://alive2.llvm.org/ce/z/67w-wQ

We prefer `add`s over `sub`, and this particular xform
allows further folds to happen:

Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
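The fold is valid because integer `sub` and `add` in LLVM IR wrap modulo 2^N. A minimal C++ sketch, assuming unsigned 8-bit types to model the wraparound (signed overflow is undefined in C++), checks the identity exhaustively; the linked Alive2 proof remains the authoritative check, and the function names are illustrative. (The commit was later reverted for causing an infinite loop, not for being incorrect.)

```cpp
#include <cassert>
#include <cstdint>

// Original form: (X - Y) - Z, computed modulo 2^8.
constexpr uint8_t subSub(uint8_t x, uint8_t y, uint8_t z) {
  return static_cast<uint8_t>(static_cast<uint8_t>(x - y) - z);
}

// Folded form: X - (Y + Z), also modulo 2^8.
constexpr uint8_t subOfAdd(uint8_t x, uint8_t y, uint8_t z) {
  return static_cast<uint8_t>(x - static_cast<uint8_t>(y + z));
}

// Exhaustive check over all 2^24 input triples.
bool foldHoldsForAllI8() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned z = 0; z < 256; ++z)
        if (subSub(x, y, z) != subOfAdd(x, y, z)) return false;
  return true;
}
```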
2021-04-06 15:58:14 +03:00
Simon Pilgrim b8aba76a4e LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI.
Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.
2021-04-06 12:24:20 +01:00
Abhina Sreeskantharajan 82b3e28e83 [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode translate '\n' to CRLF ('\r\n'), which may not be desirable.

Solution:
This patch adds two new flags

  - OF_CRLF which indicates that CRLF translation is used.
  - OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

Developers should now use either OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text; if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text: open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

The major change is in llvm/lib/Support/Windows/Path.inc, to only set text mode if OF_CRLF is set.
```
  if (Flags & OF_CRLF)
    CrtOpenFlags |= _O_TEXT;
```
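The per-platform behavior listed above can be modeled as a small bitmask. This is a sketch with assumed enum values and hypothetical `windowsUsesTextMode`/`zosUsesTextMode` helpers; it mirrors the quoted `Path.inc` condition but is not LLVM's actual declaration.

```cpp
// Hypothetical stand-ins for the open flags discussed above; the numeric
// values are assumptions for illustration.
enum OpenFlags : unsigned {
  OF_None = 0,
  OF_Text = 1u << 0,  // the file is text (z/OS: open in text mode)
  OF_CRLF = 1u << 1,  // request CRLF translation (Windows: text mode)
  OF_TextWithCRLF = OF_Text | OF_CRLF,
};

// Windows: only OF_CRLF selects text mode, matching the Path.inc check.
constexpr bool windowsUsesTextMode(unsigned flags) {
  return (flags & OF_CRLF) != 0;
}

// z/OS: any text flag selects text mode; OF_None opens in binary mode.
constexpr bool zosUsesTextMode(unsigned flags) {
  return (flags & OF_Text) != 0;
}
```

Because OF_TextWithCRLF contains the OF_Text bit, code that only asks "is this a text file?" keeps working on z/OS, while Windows keys off the separate CRLF bit.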

The following files still use OF_Text, which I left unchanged. I modified all of these except raw_ostream.cpp in recent patches, so I know they were previously in binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99426
2021-04-06 07:23:31 -04:00
Kerry McLaughlin 857b8a73da [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.
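The reason -0.0 is the correct neutral value: under IEEE 754 round-to-nearest (assumed here, as the default), `+0.0 + -0.0` is `+0.0`, so starting a reduction at +0.0 would turn an all-(-0.0) input into +0.0, while `-0.0 + x` returns `x` exactly for every non-NaN `x`, including both zeros. A minimal C++ check (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// True if `identity + x` reproduces x exactly, including the sign of zero.
// NaN inputs always return false because NaN compares unequal to itself.
bool isExactFAddIdentityFor(double identity, double x) {
  double r = identity + x;
  return r == x && std::signbit(r) == std::signbit(x);
}
```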

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
Florian Hahn a6b06b785c [VPlan] Print VPValue operands for VPWidenPHI if possible.
For VPWidenPHIRecipes that model all incoming values as VPValue
operands, print those operands instead of printing the original PHI.

D99294 updates recipes of reduction PHIs to use the VPValue for the
incoming value from the loop backedge, making use of this new printing.
2021-04-06 12:11:21 +01:00
Dmitry Preobrazhensky 3eadcb86ab [AMDGPU][MC][GFX9] Corrected SMEM decoding
Corrected SMEM decoding when IMM=0 and OFFSET>127

Fixed bug 49819 (https://bugs.llvm.org/show_bug.cgi?id=49819)

Differential Revision: https://reviews.llvm.org/D99804
2021-04-06 14:10:46 +03:00
Simon Pilgrim 201877d572 [CostModel][X86] Improve accuracy of vXi8 multiply reduction costs
After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types).

Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.
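Promoting to 16-bit lanes is safe because the low 8 bits of a product depend only on the low 8 bits of its operands, so truncating once at the end gives the same result as truncating after every multiply. A scalar C++ model of that argument (an illustration, not the X86 lowering itself; the names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference: multiply reduction with every intermediate kept in 8 bits.
uint8_t mulReduceI8(const std::vector<uint8_t>& v) {
  uint8_t acc = 1;
  for (uint8_t x : v) acc = static_cast<uint8_t>(acc * x);
  return acc;
}

// Promoted: keep the low 16 bits of each product (as PMULLW does per lane),
// then truncate to 8 bits once at the end.
uint8_t mulReducePromoted(const std::vector<uint8_t>& v) {
  uint16_t acc = 1;
  for (uint8_t x : v) acc = static_cast<uint16_t>(acc * x);
  return static_cast<uint8_t>(acc);
}
```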
2021-04-06 11:53:22 +01:00
madhur13490 167ea67d76 [IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction
This patch enhances hasAddressTaken() to ignore bitcasts that are used as the
callee of a CallBase instruction. Such a bitcast doesn't really take
the address in any meaningful way.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D98884
2021-04-06 09:23:46 +00:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Sjoerd Meijer d5f1131c81 [AArch64] Default to zero-cycle-zeroing FP registers
It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this
is most efficient across all cores; it is recognised as a zeroing idiom. For
newer cores, fmov instructions can also be eliminated early and there is no
difference with movi, but some implementations lack this, so that is not true for
other/older cores. Thus this standardises on using movi, as this should always
give the same or better performance than fmov with wzr.

Differential Revision: https://reviews.llvm.org/D99586
2021-04-06 09:47:50 +01:00
Sjoerd Meijer ef05b08c61 [AArch64] Use 64-bit movi for zeroing halfs/floats
This was using the .2d variant which zeros 128 bits, but using the .2s variant
that zeros 64 bits is faster on some cores.

This is a prep step for D99586 to always use movi for zeroing floats.

Differential Revision: https://reviews.llvm.org/D99710
2021-04-06 08:42:13 +01:00
Yevgeny Rouban 98742e42fc [NewPM] Fix unused lambda capture build error
Fixes commit 39e3e3aa51d: Redesign of PreserveCFG Checker
2021-04-06 13:14:16 +07:00
Yevgeny Rouban 39e3e3aa51 [NewPM] Redesign of PreserveCFG Checker
The reason for the NewPM redesign is described in the commit
  cba3e783389a: [NewPM] Disable PreservedCFGChecker ...

The checker introduces an internal custom CFG analysis that tracks the
current up-to-date CFG snapshot. The analysis is invalidated along with
any other CFG-related analysis (the key is CFGAnalyses). If the CFG
analysis is not invalidated at a function pass exit, then the checker
asserts that the CFG snapshot taken from this analysis is equal to
a snapshot of the current CFG.

Along the way:
- the function CFG::printDiff() is simplified by removing function
  name calculation. The name is printed by the caller;
- fixed the CFG invalidation condition (see CFG::invalidate());
- StandardInstrumentations::registerCallbacks() gets additional
  optional parameter of type FunctionAnalysisManager*, which is
  needed by the checker to get the custom CFG analysis;
- several PM-related tests updated to explicitly set
  -verify-cfg-preserved=1 where needed.

This patch is safe to land as the CFGChecker is left switched off
(the option -verify-cfg-preserved is false by default). It will be
switched on by a separate patch to minimize possible reverts.

Reviewed By: skatkov, kuhar

Differential Revision: https://reviews.llvm.org/D91327
2021-04-06 12:35:49 +07:00
Serguei Katkov 0057ec8034 [Statepoint] Factor-out utility function to get non-foldable area of STATEPOINT like instructions. NFC
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99875
2021-04-06 11:44:37 +07:00
Craig Topper cb1028a0b9 [RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register.
I missed a few intrinsics in 3dd4aa7d09
when I did this for masked loads and masked segment loads/stores.

Found while trying to share more code between these custom isel
functions.
2021-04-05 21:28:32 -07:00
Philip Reames 58ccbd0d08 Comment adjustments for a rename 2021-04-05 21:07:42 -07:00
Arthur Eubanks ea0e2ca1ac [SROA] Allow SROA on pointers with invariant group intrinsic uses
When we are able to SROA an alloca, we know all uses of it, meaning we
don't have to preserve the invariant group intrinsics and metadata.

It's possible that we could lose information regarding redundant
loads/stores, but that's unlikely to have any real impact since right
now the only user is Clang and vtables.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99760
2021-04-05 19:53:40 -07:00
Philip Reames 13deb6aac7 Exact ashr/lshr don't loose any set bits and are thus trivially invertible
Use that fact to improve isKnownNonEqual.
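A shift is `exact` when no set bits are shifted out, so shifting back left recovers the operand. That makes the shift injective on such inputs: if X != Y and both shifts are exact, the results must differ, which is the kind of fact isKnownNonEqual can use. A minimal C++ sketch for 32-bit logical shifts (the helper names are illustrative, not LLVM APIs):

```cpp
#include <cassert>
#include <cstdint>

// True if `x >> k` discards only zero bits (the condition the `exact` flag asserts).
constexpr bool isExactLShr(uint32_t x, unsigned k) {
  return k < 32 && (x & ((uint32_t{1} << k) - 1)) == 0;
}

// For an exact shift, shifting back left is the inverse.
uint32_t invertExactLShr(uint32_t shifted, unsigned k) {
  assert(k < 32);
  return shifted << k;
}

// If x != y and both shifts by k are exact, then (x >> k) != (y >> k).
bool shiftedKnownNonEqual(uint32_t x, uint32_t y, unsigned k) {
  return x != y && isExactLShr(x, k) && isExactLShr(y, k);
}
```

Without the exact flag the inference fails: `0x41 >> 4` and `0x40 >> 4` are both 4 even though the inputs differ.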
2021-04-05 19:22:36 -07:00
Philip Reames dc8d864e3a Address minor post commit feedback on 0e59dd 2021-04-05 18:22:17 -07:00
Stanislav Mekhanoshin 30b3aab329 Copy syncscope when expanding atomicrmw into cmpxchg loop
Fixes: SWDEV-280070

Differential Revision: https://reviews.llvm.org/D99902
2021-04-05 17:29:38 -07:00
Sanjay Patel e2a0f512ea [InstSimplify] fix potential miscompile in select value equivalence
This is the sibling fix to c590a9880d -
as there, we can't substitute a vector value because the
equality compare replacement that we are trying requires
that the comparison is true for the entire value. Vector
select can be partly true/false.
2021-04-05 16:52:34 -04:00
Craig Topper 780a47285a [RISCV] Add SDTCisInt to the SDTRVVSlide1 since it is only used for vslide1up.vx/vslide1down.vx.
The scalar type is already marked as XLenVT. The floating point
version would need a different rule.
2021-04-05 13:03:39 -07:00
Craig Topper af2837675a [RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There's still some extra type checks in
the generated table due to some type interference limitations
around HWMode.
2021-04-05 12:57:35 -07:00
Philip Reames b0e59dd6e1 Extract a helper for figuring out if an operator is invertible [nfc]
For use in an upcoming patch.  Left out the phi case (which could otherwise fit in this framework) as it would cause infinite recursion in said patch.  We can probably also leverage this in instcombine to ensure we keep the two sets of related analysis and transforms in sync.
2021-04-05 12:14:21 -07:00
Craig Topper 7edda698c0 [RISCV] Move VSLIDE1UP_VX pattern out of a loop that includes FP types.
FP would need VFSLIDE1UP_VF which uses an FP register.
2021-04-05 12:05:54 -07:00
Ricky Taylor 4db18d62af [M68k] Add support for Motorola literal syntax to AsmParser
These look like $00A0cf for hex and %001010101 for binary. They are used in Motorola assembly syntax.
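A minimal sketch of evaluating the two prefixed forms, assuming `$` introduces hexadecimal digits and `%` binary digits as in the examples above. `parseMotorolaLiteral` is a hypothetical helper, not the M68k AsmParser's lexer; it does not check for 64-bit overflow or handle contexts where `%` or `$` means something else.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Returns the literal's value, or std::nullopt if `tok` is not a well-formed
// $hex or %binary literal. Overflow beyond 64 bits is not detected.
std::optional<uint64_t> parseMotorolaLiteral(std::string_view tok) {
  if (tok.size() < 2) return std::nullopt;
  unsigned base;
  if (tok[0] == '$') base = 16;
  else if (tok[0] == '%') base = 2;
  else return std::nullopt;

  uint64_t value = 0;
  for (char c : tok.substr(1)) {
    unsigned digit;
    if (c >= '0' && c <= '9') digit = c - '0';
    else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
    else return std::nullopt;
    if (digit >= base) return std::nullopt;  // e.g. '2' in a binary literal
    value = value * base + digit;
  }
  return value;
}
```

Under these assumptions, `$00A0cf` evaluates to 0xA0CF (41167) and `%001010101` to 85.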

Differential Revision: https://reviews.llvm.org/D98519
2021-04-05 20:02:29 +01:00
Tom Stellard 982396ddd7 Revert "Fix build rules for LLVM_WITH_Z3 after D95727"
This reverts commit d66f9c4f1e.

This was a follow up fix for 43ceb74eb1, which
will be reverted.
2021-04-05 10:46:19 -07:00
Cyndy Ishida 0116d04d04 [TextAPI] move source code files out of subdirectory, NFC
TextAPI/ELF has moved out into InterfaceStubs, so there's no longer a
need to separate out TextAPI between formats.

Reviewed By: ributzka, int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D99811
2021-04-05 10:24:42 -07:00
Ta-Wei Tu 6a82ace5f2 [LoopFusion] Bails out if only the second candidate is guarded (PR48060)
If only the second candidate loop is guarded while the first one is not, fusing
the two loops might not be valid, but this check is currently missing.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48060

Reviewed By: sidbav

Differential Revision: https://reviews.llvm.org/D99716
2021-04-06 01:08:56 +08:00
Fraser Cormack af3a839c70 [RISCV] Add support for bitcasts between scalars and fixed-length vectors
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using single-element
vectors to hold the scalar where appropriate.

Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.

Some of the codegen could be improved, but at first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99667
2021-04-05 17:21:55 +01:00
Sanjay Patel c590a9880d [InstCombine] fix potential miscompile in select value equivalence
As shown in the example based on:
https://llvm.org/PR49832
...and the existing test, we can't substitute
a vector value because the equality compare
replacement that we are attempting requires
that the comparison is true for the entire
value. Vector select can be partly true/false.
2021-04-05 12:25:40 -04:00
John Paul Adrian Glaubitz 62a94b725c [M68k] Mark public functions with the LLVM_EXTERNAL_VISIBILITY macro
In 0dbcb36394, most target symbols were made hidden by default
with the public ones marked with LLVM_EXTERNAL_VISIBILITY. When the
M68k target was added, this particular change was forgotten so that
external tools cannot make use of the public M68k target functions
in libLLVM.so. Thus, add the missing LLVM_EXTERNAL_VISIBILITY macro
to all public target functions in the M68k backend.

Differential Revision: https://reviews.llvm.org/D99869
2021-04-05 09:24:30 -07:00
Fraser Cormack 3f0df4d7b0 [RISCV] Expand scalable-vector truncstores and extloads
Internal testing showed that these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.

Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99654
2021-04-05 17:03:45 +01:00
Alexey Bataev 00a84f9a7f [SLP]Improve vectorization of the CmpInst instructions.
During vectorization, it is better to postpone the vectorization of the CmpInst
instructions until the end of the basic block. Otherwise we may vectorize
them too early and miss some vectorization patterns, like reductions.

Reworked part of D57059

Differential Revision: https://reviews.llvm.org/D99796
2021-04-05 06:22:51 -07:00
Alex Orlov 5f57793c4f * NFC. Refactored DIPrinter for better support of new print styles.
This patch introduces a DIPrinter interface to be implemented by printers for different output styles. DIPrinterGNU and DIPrinterLLVM implement the GNU and LLVM output style printing respectively. No functional changes.

This refactoring clarifies and simplifies the code, and makes a new output style addition easier.

Reviewed By: jhenderson, dblaikie

Differential Revision: https://reviews.llvm.org/D98994
2021-04-05 15:40:41 +04:00
Simon Pilgrim 36d4f6d7f8 [X86] Fold xor(zext(xor(x,c1)),c2) -> xor(zext(x),xor(zext(c1),c2))
Fixes PR47603 (second case) by extending rG89afec348dbd3e5078f176e978971ee2d3b5dec8
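The fold relies on zero-extension distributing over `xor`: zext(a ^ b) == zext(a) ^ zext(b), because the extended high bits are zero on both sides. The sketch below checks this exhaustively for an i8 -> i32 zext; the constants used in the checks are arbitrary examples, not taken from PR47603, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t zext(uint8_t v) { return v; }

// Before: xor(zext(xor(x, c1)), c2)
constexpr uint32_t originalForm(uint8_t x, uint8_t c1, uint32_t c2) {
  return zext(static_cast<uint8_t>(x ^ c1)) ^ c2;
}

// After: xor(zext(x), xor(zext(c1), c2)); the inner xor is a constant,
// leaving a single xor applied to zext(x).
constexpr uint32_t foldedForm(uint8_t x, uint8_t c1, uint32_t c2) {
  return zext(x) ^ (zext(c1) ^ c2);
}

// Check every 8-bit x for the given constants.
bool foldHoldsForAllX(uint8_t c1, uint32_t c2) {
  for (unsigned x = 0; x < 256; ++x)
    if (originalForm(x, c1, c2) != foldedForm(x, c1, c2)) return false;
  return true;
}
```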
2021-04-05 11:40:37 +01:00
Craig Topper 4708a05da0 [RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled.
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
2021-04-04 17:14:28 -07:00
Roman Lebedev 2760a808b9
[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778)
This is identical to 781d077afb,
but for the other function.

For certain shift amount bit widths, we must first ensure that adding
shift amounts is safe, that the sum won't have an unsigned overflow.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49778
2021-04-04 23:26:41 +03:00
Roman Lebedev dceb3e5996
[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper 2021-04-04 23:26:41 +03:00
Craig Topper 98d5db3e3a [RISCV] Lower orc.b intrinsic to RISCVISD::GORCI.
This will allow us to share any future known bits, demanded bits,
or sign bits improvements.
2021-04-04 12:31:41 -07:00
Sanjay Patel c0645f1324 [InstCombine] fold popcount of exactly one bit to shift
This is discussed in https://llvm.org/PR48999 ,
but it does not solve that request.

The difference in the vector test shows that some
other logic transform is limited to scalar types.
2021-04-04 11:43:49 -04:00
Nikita Popov 9bad7de9a3 [SimplifyCFG] Handle two equal cases in switch to select
When converting a switch with two cases and a default into a
select, also handle the denegerate case where two cases have the
same value.

Generate this case directly as

  %or = or i1 %cmp1, %cmp2
  %res = select i1 %or, i32 %val, i32 %default

rather than

  %sel1 = select i1 %cmp1, i32 %val, i32 %default
  %res = select i1 %cmp2, i32 %val, i32 %sel1

as InstCombine is going to canonicalize to the former anyway.
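The two forms return the same value for every combination of comparison results. A C++ model with `bool` standing in for `i1` (the names are illustrative; the model ignores IR poison semantics):

```cpp
// Generated form: %or = or i1 %cmp1, %cmp2 ; select i1 %or, %val, %default
constexpr int selectViaOr(bool cmp1, bool cmp2, int val, int def) {
  return (cmp1 || cmp2) ? val : def;
}

// Previous form: two chained selects.
constexpr int selectChained(bool cmp1, bool cmp2, int val, int def) {
  int sel1 = cmp1 ? val : def;
  return cmp2 ? val : sel1;
}

// Enumerate all four (cmp1, cmp2) combinations.
constexpr bool equivalentForAllConditions(int val, int def) {
  for (int i = 0; i < 4; ++i) {
    bool c1 = (i & 1) != 0;
    bool c2 = (i & 2) != 0;
    if (selectViaOr(c1, c2, val, def) != selectChained(c1, c2, val, def))
      return false;
  }
  return true;
}
```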
2021-04-04 17:27:28 +02:00
Nikita Popov 72e0846ef8 [LVI] Don't bail on overdefined value in select
Even if one of the operands is overdefined, we may still produce
a non-overdefined result, e.g. due to a min/max operation. This
matches our handling elsewhere, e.g. for binary operators.

The slot poisoning comment refers to a much older LVI cache
implementation.
2021-04-04 11:11:01 +02:00
Craig Topper a2ea003fcb [RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.

In theory this should allow more target independent optimizations
to remain active.
2021-04-03 23:13:30 -07:00
Juneyoung Lee 5207cde5cb [InstCombine] Conditionally fold select i1 into and/or
This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or:

```
select cond, cond2, false
->
and cond, cond2
```

This is not safe when cond is false and cond2 is poison: the select yields false, whereas the and yields poison.

Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s.
To minimize its impact, this patch conservatively checks whether cond2 is an instruction that
can create poison, or has an operand that can create poison.
This approach is similar to what InstSimplify's SimplifyWithOpReplaced does.
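
A toy Python model of the poison problem (the `POISON` sentinel and both helpers are illustrative, not LLVM APIs): `select` only propagates poison from the arm it picks, while `and` propagates it from either operand, so the two disagree exactly when `cond` is false and `cond2` is poison.

```python
from itertools import product

POISON = object()  # sentinel standing in for an LLVM poison value


def select(cond, tval, fval):
    """select i1: poison if the condition or the chosen arm is poison."""
    if cond is POISON:
        return POISON
    return tval if cond else fval


def and_i1(x, y):
    """and i1: poison if either operand is poison."""
    if x is POISON or y is POISON:
        return POISON
    return x and y


# With no poison involved, `select cond, cond2, false` equals `and cond, cond2`.
for c1, c2 in product([False, True], repeat=2):
    assert select(c1, c2, False) == and_i1(c1, c2)

print(select(False, POISON, False))     # False
print(and_i1(False, POISON) is POISON)  # True
```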

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99674
2021-04-04 14:11:28 +09:00
Mircea Trofin b32e76c6d5 [mlgo] fix build rules
This was prompted by D95727, which had the side effect of breaking the
'release' mode build bot for ML-driven policies. The problem is that
the pre-compiled object files no longer get transitively carried through as
'source'. That said, the previous way of consuming them
was problematic, because it only worked for static builds; in
dynamic builds, the whole tf_xla_runtime was linked, which is
undesirable.

The alternative is to treat tf_xla_runtime as an archive, which then
leads to the desired effect.

Differential Revision: https://reviews.llvm.org/D99829
2021-04-03 12:49:03 -07:00
Roman Lebedev 7727cc242d [NFC][X86] Split VPMOV* AVX2 instructions into their own sched class
At least on all three Zens, all such instructions cleanly map
into this new class with no overrides needed.
2021-04-03 22:39:07 +03:00
Nikita Popov 665065821e [FastISel] Remove kill tracking
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.

As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.

Differential Revision: https://reviews.llvm.org/D98294
2021-04-03 15:50:13 +02:00
Simon Pilgrim 89afec348d [X86] Fold xor(truncate(xor(x,c1)),c2) -> xor(truncate(x),xor(truncate(c1),c2))
Fixes PR47603

This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.
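
The fold relies on truncation distributing over xor. A quick Python check over a sample of 16-bit inputs truncated to 8 bits (the widths are arbitrary choices for the sketch):

```python
def trunc(x: int, bits: int = 8) -> int:
    return x & ((1 << bits) - 1)


def before(x: int, c1: int, c2: int) -> int:
    # xor(truncate(xor(x, c1)), c2)
    return trunc(x ^ c1) ^ c2


def after(x: int, c1: int, c2: int) -> int:
    # xor(truncate(x), xor(truncate(c1), c2)); the inner xor folds to a constant
    return trunc(x) ^ (trunc(c1) ^ c2)


for x in range(0, 1 << 16, 251):
    for c1 in (0x0000, 0x00FF, 0xABCD, 0xFF00):
        for c2 in range(0, 256, 15):
            assert before(x, c1, c2) == after(x, c1, c2)
```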
2021-04-03 12:43:05 +01:00
Simon Pilgrim 7c17f1ea84 [X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper (REAPPLIED)
Use the getTargetShuffleInputs helper for all shuffle decoding

Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.
2021-04-03 11:59:19 +01:00
Bjorn Pettersson d66f9c4f1e Fix build rules for LLVM_WITH_Z3 after D95727
Started to see build errors like this

../lib/Support/Z3Solver.cpp:19:10: fatal error: 'z3.h' file not found
#include <z3.h>
         ^~~~~~
1 error generated.

after commit 43ceb74eb1.

The -isystem path to Z3_INCLUDE_DIR went missing from the compile
commands. No idea why target_include_directories stopped working with
that commit, but using include_directories seems to work better.
2021-04-03 12:25:37 +02:00
Nikita Popov b552e16b0b [Loads] Forward constant vector store to load of first element
InstCombine performs simple forwarding from stores to loads, but
currently only handles the case where the load and store have the
same size. This extends it to also handle a store of a constant
with a larger size followed by a load with a smaller size.

This is implemented through ConstantFoldLoadThroughBitcast() which
is fairly primitive (e.g. does not allow storing a large integer
and then loading a small one), but at least can forward the first
element of a vector store. Unfortunately it seems that we currently
don't have a generic helper for "read a constant value as a different
type", it's all tangled up with other logic in either
ConstantFolding or VNCoercion.
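
A Python sketch of why forwarding the first element is well defined, assuming byte-sized elements so that element 0 of a stored vector occupies the lowest addresses: a load of the element type from the same address reads element 0 in either byte order. `struct` stands in for memory here.

```python
import struct


def store_vector(elems, order="<"):
    """Lay out a <4 x i32> constant in memory ('<' little-endian, '>' big-endian)."""
    return struct.pack(order + "4i", *elems)


def load_i32(mem: bytes, order="<") -> int:
    """Load an i32 from the start of the stored bytes."""
    return struct.unpack_from(order + "i", mem, 0)[0]


for order in ("<", ">"):
    assert load_i32(store_vector([7, -1, 3, 4], order), order) == 7
```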

Differential Revision: https://reviews.llvm.org/D98114
2021-04-03 12:10:31 +02:00
Nikita Popov 9d20eaf9c0 [BasicAA] Don't store AATags in cache key (NFC)
The AAMDNodes part of the MemoryLocation is not used by the BasicAA
cache, so don't store it. This reduces the size of each cache entry
from 112 bytes to 48 bytes.
2021-04-03 11:32:01 +02:00
Nikita Popov 17b4e5d456 [BasicAA] Don't pass through AA metadata (NFCI)
BasicAA itself doesn't make use of AA metadata, but passes it
through to recursive queries and makes it part of the cache key.
Aliasing decisions that are based on AA metadata (i.e. TBAA and
ScopedAA) are based *only* on AA metadata, so checking them with
different pointer values or sizes is not useful: the result will
always be the same.

While this change is a mild compile-time improvement by itself,
the actual goal here is to reduce the size of AA cache keys in
a followup change.

Differential Revision: https://reviews.llvm.org/D90098
2021-04-03 11:21:50 +02:00
Simon Pilgrim 4ea5475a3f [KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI.
Include exhaustive test coverage.
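
A Python sketch of what the helper is understood to compute, with an exhaustive check in the same spirit as the tests: two values can share no set bit exactly when every bit position is known zero in at least one of them. `KnownBits` here is a simplified stand-in for `llvm::KnownBits`, not its real interface.

```python
BITS = 3
ALL_ONES = (1 << BITS) - 1


class KnownBits:
    def __init__(self, zero: int, one: int):
        assert zero & one == 0, "a bit cannot be known zero and known one"
        self.zero, self.one = zero, one

    def values(self):
        """All concrete BITS-wide values consistent with this knowledge."""
        return [v for v in range(1 << BITS)
                if v & self.zero == 0 and v & self.one == self.one]


def have_no_common_bits_set(lhs: KnownBits, rhs: KnownBits) -> bool:
    # Every bit position must be known zero in at least one operand.
    return (lhs.zero | rhs.zero) == ALL_ONES


# Exhaustive check: the helper agrees with brute force over all known-bits states.
states = [KnownBits(z, o) for z in range(1 << BITS) for o in range(1 << BITS) if z & o == 0]
for l in states:
    for r in states:
        brute = all(a & b == 0 for a in l.values() for b in r.values())
        assert have_no_common_bits_set(l, r) == brute
```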
2021-04-02 21:44:33 +01:00
Eric Astor 0499a9d688 [ms] [llvm-ml] Accept /WX to signal that warnings should be fatal.
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.

Also make sure that if Warning() returns true, we always treat it as an error.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92504
2021-04-02 15:13:20 -04:00
Levy Hsu f78d932cf2 [RISCV] Add IR intrinsics for Zbc extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
clmul
clmulh
clmulr

Differential Revision: https://reviews.llvm.org/D99711
2021-04-02 12:09:13 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Cyndy Ishida 3a223cd4f3 [TextAPI] run clang-format on violating sections, NFC 2021-04-02 11:44:33 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Fangrui Song 8e5f3d04f2 [SLPVectorizer] Fix divide-by-zero after D99719
Will add a test case later.
2021-04-02 11:13:51 -07:00
Eric Astor 15ec0ad77a [ms] [llvm-ml] Fix case-sensitivity for variables and textmacros
Make variables and text-macro references case-insensitive, to match ml.exe.

Also improve error handling for text-macro expansion.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92503
2021-04-02 14:08:02 -04:00
Levy Hsu b001d574d7 [RISCV] Add IR intrinsic for Zbr extension
Implementation for RISC-V Zbr extension intrinsic.

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
        crc32b
        crc32h
        crc32w
        crc32cb
        crc32ch
        crc32cw

RV64 Only:
        crc32d
        crc32cd
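
A Python model of the byte variant, assuming the draft Zbr semantics in which `crc32.b`/`.h`/`.w` apply 8/16/32 rounds of reflected CRC shifting to rs1, with the data already XORed into its low bits (the `crc32c.*` variants use the CRC-32C polynomial). Chaining the byte step reproduces the standard CRC-32, which is checked against `zlib`.

```python
import zlib

CRC32_POLY = 0xEDB88320   # reflected CRC-32 (IEEE 802.3) polynomial
CRC32C_POLY = 0x82F63B78  # reflected CRC-32C (Castagnoli) polynomial


def crc_rounds(x: int, nbits: int, poly: int) -> int:
    """nbits rounds of shift-right CRC reduction (8 for .b, 16 for .h, 32 for .w)."""
    for _ in range(nbits):
        x = (x >> 1) ^ (poly if x & 1 else 0)
    return x & 0xFFFFFFFF


def crc32_bytes(data: bytes, poly: int = CRC32_POLY) -> int:
    """Standard CRC (init ~0, final ~) built from byte-sized crc32.b-style steps."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc_rounds(crc ^ byte, 8, poly)
    return crc ^ 0xFFFFFFFF


assert crc32_bytes(b"123456789") == zlib.crc32(b"123456789") == 0xCBF43926
```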

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99009
2021-04-02 10:58:45 -07:00
Craig Topper d7ffa82a8e [RISCV] Improve 64-bit integer constant materialization for more cases.
For positive constants we try shifting left to remove leading zeros
and fill the bottom bits with 1s. We then materialize that constant
and shift it right.

This patch adds a new strategy to try filling the bottom bits with
zeros instead. This catches some additional cases.
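
A Python sketch of the shift trick for a positive 64-bit constant. It only shows that both fill choices are undone by the final logical right shift; it does not model which candidate is cheaper to materialize, which is what the real code decides. The helper name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def shifted_candidates(c: int):
    """For 0 < c < 2**63, return (shift, ones_filled, zeros_filled)."""
    assert 0 < c < (1 << 63), "positive 64-bit constants only"
    lz = 64 - c.bit_length()        # leading zeros to shift out
    zeros = (c << lz) & MASK64      # bottom bits filled with 0s (new strategy)
    ones = zeros | ((1 << lz) - 1)  # bottom bits filled with 1s (existing strategy)
    return lz, ones, zeros


lz, ones, zeros = shifted_candidates(0x0000_00FF_FFFF_F000)
assert ones >> lz == zeros >> lz == 0x0000_00FF_FFFF_F000  # SRLI recovers c either way
```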
2021-04-02 10:18:08 -07:00
Sanjay Patel 412fc74140 [InstCombine] fold not+or+neg
~((-X) | Y) --> (X - 1) & (~Y)

We generally prefer 'add' over 'sub'; this also reduces the
dependency chain and looks better for codegen on
x86, ARM, and AArch64 targets.

https://llvm.org/PR45755

https://alive2.llvm.org/ce/z/cxZDSp
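
The identity follows from `~(-X) == X - 1` in two's complement plus De Morgan's law. A quick exhaustive Python check on 8-bit values (the width is an arbitrary choice) complements the linked Alive2 proof:

```python
BITS = 8
MASK = (1 << BITS) - 1


def before(x: int, y: int) -> int:
    # ~((-X) | Y)
    return ~((-x) | y) & MASK


def after(x: int, y: int) -> int:
    # (X - 1) & (~Y); "X - 1" is emitted as an add of -1
    return ((x - 1) & ~y) & MASK


assert all(before(x, y) == after(x, y) for x in range(256) for y in range(256))
```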
2021-04-02 13:16:36 -04:00
Dimitry Andric 6abb92f210 [SCCP] Avoid modifying AdditionalUsers while iterating over it
When run under valgrind, or with a malloc that poisons freed memory,
this can lead to segfaults or other problems.

To avoid modifying the AdditionalUsers DenseMap while still iterating,
save the instructions to be notified in a separate SmallPtrSet, and use
this to later call OperandChangedState on each instruction.
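
The pattern behind the fix, sketched in Python with hypothetical names: snapshot the users into a separate set first, then notify them, so a callback that mutates the map cannot disturb the iteration. Python raises `RuntimeError` in the unsafe version; in C++ the equivalent DenseMap mutation is undefined behaviour, the kind of bug that surfaces under valgrind or a poisoning malloc.

```python
def notify_unsafe(additional_users, value, operand_changed_state):
    # Iterates the live set; the callback may grow it mid-iteration.
    for inst in additional_users.get(value, ()):
        operand_changed_state(inst)


def notify_safe(additional_users, value, operand_changed_state):
    # Copy the users first (the SmallPtrSet in the patch), then notify.
    to_notify = set(additional_users.get(value, ()))
    for inst in to_notify:
        operand_changed_state(inst)


users = {"v": {"a", "b"}}
seen = []


def callback(inst):
    seen.append(inst)
    users["v"].add(inst + "'")  # mutation during notification


notify_safe(users, "v", callback)
assert sorted(seen) == ["a", "b"]
```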

Fixes PR49582.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98602
2021-04-02 19:05:59 +02:00
Florian Hahn 8867fc69f0 [LV] Hoist mapping of IR operands to VPValues (NFC).
This patch moves mapping of IR operands to VPValues out of
tryToCreateWidenRecipe. This allows using existing VPValue operands when
widening recipes directly, which will be introduced in future patches.
2021-04-02 17:57:20 +01:00
Philip Reames 2c4548e18e [rs4gc] Use loops instead of straightline code for attribute stripping [nfc]
Mostly because I'm about to add more attributes and the straightline copies get much uglier.  What's currently there isn't too bad.
2021-04-02 09:25:15 -07:00
Philip Reames a505801e2b [rs4gc] Strip nofree and nosync attributes when lowering from abstract model
The safepoints being inserted exist to free memory, or to coordinate with another thread to do so.  Thus, we must strip any inferred attributes and reinfer them after the lowering.

I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in optimization decisions, I figured a bit of future-proofing was warranted.
2021-04-02 09:12:24 -07:00
Brendon Cahoon 09a88278cb [GlobalISel] Allow different types for G_SBFX and G_UBFX operands
Change the definition of G_SBFX and G_UBFX so that the lsb and width
can have different types than the src and dst operands.
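
For reference, a Python model of the value semantics (an illustration, not the GlobalISel definition): the lsb and width operands are just a bit position and a bit count, so nothing about them ties their type to that of src/dst.

```python
def ubfx(src: int, lsb: int, width: int, bits: int = 32) -> int:
    """Unsigned bitfield extract of src[lsb + width - 1 : lsb], zero-extended."""
    assert 0 < width and 0 <= lsb and lsb + width <= bits
    return (src >> lsb) & ((1 << width) - 1)


def sbfx(src: int, lsb: int, width: int, bits: int = 32) -> int:
    """Signed bitfield extract: the same field, sign-extended to `bits` bits."""
    field = ubfx(src, lsb, width, bits)
    if field >> (width - 1):  # top bit of the field is set
        field -= 1 << width
    return field & ((1 << bits) - 1)


print(hex(ubfx(0xABCD, 4, 8)), hex(sbfx(0xABCD, 4, 8)))  # 0xbc 0xffffffbc
```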

Differential Revision: https://reviews.llvm.org/D99739
2021-04-02 11:11:06 -04:00
Nikita Popov 4a3e006830 [LVI] Use range metadata on intrinsics
If we don't know how to handle an intrinsic, we should still
make use of normal call range metadata.
2021-04-02 16:45:31 +02:00
Alexey Bataev 5fcb07a070 [SLP]Fix a bug in min/max reduction, number of condition uses.
The ultimate reduction node may have multiple uses, but if the ultimate
reduction is a min/max reduction based on a select instruction, the
condition of that select instruction must have only a single use.

Differential Revision: https://reviews.llvm.org/D99753
2021-04-02 07:09:44 -07:00
Nico Weber fa0aff6d69 Revert "[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper"
This reverts commit 500969f1d0.
Makes clang assert compiling avx2 code, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1195353#c4
for a standalone repro.
2021-04-02 09:55:55 -04:00
Jun Ma 274ac9d40e [AArch64][SVE] Lowering sve.dot to DOT node
Differential Revision: https://reviews.llvm.org/D99699
2021-04-02 20:05:17 +08:00
Jun Ma ab3c5fb282 [NFC][SVE] Use SVE_4_Op_Imm_Pat for sve_intx_dot_by_indexed_elem 2021-04-02 20:05:17 +08:00
Jeroen Dobbelaere b82b305cf9 [InstCombine] Fix out-of-bounds ashr(shl) optimization
This fixes a crash found by OSS-Fuzz and reported by @fhahn.
The suggestion of @RKSimon seems to be the correct fix here (see D91343).

The OSS-Fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D99792
2021-04-02 13:45:11 +02:00
Simon Pilgrim 500969f1d0 [X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper
Use the getTargetShuffleInputs helper for all shuffle decoding
2021-04-02 11:50:18 +01:00
Sander de Smalen 0f7bbbc481 Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.

This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.

The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.

This allows us to do away with the following pattern in many of the SVE tests:
  RUN: .... 2>%t
  RUN: cat %t | FileCheck --check-prefix=WARN
  WARN-NOT: warning: ...

The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.

This patch also fixes the following tests:
 unittests:
   ScalableVectorMVTsTest.SizeQueries
   SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
   AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG

 regression tests:
   Transforms/InstCombine/vscale_gep.ll

Reviewed By: paulwalker-arm, ctetreau

Differential Revision: https://reviews.llvm.org/D98856
2021-04-02 10:55:22 +01:00
Florian Hahn 0f3230390b [SLP] Better estimate cost of no-op extracts on target vectors.
The motivation for this patch is to better estimate the cost of
extractelement instructions in cases where they are going to be free,
because the source vector can be used directly.

A simple example is

    %v1.lane.0 = extractelement <2 x double> %v.1, i32 0
    %v1.lane.1 = extractelement <2 x double> %v.1, i32 1

    %a.lane.0 = fmul double %v1.lane.0, %x
    %a.lane.1 = fmul double %v1.lane.1, %y

Currently we only consider the extracts free if there are no other
users.

In this particular case, on AArch64, which can fit <2 x double> in a
vector register, the extracts should be free independently of other
users: the source vector of the extracts will already be in a vector
register, so it can be used directly.

The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on
certain AArch64 CPUs.

It looks like this does not impact any code in
SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto.

This originally regressed after D80773, so if there's a better
alternative to explore, I'd be more than happy to do that.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99719
2021-04-02 10:40:12 +01:00
Fraser Cormack 3b48d849d4 [RISCV] Optimize more redundant VSETVLIs
D99717 introduced some test cases which showed that feeding the output of one
vsetvli into another was not picked up by the RISCVCleanupVSETVLI
pass. This patch teaches the optimization about such a pattern. The
pattern is quite common when using the RVV vsetvli intrinsic to pass the
VL on to other intrinsics.

The second test case introduced by D99717 is left unoptimized by this
patch. It is a rarer case and will require us to rewire any uses of the
redundant vset[i]vli's output to the previous one's.
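
A toy Python model of the pattern this patch handles, under stated assumptions: only `vsetvli` instructions are modeled, `rd=None` stands for an unused VL result, and a `vsetvli` is dropped when its AVL is the VL produced by the preceding `vsetvli` with the same vtype. That is safe because the previous VL never exceeds VLMAX, and the RVV spec sets vl = AVL whenever AVL <= VLMAX. A `vsetvli` whose own result is used is kept, since removing it would need the rewiring described above. All names are hypothetical; this is not the RISCVCleanupVSETVLI code.

```python
from typing import List, NamedTuple, Optional


class VSetVLI(NamedTuple):
    rd: Optional[str]  # register receiving the new VL; None if unused
    avl: str           # register holding the requested vector length
    vtype: str         # SEW/LMUL settings, e.g. "e32,m1"


def cleanup(insts: List[VSetVLI]) -> List[VSetVLI]:
    out: List[VSetVLI] = []
    prev: Optional[VSetVLI] = None
    for inst in insts:
        redundant = (
            prev is not None
            and prev.rd is not None
            and inst.avl == prev.rd       # AVL is the previous VL output
            and inst.vtype == prev.vtype  # same VLMAX, so vl is unchanged
            and inst.rd is None           # no uses to rewire
        )
        if redundant:
            continue
        out.append(inst)
        prev = inst
    return out


seq = [VSetVLI("a1", "a0", "e32,m1"), VSetVLI(None, "a1", "e32,m1")]
assert cleanup(seq) == seq[:1]
```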

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99730
2021-04-02 10:04:07 +01:00
Evgeniy Brevnov 2388aae401 [NARY-REASSOCIATE] Support reassociation of min/max
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.
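
The rewrite is valid because min and max are associative and commutative. A small exhaustive Python check (an illustration of the identity, not of the NaryReassociate pass) covers signed min and max over a few small integers; the same argument applies to the unsigned forms.

```python
from itertools import product


def reassociated(op, a, b, c):
    """op(op(a, b), c) rewritten as op(op(a, c), b), for op in {min, max}."""
    return op(op(a, c), b)


for op in (min, max):
    for a, b, c in product(range(-3, 4), repeat=3):
        assert op(op(a, b), c) == reassociated(op, a, b, c)
```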

Reviewed By: mkazantsev, lebedev.ri

Differential Revision: https://reviews.llvm.org/D88287
2021-04-02 15:30:13 +07:00
Roman Lebedev a26f1bf67e [PassManager] Run additional LICM before LoopRotate
Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to hoist loop-invariant code instead of
duplicating it when not needed, but isn't that itself code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation,
which is basically free from a compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at the stats, I think it isn't great that we would no longer run LICM after LoopRotation; in particular:
| statistic name                                   | LoopRotate-LICM | LICM-LoopRotate |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                         | 9015930         | 9015799         |  -131 |   0.00% |  0.00% |
| indvars.NumElimCmp                               | 3536            | 3544            |     8 |   0.23% |  0.23% |
| indvars.NumElimExt                               | 36725           | 36580           |  -145 |  -0.39% |  0.39% |
| indvars.NumElimIV                                | 1197            | 1187            |   -10 |  -0.84% |  0.84% |
| indvars.NumElimIdentity                          | 143             | 136             |    -7 |  -4.90% |  4.90% |
| indvars.NumElimRem                               | 4               | 5               |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                                  | 29842           | 29890           |    48 |   0.16% |  0.16% |
| indvars.NumReplaced                              | 2293            | 2227            |   -66 |  -2.88% |  2.88% |
| indvars.NumSimplifiedSDiv                        | 6               | 8               |     2 |  33.33% | 33.33% |
| indvars.NumWidened                               | 26438           | 26329           |  -109 |  -0.41% |  0.41% |
| instcount.TotalBlocks                            | 1178338         | 1173840         | -4498 |  -0.38% |  0.38% |
| instcount.TotalFuncs                             | 111825          | 111829          |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                             | 9905442         | 9896139         | -9303 |  -0.09% |  0.09% |
| lcssa.NumLCSSA                                   | 425871          | 423961          | -1910 |  -0.45% |  0.45% |
| licm.NumHoisted                                  | 378357          | 378753          |   396 |   0.10% |  0.10% |
| licm.NumMovedCalls                               | 2193            | 2208            |    15 |   0.68% |  0.68% |
| licm.NumMovedLoads                               | 35899           | 31821           | -4078 | -11.36% | 11.36% |
| licm.NumPromoted                                 | 11178           | 11154           |   -24 |  -0.21% |  0.21% |
| licm.NumSunk                                     | 13359           | 13587           |   228 |   1.71% |  1.71% |
| loop-delete.NumDeleted                           | 8547            | 8402            |  -145 |  -1.70% |  1.70% |
| loop-instsimplify.NumSimplified                  | 12876           | 11890           |  -986 |  -7.66% |  7.66% |
| loop-peel.NumPeeled                              | 1008            | 925             |   -83 |  -8.23% |  8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize         | 368             | 365             |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                           | 42015           | 42003           |   -12 |  -0.03% |  0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 240             | 242             |     2 |   0.83% |  0.83% |
| loop-simplifycfg.NumLoopExitsDeleted             | 497             | 20              |  -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded            | 618             | 336             |  -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled                | 11028           | 11032           |     4 |   0.04% |  0.04% |
| loop-unroll.NumUnrolled                          | 12608           | 12529           |   -79 |  -0.63% |  0.63% |
| mem2reg.NumDeadAlloca                            | 10222           | 10221           |    -1 |  -0.01% |  0.01% |
| mem2reg.NumPHIInsert                             | 192110          | 192106          |    -4 |   0.00% |  0.00% |
| mem2reg.NumSingleStore                           | 637650          | 637643          |    -7 |   0.00% |  0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814             | 812             |    -2 |  -0.25% |  0.25% |
| scalar-evolution.NumTripCountsComputed           | 283108          | 282934          |  -174 |  -0.06% |  0.06% |
| scalar-evolution.NumTripCountsNotComputed        | 106712          | 106718          |     6 |   0.01% |  0.01% |
| simple-loop-unswitch.NumBranches                 | 5178            | 4752            |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 914             | 503             |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches                 | 20              | 18              |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial                  | 183             | 95              |   -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`),
loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`),
simple-loop-unswitch (`NumTrivial`).
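
For reading the tables: Δ is the new count minus the baseline (first) column, and % is Δ relative to the baseline. A small Python helper, hypothetical and written only to show the arithmetic, reproduces rows from the tables:

```python
def delta_row(baseline: int, new: int):
    """Return (delta, percent, abs_percent) as shown in the statistics tables."""
    delta = new - baseline
    pct = 100.0 * delta / baseline
    return delta, round(pct, 2), round(abs(pct), 2)


print(delta_row(35899, 31821))  # licm.NumMovedLoads: (-4078, -11.36, 11.36)
```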

What if we instead have LICM both before and after LoopRotate?
| statistic name                                | LoopRotate-LICM | LICM-LoopRotate-LICM |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                      | 9015930         | 9014474              | -1456 |  -0.02% |  0.02% |
| indvars.NumElimCmp                            | 3536            | 3546                 |    10 |   0.28% |  0.28% |
| indvars.NumElimExt                            | 36725           | 36681                |   -44 |  -0.12% |  0.12% |
| indvars.NumElimIV                             | 1197            | 1185                 |   -12 |  -1.00% |  1.00% |
| indvars.NumElimIdentity                       | 143             | 146                  |     3 |   2.10% |  2.10% |
| indvars.NumElimRem                            | 4               | 5                    |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                               | 29842           | 29899                |    57 |   0.19% |  0.19% |
| indvars.NumReplaced                           | 2293            | 2299                 |     6 |   0.26% |  0.26% |
| indvars.NumSimplifiedSDiv                     | 6               | 8                    |     2 |  33.33% | 33.33% |
| indvars.NumWidened                            | 26438           | 26404                |   -34 |  -0.13% |  0.13% |
| instcount.TotalBlocks                         | 1178338         | 1173652              | -4686 |  -0.40% |  0.40% |
| instcount.TotalFuncs                          | 111825          | 111829               |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                          | 9905442         | 9895452              | -9990 |  -0.10% |  0.10% |
| lcssa.NumLCSSA                                | 425871          | 425373               |  -498 |  -0.12% |  0.12% |
| licm.NumHoisted                               | 378357          | 383352               |  4995 |   1.32% |  1.32% |
| licm.NumMovedCalls                            | 2193            | 2204                 |    11 |   0.50% |  0.50% |
| licm.NumMovedLoads                            | 35899           | 35755                |  -144 |  -0.40% |  0.40% |
| licm.NumPromoted                              | 11178           | 11163                |   -15 |  -0.13% |  0.13% |
| licm.NumSunk                                  | 13359           | 14321                |   962 |   7.20% |  7.20% |
| loop-delete.NumDeleted                        | 8547            | 8538                 |    -9 |  -0.11% |  0.11% |
| loop-instsimplify.NumSimplified               | 12876           | 12041                |  -835 |  -6.48% |  6.48% |
| loop-peel.NumPeeled                           | 1008            | 924                  |   -84 |  -8.33% |  8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize      | 368             | 365                  |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                        | 42015           | 42005                |   -10 |  -0.02% |  0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted         | 240             | 241                  |     1 |   0.42% |  0.42% |
| loop-simplifycfg.NumTerminatorsFolded         | 618             | 619                  |     1 |   0.16% |  0.16% |
| loop-unroll.NumCompletelyUnrolled             | 11028           | 11029                |     1 |   0.01% |  0.01% |
| loop-unroll.NumUnrolled                       | 12608           | 12525                |   -83 |  -0.66% |  0.66% |
| mem2reg.NumPHIInsert                          | 192110          | 192073               |   -37 |  -0.02% |  0.02% |
| mem2reg.NumSingleStore                        | 637650          | 637652               |     2 |   0.00% |  0.00% |
| scalar-evolution.NumTripCountsComputed        | 283108          | 282998               |  -110 |  -0.04% |  0.04% |
| scalar-evolution.NumTripCountsNotComputed     | 106712          | 106691               |   -21 |  -0.02% |  0.02% |
| simple-loop-unswitch.NumBranches              | 5178            | 5185                 |     7 |   0.14% |  0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 925                  |    11 |   1.20% |  1.20% |
| simple-loop-unswitch.NumTrivial               | 183             | 179                  |    -4 |  -2.19% |  2.19% |

I.e. we end up with fewer instructions, less peeling, and more LICM activity;
also note that none of those 4 regressions are present here. Namely:

| statistic name                                   | LICM-LoopRotate | LICM-LoopRotate-LICM |     Δ |        % |   abs(%) |
| asm-printer.EmittedInsts                         | 9015799         | 9014474              | -1325 |   -0.01% |    0.01% |
| indvars.NumElimCmp                               | 3544            | 3546                 |     2 |    0.06% |    0.06% |
| indvars.NumElimExt                               | 36580           | 36681                |   101 |    0.28% |    0.28% |
| indvars.NumElimIV                                | 1187            | 1185                 |    -2 |   -0.17% |    0.17% |
| indvars.NumElimIdentity                          | 136             | 146                  |    10 |    7.35% |    7.35% |
| indvars.NumLFTR                                  | 29890           | 29899                |     9 |    0.03% |    0.03% |
| indvars.NumReplaced                              | 2227            | 2299                 |    72 |    3.23% |    3.23% |
| indvars.NumWidened                               | 26329           | 26404                |    75 |    0.28% |    0.28% |
| instcount.TotalBlocks                            | 1173840         | 1173652              |  -188 |   -0.02% |    0.02% |
| instcount.TotalInsts                             | 9896139         | 9895452              |  -687 |   -0.01% |    0.01% |
| lcssa.NumLCSSA                                   | 423961          | 425373               |  1412 |    0.33% |    0.33% |
| licm.NumHoisted                                  | 378753          | 383352               |  4599 |    1.21% |    1.21% |
| licm.NumMovedCalls                               | 2208            | 2204                 |    -4 |   -0.18% |    0.18% |
| licm.NumMovedLoads                               | 31821           | 35755                |  3934 |   12.36% |   12.36% |
| licm.NumPromoted                                 | 11154           | 11163                |     9 |    0.08% |    0.08% |
| licm.NumSunk                                     | 13587           | 14321                |   734 |    5.40% |    5.40% |
| loop-delete.NumDeleted                           | 8402            | 8538                 |   136 |    1.62% |    1.62% |
| loop-instsimplify.NumSimplified                  | 11890           | 12041                |   151 |    1.27% |    1.27% |
| loop-peel.NumPeeled                              | 925             | 924                  |    -1 |   -0.11% |    0.11% |
| loop-rotate.NumRotated                           | 42003           | 42005                |     2 |    0.00% |    0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 242             | 241                  |    -1 |   -0.41% |    0.41% |
| loop-simplifycfg.NumLoopExitsDeleted             | 20              | 497                  |   477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded            | 336             | 619                  |   283 |   84.23% |   84.23% |
| loop-unroll.NumCompletelyUnrolled                | 11032           | 11029                |    -3 |   -0.03% |    0.03% |
| loop-unroll.NumUnrolled                          | 12529           | 12525                |    -4 |   -0.03% |    0.03% |
| mem2reg.NumDeadAlloca                            | 10221           | 10222                |     1 |    0.01% |    0.01% |
| mem2reg.NumPHIInsert                             | 192106          | 192073               |   -33 |   -0.02% |    0.02% |
| mem2reg.NumSingleStore                           | 637643          | 637652               |     9 |    0.00% |    0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812             | 814                  |     2 |    0.25% |    0.25% |
| scalar-evolution.NumTripCountsComputed           | 282934          | 282998               |    64 |    0.02% |    0.02% |
| scalar-evolution.NumTripCountsNotComputed        | 106718          | 106691               |   -27 |   -0.03% |    0.03% |
| simple-loop-unswitch.NumBranches                 | 4752            | 5185                 |   433 |    9.11% |    9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 503             | 925                  |   422 |   83.90% |   83.90% |
| simple-loop-unswitch.NumSwitches                 | 18              | 20                   |     2 |   11.11% |   11.11% |
| simple-loop-unswitch.NumTrivial                  | 95              | 179                  |    84 |   88.42% |   88.42% |

(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of code where running only an early LICM is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but I think that's negligible, and it may be avoidable in the future
by fixing clang to produce alignment information on function arguments,
which would make the second run unnecessary.

Differential Revision: https://reviews.llvm.org/D99249
2021-04-02 11:11:42 +03:00
Wenlei He c5605857bb [CSSPGO] Skip dangling probe value when computing profile summary
Recently we switched to using InvalidProbeCount = UINT64_MAX (instead of 0) to represent a dangling probe, but UINT64_MAX was not excluded when computing the profile summary. This caused the profile summary to produce incorrect hot/cold thresholds. This change fixes that by excluding UINT64_MAX from the summary builder.
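A minimal sketch of the fix, assuming a simplified summary that only tracks the total, the maximum, and the number of counts. `ProbeSummary`, `buildSummary`, and `kInvalidProbeCount` are hypothetical names, not the real summary builder API. The point is that a UINT64_MAX sentinel must be skipped before it reaches any accumulation that feeds the hot/cold thresholds.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sentinel for a dangling probe (mirrors InvalidProbeCount = UINT64_MAX).
constexpr uint64_t kInvalidProbeCount = std::numeric_limits<uint64_t>::max();

// Hypothetical, much-simplified profile summary.
struct ProbeSummary {
  uint64_t TotalCount = 0;
  uint64_t MaxCount = 0;
  uint64_t NumCounts = 0;
};

// Accumulates counts while skipping dangling probes, so the sentinel
// never inflates the totals used to derive hot/cold thresholds.
ProbeSummary buildSummary(const std::vector<uint64_t> &Counts) {
  ProbeSummary S;
  for (uint64_t C : Counts) {
    if (C == kInvalidProbeCount)
      continue; // Dangling probe: carries no meaningful count.
    S.TotalCount += C;
    if (C > S.MaxCount)
      S.MaxCount = C;
    ++S.NumCounts;
  }
  return S;
}
```

Without the `continue`, a single dangling probe would make `MaxCount` equal UINT64_MAX and overflow `TotalCount`.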

Differential Revision: https://reviews.llvm.org/D99788
2021-04-01 22:49:11 -07:00
Juneyoung Lee c664769330 [AssumeBundles] offset should be added to correctly calculate align
This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492).

Consider this code:

```
call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 -1
; alignment of %arrayidx?
```

The llvm.assume guarantees that `%a - 28` is 32-byte aligned, meaning that `%a` is 32k + 28 for some k.
Therefore `%a - 4` cannot be 32-byte aligned, but the existing code was computing the pointer as 32-byte aligned.

The reason this happened is as follows.
`DiffSCEV` stores `%arrayidx - %a`, which is -4.
`OffSCEV` stores the offset value of "align", which is 28.
`DiffSCEV` + `OffSCEV` = 24 should be used as `%a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = -32 was being used instead.
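The arithmetic can be checked with a small sketch that assumes constant offsets (the real code works on SCEV expressions). `knownAlignment` is a hypothetical helper: given that `Base - Off` is a multiple of `Align`, it returns the largest power-of-two alignment provable for `Base + Diff`.

```cpp
#include <cassert>
#include <cstdint>

// Returns the largest power-of-two alignment known for (Base + Diff),
// given the assumption that (Base - Off) is a multiple of Align.
uint64_t knownAlignment(uint64_t Align, int64_t Off, int64_t Diff) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "Align must be a power of two");
  const int64_t A = static_cast<int64_t>(Align);
  // Base + Diff == Align*k + (Off + Diff), so only (Off + Diff) mod Align matters.
  int64_t Rem = (Off + Diff) % A;
  if (Rem < 0)
    Rem += A;
  if (Rem == 0)
    return Align;
  // The lowest set bit of the remainder is the best alignment we can prove.
  uint64_t R = static_cast<uint64_t>(Rem);
  return R & (~R + 1);
}
```

With the values above, `knownAlignment(32, 28, -4)` yields 8 (since 24 = 8 * 3). The buggy `DiffSCEV - OffSCEV` formulation corresponds to `knownAlignment(32, -28, -4)`, which sees -32 and wrongly claims 32.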

Reviewed By: Tyker

Differential Revision: https://reviews.llvm.org/D98759
2021-04-02 12:32:05 +09:00
Yang Fan bc6001ce1e [X86] Fix -Wunused-function warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:9212:13: warning: ‘bool isHorizOp(unsigned int)’ defined but not used [-Wunused-function]
 9212 | static bool isHorizOp(unsigned Opcode) {
      |             ^~~~~~~~~
```
2021-04-02 09:38:12 +08:00
Philip Reames 91790c6785 [indvars] Fix pr49802 by checking for SCEVCouldNotCompute
The code assumed that having an exact exit count for the loop implies that the exit count of every exit is known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop exit count implies that every exit that is not trivially dead has a known exit count.

We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute.  I chose the second as it was simpler.
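Option b) can be modeled in a few lines. In this sketch `std::optional` stands in for `SCEVCouldNotCompute`, and `minExitCount` is a hypothetical helper that conservatively gives up when any exit count is unknown rather than assuming it is computable. Whether the actual transform skips or bails is not stated above, so treat the bail-out as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One loop exit's count; std::nullopt plays the role of SCEVCouldNotCompute.
using ExitCount = std::optional<uint64_t>;

// Returns the smallest exit count if every exit's count is known. It tests
// each count explicitly instead of relying on "an exact loop exit count
// implies every exit count is known", which no longer holds for dead exits.
std::optional<uint64_t> minExitCount(const std::vector<ExitCount> &Exits) {
  std::optional<uint64_t> Min;
  for (const ExitCount &EC : Exits) {
    if (!EC)
      return std::nullopt; // Could not compute: give up conservatively.
    if (!Min || *EC < *Min)
      Min = *EC;
  }
  return Min;
}
```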

(Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...)

P.S. Sorry for the lack of a test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly imprecise state, which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.
2021-04-01 17:53:44 -07:00
Philip Reames b23a314146 [funcattrs] Respect nofree attribute on callsites (not just callee) 2021-04-01 14:45:49 -07:00
Craig Topper 766d27dc85 [RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands.
This occurs when we type legalize an i64 scalar input on RV32. We
need to manually splat, which requires a vector input. Rather
than special-casing this in lowering, just pattern match it.
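The equivalence that makes the pattern possible is that reverse subtraction from a splatted scalar is an ordinary vector subtraction with swapped operands. This sketch models lanes as plain `int64_t` values and ignores VL, masking, and how the i64 splat is actually built on RV32; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int64_t>;

// vrsub.vx semantics: each lane computes Scalar - A[i].
Vec vrsubVX(const Vec &A, int64_t Scalar) {
  Vec R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = Scalar - A[I];
  return R;
}

// vsub.vv semantics: each lane computes A[i] - B[i].
Vec vsubVV(const Vec &A, const Vec &B) {
  assert(A.size() == B.size());
  Vec R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = A[I] - B[I];
  return R;
}

// Once the scalar has been splatted into a vector, "reverse subtract"
// becomes a plain subtract with the operands swapped.
Vec vrsubViaSplat(const Vec &A, int64_t Scalar) {
  Vec Splat(A.size(), Scalar);
  return vsubVV(Splat, A);
}
```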
2021-04-01 14:10:21 -07:00
David Green da98177cda [ARM] Allow v6m runtime loop unrolling
This removes the restriction that only Thumb2 targets enable runtime
loop unrolling, allowing it for Thumb1-only cores as well. The existing
T2 heuristics are used (for the time being) to control when and how
unrolling is performed.

Differential Revision: https://reviews.llvm.org/D99588
2021-04-01 21:21:40 +01:00
Craig Topper dbbc95e3e5 [RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.

This patch switches to the alternative method, softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749
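The observable difference between the two strategies can be sketched in plain C++. `roundToHalfPrecision` is a hypothetical helper that rounds to an 11-bit significand (half's precision) with ties to even. It deliberately ignores half's exponent range, so overflow and subnormals are not modeled. The two sum functions mimic keeping extra precision across operations versus rounding after each one.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Rounds X to the nearest value with an 11-bit significand, ties to even.
// Simplification: half's exponent range (overflow, subnormals) is ignored.
float roundToHalfPrecision(float X) {
  int Exp = 0;
  float Mant = std::frexp(X, &Exp);       // X == Mant * 2^Exp, 0.5 <= |Mant| < 1
  float Scaled = std::ldexp(Mant, 11);    // 11 significant bits before the point
  float Rounded = std::nearbyint(Scaled); // default rounding mode: ties to even
  return std::ldexp(Rounded, Exp - 11);
}

// PromoteFloat-style: intermediates stay in float, rounded once at the end
// (for example, at a store).
float sumPromoteFloat(const std::vector<float> &Vals) {
  float Acc = 0.0f;
  for (float V : Vals)
    Acc += V;
  return roundToHalfPrecision(Acc);
}

// softPromoteHalf-style: round back to half precision after every operation.
float sumSoftPromoteHalf(const std::vector<float> &Vals) {
  float Acc = 0.0f;
  for (float V : Vals)
    Acc = roundToHalfPrecision(Acc + V);
  return Acc;
}
```

Adding 2^-11 twice to 1.0 gives 1.0009765625 when intermediates are kept in float. Rounding after each step gives 1.0, because each addition is an exact tie that rounds back to even.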

I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
softPromoteHalf would otherwise always give i16 as the
argument type.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D99148
2021-04-01 12:41:57 -07:00
Philip Reames 1e69a5af92 [Attributor] Cleanup detection of non-relaxed atomics in nosync inference
The code was checking for cases which are disallowed by the verifier.  Delete dead code and adjust style.
2021-04-01 12:01:29 -07:00
Philip Reames 8e596f7e27 [Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC]
Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic.  By using the matcher class, we now do.
2021-04-01 11:49:59 -07:00
Philip Reames 6ef4505298 [funcattrs] Infer nosync from readnone and non-convergent
This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in the Attributor and the discussion on the review of the change that introduced the nosync attribute (0626367202).

This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive.

Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work.
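The rule itself is small. The sketch below uses a hypothetical `FunctionFacts` struct in place of real `Function` attributes. It marks a function nosync only when it is readnone and not convergent, and reports whether anything changed, as an attribute-inference step typically does.

```cpp
#include <cassert>

// Hypothetical summary of the attributes this rule looks at.
struct FunctionFacts {
  bool ReadNone = false;   // Function does not access memory.
  bool Convergent = false; // Function may perform convergent operations.
  bool NoSync = false;
};

// Basic rule: a readnone, non-convergent function is inferred nosync.
// Returns true if the fact set changed.
bool inferNoSync(FunctionFacts &F) {
  if (F.NoSync)
    return false; // Already known.
  if (F.ReadNone && !F.Convergent) {
    F.NoSync = true;
    return true;
  }
  return false;
}
```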

Differential Revision: https://reviews.llvm.org/D99749
2021-04-01 11:37:34 -07:00
Philip Reames db357891f0 Infer dereferenceability from malloc and friends
Hook up TLI when inferring object size from allocation calls. This allows the analysis to prove dereferenceability for known allocation functions (such as malloc/new/etc) in addition to those marked explicitly with the allocsize attribute.

This is a follow up to 0129cd5 now that the bug fixed by e2c6621e6 is resolved.

As noted in the test, this relies on being able to prove that there is no free between allocation and context (e.g. hoist location). At the moment, this is handled conservatively. I'm separately working on strengthening our ability to reason about no-free regions.
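A minimal sketch of the reasoning, assuming a toy model of allocation calls. `AllocCall`, `knownAllocSize`, and `isDereferenceableAt` are hypothetical and stand in for what the TLI-aware object-size analysis recognizes. An access is treated as dereferenceable only if it fits in the allocated object and no free can occur between the allocation and the context point.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a call to a known allocation function.
struct AllocCall {
  std::string Callee;              // e.g. "malloc" or "calloc"
  std::vector<uint64_t> ConstArgs; // Constant size arguments.
};

// Object size in bytes for a couple of known allocation functions.
std::optional<uint64_t> knownAllocSize(const AllocCall &C) {
  if (C.Callee == "malloc" && C.ConstArgs.size() == 1)
    return C.ConstArgs[0];
  if (C.Callee == "calloc" && C.ConstArgs.size() == 2) {
    uint64_t N = C.ConstArgs[0], Size = C.ConstArgs[1];
    if (Size != 0 && N > UINT64_MAX / Size)
      return std::nullopt; // Multiplication would overflow: size unknown.
    return N * Size;
  }
  return std::nullopt;
}

// An access of AccessSize bytes at Offset is dereferenceable at the context
// point only if it fits in the object and no free can happen in between.
bool isDereferenceableAt(const AllocCall &C, uint64_t Offset,
                         uint64_t AccessSize, bool MayFreeBetween) {
  if (MayFreeBetween)
    return false; // Conservative: a free could invalidate the object.
  std::optional<uint64_t> Size = knownAllocSize(C);
  return Size && Offset <= *Size && AccessSize <= *Size - Offset;
}
```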

Differential Revision: https://reviews.llvm.org/D99737
2021-04-01 11:33:35 -07:00
Martin Storsjö 4391d764e1 [ARM] Remove an unused parameter in ARMWinCOFFObjectWriter. NFC.
This writer only ever operates on 32-bit ARM code.

Differential Revision: https://reviews.llvm.org/D99575
2021-04-01 21:25:41 +03:00
Philip Reames ffa15e9463 Extract isVolatile helper on Instruction [NFCI]
This logic was duplicated in several places, none of which were exhaustive. Consolidate it in one place.

I don't believe this actually impacts the behavior of the callers. I think they all filter their inputs such that their partial implementations were correct. If not, this might be fixing a corner-case bug.
2021-04-01 11:24:02 -07:00
Nick Desaulniers 52338af569 [MC][ARM] add .w suffixes for RSB/RSBS T1
See also:
F5.1.167 RSB, RSBS (register) T1 shift or rotate by value variant
of the Arm ARM.

Link: https://github.com/ClangBuiltLinux/linux/issues/1309

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D99542
2021-04-01 10:45:37 -07:00
Philip Reames 6b05d753e0 Mark unordered memset/memmove/memcpy as nosync
Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.
2021-04-01 10:38:54 -07:00
Craig Topper d157e3f387 [RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32.
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.

We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.
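The operand swap relies on `a > s` being equivalent to `s < a`. This sketch models signed lanes as `int64_t` and masks as `std::vector<bool>`; the function names are hypothetical. The splat takes its element type and length from the vector operand, not from the mask result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int64_t>;
using Mask = std::vector<bool>;

// vmslt.vv semantics: mask lane i is set when A[i] < B[i] (signed).
Mask vmsltVV(const Vec &A, const Vec &B) {
  assert(A.size() == B.size());
  Mask M(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    M[I] = A[I] < B[I];
  return M;
}

// vmsgt.vx semantics: mask lane i is set when A[i] > Scalar (signed).
Mask vmsgtVX(const Vec &A, int64_t Scalar) {
  Mask M(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    M[I] = A[I] > Scalar;
  return M;
}

// There is no vmsgt.vv, so splat the scalar (shaped like the vector
// operand, not the mask result) and use vmslt.vv with swapped operands.
Mask vmsgtViaSplat(const Vec &A, int64_t Scalar) {
  Vec Splat(A.size(), Scalar);
  return vmsltVV(Splat, A);
}
```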

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D99704
2021-04-01 10:38:05 -07:00
Nick Desaulniers 1addc231cd [MC][ARM] add .w suffixes for ORN/ORNS T1
See also:
F5.1.128 ORN, ORNS (register) T1 shift or rotate by value variant
of the Arm ARM.

Link: https://github.com/ClangBuiltLinux/linux/issues/1309

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D99538
2021-04-01 10:27:09 -07:00
Craig Topper b7c2e577cc [RISCV] Add custom type legalization to form MULHSU when possible.
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.

I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds, making
it difficult to find both pieces again.
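For reference, the pattern mentioned above is an identity for the high half of a mixed-sign product. This sketch uses XLen = 32 so that a 64-bit reference product is available, and all names are hypothetical. It shows that `(add (mul X, (srai Y, XLen-1)), (mulhu X, Y))` equals MULHSU with Y as the signed operand and X as the unsigned one. It does not show the custom type legalization the commit actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Reference MULHSU for XLen = 32: high 32 bits of (signed A) * (unsigned B).
// A is passed as raw bits and reinterpreted as two's complement explicitly.
uint32_t mulhsuRef(uint32_t ABits, uint32_t B) {
  int64_t A = (ABits & 0x80000000u)
                  ? static_cast<int64_t>(ABits) - (int64_t(1) << 32)
                  : static_cast<int64_t>(ABits);
  int64_t Prod = A * static_cast<int64_t>(B); // |Prod| < 2^63, no overflow.
  return static_cast<uint32_t>(static_cast<uint64_t>(Prod) >> 32);
}

// MULHU: high 32 bits of the unsigned product.
uint32_t mulhu(uint32_t X, uint32_t Y) {
  return static_cast<uint32_t>((static_cast<uint64_t>(X) * Y) >> 32);
}

// (add (mul X, (srai Y, 31)), (mulhu X, Y)) with wrapping 32-bit arithmetic.
uint32_t mulhsuPattern(uint32_t X, uint32_t Y) {
  uint32_t SignMask = (Y & 0x80000000u) ? 0xFFFFFFFFu : 0u; // srai Y, 31
  return X * SignMask + mulhu(X, Y);
}
```

The identity holds because a negative Y read as unsigned is too large by 2^32, which adds exactly X to the unsigned high half. Multiplying X by the all-ones sign mask subtracts that X back out.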

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D99479
2021-04-01 10:15:55 -07:00
Jay Foad fdc4f19e2f [AMDGPU] Remove SIAddIMGInit pass which is now unused
Differential Revision: https://reviews.llvm.org/D99748
2021-04-01 18:13:17 +01:00
Jay Foad 3d07a6d891 [AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic
Doing this during instruction selection avoids the cost of running
SIAddIMGInit which is yet another pass over the MIR.

Differential Revision: https://reviews.llvm.org/D99670
2021-04-01 18:13:17 +01:00
Jay Foad 4af6251cea [AMDGPU][SDag] Add IMG init in AdjustInstrPostInstrSelection
Doing this in a post-isel hook avoids the cost of running SIAddIMGInit
which is yet another pass over the MIR.

Differential Revision: https://reviews.llvm.org/D99747
2021-04-01 18:13:17 +01:00
Craig Topper d61b40ed27 [RISCV] Improve 64-bit integer materialization for some cases.
This adds a new integer materialization strategy mainly targeted
at 64-bit constants like 0xffffffff where there are 32 or more trailing
ones with leading zeros. We can materialize these by using an addi -1
and srli to restore the leading zeros. This matches what gcc does.

I haven't limited to just these cases though. The implementation
here takes the constant, shifts out all the leading zeros and
shifts ones into the LSBs, creates the new sequence, adds an srli,
and checks if this is shorter than our original strategy.
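A reduced sketch of the new strategy follows, under stated assumptions. `Inst`, `materializeWithSrli`, and `evaluate` are hypothetical. The sketch only handles shifted values that fit in addi's 12-bit signed immediate, whereas the real implementation builds a full sequence for the shifted value and keeps it only if it is shorter than the original strategy.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical instruction record: "addi" (from x0) or "srli".
struct Inst {
  std::string Opcode;
  int64_t Imm;
};

// Counts leading zero bits of a nonzero 64-bit value.
unsigned countLeadingZeros(uint64_t V) {
  unsigned N = 0;
  for (uint64_t Bit = uint64_t(1) << 63; Bit && !(V & Bit); Bit >>= 1)
    ++N;
  return N;
}

// Shift out the leading zeros, shift ones into the LSBs, materialize that
// value with addi, then srli to restore the leading zeros.
std::optional<std::vector<Inst>> materializeWithSrli(uint64_t C) {
  if (C == 0)
    return std::nullopt;
  unsigned LZ = countLeadingZeros(C);
  if (LZ == 0)
    return std::nullopt; // No leading zeros to restore.
  uint64_t Shifted = (C << LZ) | ((uint64_t(1) << LZ) - 1);
  // The MSB of Shifted is set, so as a signed value it is negative; compute
  // it without an implementation-defined unsigned-to-signed conversion.
  int64_t SImm = -static_cast<int64_t>(~Shifted) - 1;
  if (SImm < -2048)
    return std::nullopt; // Needs a longer sequence (not modeled here).
  return std::vector<Inst>{{"addi", SImm}, {"srli", static_cast<int64_t>(LZ)}};
}

// Evaluates a sequence produced above, to check that it rebuilds C.
uint64_t evaluate(const std::vector<Inst> &Seq) {
  uint64_t Reg = 0;
  for (const Inst &I : Seq) {
    if (I.Opcode == "addi")
      Reg = static_cast<uint64_t>(I.Imm); // addi rd, x0, imm (sign-extended)
    else if (I.Opcode == "srli")
      Reg >>= I.Imm;
  }
  return Reg;
}
```

For 0xffffffff this yields `addi -1` followed by `srli 32`, the two-instruction form described above.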

I've separated the recursive portion into a standalone function
so I could append the new strategy outside of the recursion. Since
external users are no longer using the recursive function, I've
cleaned up the external interface to return the sequence instead of
taking a vector by reference.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D98821
2021-04-01 09:12:52 -07:00
Jay Foad b1fbfd9e4c [AMDGPU] Small cleanup to constructRetValue and its caller. NFC. 2021-04-01 16:36:16 +01:00
Philip Reames e2c6621e63 [deref-at-point] restrict inference of dereferenceability based on allocsize attribute
Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possible frees between the allocation and the use site. (At the moment, we're conservative and only allow it in functions where we know we can't free.)

This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me).

There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles.

Differential Revision: https://reviews.llvm.org/D95815
2021-04-01 08:34:40 -07:00
Mircea Trofin ce61def529 [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.

The rest of the patch helps ensure that this is the case: instead of clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler hits an internal compiler error (ICE).

This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.
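The invalidate-instead-of-clear idea can be sketched with a toy cache. `InterferenceQuery` is a hypothetical, much-simplified stand-in for the regalloc `Query`, and the "collection" step just copies a candidate list where the real code scans live ranges. The cached result carries a validity flag, and the accessor asserts on it. A missing collect call then fails loudly instead of silently iterating stale or empty data.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, much-simplified stand-in for the regalloc Query.
class InterferenceQuery {
public:
  explicit InterferenceQuery(std::vector<unsigned> Candidates)
      : Candidates(std::move(Candidates)) {}

  // Must be called before iterating the interfering vregs.
  void collectInterferingVRegs() {
    InterferingVRegs = Candidates; // The real code scans live ranges here.
    Valid = true;
  }

  // Called when the live reg matrix is invalidated: mark the cached vregs
  // stale rather than clearing them, so misuse trips the assertion below.
  void invalidate() { Valid = false; }

  const std::vector<unsigned> &interferingVRegs() const {
    assert(Valid && "collectInterferingVRegs() must be called first");
    return InterferingVRegs;
  }

  bool isValid() const { return Valid; }

private:
  std::vector<unsigned> Candidates;
  std::vector<unsigned> InterferingVRegs;
  bool Valid = false;
};
```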

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Differential Revision: https://reviews.llvm.org/D98232
2021-04-01 08:33:28 -07:00
Anirudh Prasad 7b921a6747 [AsmParser][SystemZ][z/OS] Add in support to accept "#" as part of an Identifier token
- This patch adds in support to accept the "#" character as part of an Identifier.
- This support is needed especially for the HLASM dialect since "#" is treated as part of the valid "Alphabet" range
- The way this is done is by making use of the previous precedent set by the `AllowAtInIdentifier` field in `MCAsmLexer.h`. A new field called `AllowHashInIdentifier` is introduced.
- The static function `IsIdentifierChar` is also updated to accept the `#` character if the `AllowHashInIdentifier` field is set to true.
Note: The field introduced in `MCAsmLexer.h` could very well be moved to `MCAsmInfo.h`. I'm not opposed to it. I decided to put it in `MCAsmLexer` since there seems to be some sort of precedent already with `AllowAtInIdentifier`.
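Conceptually, the lexer change adds one more opt-in character to the identifier predicate. In the sketch below, `isIdentifierChar` is hypothetical, and its base character set is an assumption for illustration. The exact rules live in the static `IsIdentifierChar` function described above. The sketch only shows `#` being accepted when `AllowHash` is set, alongside the existing `AllowAt` precedent.

```cpp
#include <cassert>
#include <cctype>

// Identifier-character predicate with opt-in '@' and '#'.
// The base character set below is an illustrative assumption.
bool isIdentifierChar(char C, bool AllowAt, bool AllowHash) {
  unsigned char UC = static_cast<unsigned char>(C);
  if (std::isalnum(UC) || C == '_' || C == '$' || C == '.')
    return true;
  if (C == '@')
    return AllowAt;   // Existing precedent (AllowAtInIdentifier).
  if (C == '#')
    return AllowHash; // New flag (AllowHashInIdentifier), e.g. for HLASM.
  return false;
}
```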

Reviewed By: abhina.sreeskantharajan, nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D99277
2021-04-01 11:24:43 -04:00
Bradley Smith 2f45e632c0 [AArch64][SVE] Improve codegen for select nodes with fixed types
Additionally, move the existing fixed vselect tests to *-vselect.ll.

Differential Revision: https://reviews.llvm.org/D99418
2021-04-01 15:54:37 +01:00