Commit Graph

145210 Commits

Author SHA1 Message Date
Bing1 Yu 113f077f80 [X86] Pass to transform tdpbf16ps intrinsics to scalar operation.
In the previous patch https://reviews.llvm.org/D93594, we only scalarized tilezero, tileload, tilestore and tiledpbssd. In this patch we scalarize the tdpbf16ps intrinsic.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D96110
2021-03-22 13:00:40 +08:00
Max Kazantsev 8fab9f824f [IndVars] Sharpen context in eliminateIVComparison
When eliminating a comparison, we can use the common dominator of
all its users as the context. This gives better results when the ICMP is not
computed right before the branch that uses it.

Differential Revision: https://reviews.llvm.org/D98924
Reviewed By: lebedev.ri
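A rough sketch of the idea (the helper and its surroundings are hypothetical, not the actual IndVarSimplify code): instead of querying SCEV at the compare itself, pick the nearest common dominator of all the compare's users as the context block.

```
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical helper: choose a context block that dominates every user
// of the compare, so facts valid there apply to all of them.
static BasicBlock *chooseContextBlock(ICmpInst *ICmp, DominatorTree &DT) {
  BasicBlock *Ctx = nullptr;
  for (User *U : ICmp->users()) {
    BasicBlock *UserBB = cast<Instruction>(U)->getParent();
    Ctx = Ctx ? DT.findNearestCommonDominator(Ctx, UserBB) : UserBB;
  }
  return Ctx; // null if the compare has no users
}
```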
2021-03-22 11:55:57 +07:00
Lang Hames fc36a511c6 [JITLink][ELF/x86-64] Add support for R_X86_64_GOTPC64 and R_X86_64_GOT64.
Start adding support for ELF x86-64 large code model, PIC relocations.
2021-03-21 21:52:54 -07:00
Lang Hames 0a74ec3299 [JITLink] Start laying the groundwork for ELF x86-64 large code model support.
Introduces DefineExternalSectionStartAndEndSymbols.h, which defines a template
for a JITLink pass that transforms external symbols meeting a user-supplied
predicate into defined symbols pointing at the start and end of a Section
identified by the predicate. JITLink.h is updated with a new makeAbsolute
function to support this pass.

Also renames BasicGOTAndStubsBuilder to PerGraphGOTAndPLTStubsBuilder -- the new
name better describes the intent of this GOT and PLT stubs builder, and will
help to distinguish it from future GOT and PLT stub builders that build entries
that may be shared between multiple graphs.
2021-03-21 20:56:47 -07:00
Lang Hames 209ceed745 [JITLink][ELF/x86-64] Add Delta32, NegDelta32, NegDelta64 support.
These were missing, but are used in eh-frame section support.
2021-03-21 20:15:40 -07:00
Roman Lebedev e3a4701627
[clang][CodeGen] Lower Likelihood attributes to @llvm.expect intrin instead of branch weights
08196e0b2e exposed LowerExpectIntrinsic's
internal implementation detail in the form of
LikelyBranchWeight/UnlikelyBranchWeight options to the outside.

While this isn't incorrect from the results viewpoint,
this is suboptimal from the layering viewpoint,
and causes confusion - should transforms also use those weights,
or should they use something else, D98898?

So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight
internal again, and fixing all the code that used it directly,
which currently is only clang codegen, thankfully,
to emit proper @llvm.expect intrinsics instead.
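An illustrative sketch of what "emit proper @llvm.expect intrinsics" looks like at the IRBuilder level (not the actual clang CodeGen change; the helper name is made up):

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Hypothetical helper: wrap an i1 condition in @llvm.expect so that
// LowerExpectIntrinsic can attach its own branch weights later.
static Value *emitExpect(IRBuilder<> &B, Module &M, Value *Cond, bool Likely) {
  Function *Expect =
      Intrinsic::getDeclaration(&M, Intrinsic::expect, Cond->getType());
  return B.CreateCall(Expect, {Cond, B.getInt1(Likely)});
}
```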
2021-03-21 22:50:21 +03:00
Roman Lebedev 37d6be9052
Revert "[BranchProbability] move options for 'likely' and 'unlikely'"
Upon reviewing D98898 I've come to the realization that these are
an implementation detail of LowerExpectIntrinsicPass,
and they should not be exposed outside of it.

This reverts commit ee8b53815d.
2021-03-21 22:50:21 +03:00
Craig Topper 30080b003e [DAGCombiner] Minor compile time improvement to (sext_in_reg (sign_extend_vector_inreg x)) optimization.
Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No
matter what answer we get back this will be true:
(N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits

So we might as well save the computation. This makes the code more
consistent with the similar (sext_in_reg (sext x)) handling above.
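A minimal sketch of the reasoning as a standalone function (hypothetical names, not the DAGCombiner code):

```
// ComputeNumSignBits always returns at least 1, so if N00Bits < ExtVTBits
// the condition below is already known to hold and the (expensive)
// sign-bit computation can be skipped.
static bool conditionHolds(unsigned N00Bits, unsigned ExtVTBits,
                           unsigned NumSignBits) {
  if (N00Bits < ExtVTBits)
    return true;
  return (N00Bits - NumSignBits) < ExtVTBits;
}
```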
2021-03-21 11:16:41 -07:00
Nikita Popov d11d5d1c5f [ValueTracking] Improve mul handling in isKnownNonEqual()
X != X * C is true if:
 * C is not 0 or 1
 * X is not 0
 * mul is nsw or nuw

Proof: https://alive2.llvm.org/ce/z/uwF29z

This is motivated by one of the cases in D98422.
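A compact restatement of the conditions as a boolean sketch (illustrative only, not the ValueTracking implementation):

```
// X * C can only equal X again when C is 0 and X is 0, when C is 1, or
// when the multiply wraps; with all of these ruled out, X != X * C holds.
static bool mulNeverEqualsX(bool CIsZero, bool CIsOne, bool XKnownNonZero,
                            bool MulIsNSW, bool MulIsNUW) {
  return !CIsZero && !CIsOne && XKnownNonZero && (MulIsNSW || MulIsNUW);
}
```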
2021-03-21 18:41:35 +01:00
Matt Arsenault 20a24af01d MIR: Fix missing serialization for HasTailCall 2021-03-21 13:14:04 -04:00
Matt Arsenault a0f5aad6d7 AMDGPU: Fix allowing immediates for tail call pseudo.
The pseudo was using SSrc_b64, so it allowed folding immediates into
the destination operand for a tail call to null. However, this is not
a valid operand for the s_setpc_b64 this will be lowered to. Avoids
printing the operand as an invalid immediate.

Avoids a regression when tail calls are enabled in GlobalISel (somehow
tail calls to null get deleted in the DAG).
2021-03-21 13:14:04 -04:00
Nikita Popov 9f864d2025 Reapply [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Adjust the type check to consider vectors as well. The previous
version of the patch dropped the type check entirely, but it
turns out that getAggregateElement() does require the constant
to be an aggregate in some edge cases: For Poison/Undef the
getNumElements() API is called, without checking in advance that
we're dealing with an aggregate. Possibly the implementation should
avoid doing that, but for now I'm adding an assert so the next
person doesn't fall into this trap.
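A sketch of the adjusted type check described above (hypothetical helper, not the actual ConstantFolding code):

```
#include "llvm/IR/Type.h"

// Treat vectors like aggregates for getAggregateElement-based folding,
// bridging the type-system and constant notions of "aggregate".
static bool isAggregateForFolding(llvm::Type *Ty) {
  return Ty->isAggregateType() || Ty->isVectorTy();
}
```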
2021-03-21 17:48:21 +01:00
Nikita Popov daae927f9c [InstSimplify] Clean up SimplifyReplacedWithOp implementation (NFCI)
Replace Op with RepOp up-front, and then always work with the new
operands, rather than checking for replacement in various places.
2021-03-21 15:30:30 +01:00
Matt Arsenault 1098acd46d GlobalISel: Avoid unnecessary truncation to i64
We can just directly pass through the APInt to create a new constant.
2021-03-21 10:07:41 -04:00
Matt Arsenault 6314a72730 AMDGPU/GlobalISel: Enable CSE in pre-legalizer combiner 2021-03-21 10:07:37 -04:00
Simon Pilgrim 64c2641c89 [DAG] Limit (sext_in_reg (zero_extend_vector_inreg x)) to exact sign extension
As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just the sign bit. So, revert to the old code for the zero_extend_vector_inreg cases.
2021-03-21 14:01:37 +00:00
Simon Pilgrim 3179588947 [X86][AVX] ComputeNumSignBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources
The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting.

Added as an extension to PR49658
2021-03-21 12:22:51 +00:00
David Green 6d9d2049c8 [ARM] VINS f16 pattern
This adds an extra pattern for inserting an f16 into an odd vector lane
via a VINS. If the dual-insert-lane pattern does not happen to apply,
this can help with some simple cases.

Differential Revision: https://reviews.llvm.org/D95471
2021-03-21 12:00:06 +00:00
luxufan 02ffbac844 [RISCV] remove redundant instruction when eliminate frame index
The mv a0, a0 instruction is generated when the stack object offset is larger than int<12>. To deal with this situation, the eliminateFrameIndex function
creates a virtual register, which the register scavenger needs to scavenge. If the machine instruction that contains the stack object has the ADDI opcode (the addi
was generated by a frameindex node) and its destination register is the same as the register produced by the register scavenger, then
mv a0, a0 is generated. So to eliminate this instruction, eliminateFrameIndex no longer creates the virtual register when the instruction's opcode is ADDI.

Differential Revision: https://reviews.llvm.org/D92479
2021-03-21 18:54:00 +08:00
Simon Pilgrim 297b9bc3fa [X86][AVX] computeKnownBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources
The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting.

Suggested by @craig.topper on PR49658
2021-03-21 10:40:57 +00:00
Simon Pilgrim 54a05f2ec8 [X86] computeKnownBitsForTargetNode - add X86ISD::PMULUDQ handling
Reuse the existing KnownBits multiplication code to handle what is effectively an ISD::UMUL_LOHI variant
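A scalar model of one PMULUDQ lane, just to make the UMUL_LOHI analogy concrete (illustrative, not SelectionDAG code):

```
#include <cstdint>

// One X86ISD::PMULUDQ lane: multiply the low 32 bits of each 64-bit
// element, producing the full 64-bit product in a single value.
static uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return static_cast<uint64_t>(static_cast<uint32_t>(A)) *
         static_cast<uint32_t>(B);
}
```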
2021-03-21 09:57:20 +00:00
Jessica Clarke b2bb003774
[RISCV] Update comment in RISCVInstrInfoM.td
Missed in 07ed62b7d5.
2021-03-20 22:35:40 +00:00
Craig Topper 07ed62b7d5 [RISCV] Disable (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization when Zba is enabled.
This optimization is trying to save SRLI instructions needed to
implement the ANDs. If we have zext.w we won't save anything.
Because we don't check that the multiply is the only user of the
AND we might even increase instruction count.
2021-03-20 15:31:45 -07:00
Craig Topper b0d8823a8a [RISCV] Add isel pattern to optimize (mul (and X, 0xffffffff), (and Y, 0xffffffff)) on RV64
This pattern computes the full 64-bit product of a 32x32 unsigned
multiply. This requires two pairs of SLLI+SRLI to zero the
upper 32 bits of the inputs.

We can do better than this by using two SLLI to move the lower
bits to the upper bits then use MULHU to compute the product. This
is the high half of a full 64x64 product. Since we put 32 0s in the lower
bits of the inputs we know the 128-bit product will have zeros in the
lower 64 bits. So the upper 64 bits, which MULHU computes, will contain
the original 64 bit product we were after.

The same trick would work for (mul (sext_inreg X, i32), (sext_inreg Y, i32))
using MULHS, but sext_inreg is sext.w which is already one instruction so we
wouldn't save anything.

Differential Revision: https://reviews.llvm.org/D99026
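A C-level demonstration of the identity the pattern relies on (this is not the isel pattern itself; it uses the compiler's 128-bit integer extension to stand in for the 64x64 multiply whose high half MULHU returns):

```
#include <cstdint>

// With both operands shifted into the high 32 bits (the two SLLIs), the
// high 64 bits of the 64x64 product -- what MULHU returns -- are exactly
// the full 32x32 unsigned product: (x << 32) * (y << 32) == (x*y) << 64.
static uint64_t mul32x32ViaMulhu(uint32_t X, uint32_t Y) {
  uint64_t XHi = static_cast<uint64_t>(X) << 32; // SLLI 32
  uint64_t YHi = static_cast<uint64_t>(Y) << 32; // SLLI 32
  unsigned __int128 Prod = static_cast<unsigned __int128>(XHi) * YHi;
  return static_cast<uint64_t>(Prod >> 64);      // MULHU result
}
```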
2021-03-20 14:55:46 -07:00
Sanjay Patel ee8b53815d [BranchProbability] move options for 'likely' and 'unlikely'
This makes the settings available for use in other passes by housing
them within the Support lib, but NFC otherwise.

See D98898 for the proposed usage in SimplifyCFG
(where this change was originally included).

Differential Revision: https://reviews.llvm.org/D98945
2021-03-20 14:46:46 -04:00
Fangrui Song 879760c245 [VE] Fix types of multiclass template arguments in TableGen files
These were not properly checked before `[TableGen] Improve handling of template arguments`.
2021-03-20 10:36:51 -07:00
Jeroen Dobbelaere 77080a1eb6 Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.
Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed.
(See D91250, D48541)

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91661
2021-03-20 11:37:09 +01:00
Wang, Pengfei 2327513b85 [X86] Fix a bug when calculating the ldtilecfg insertion points.
The BB in which we initialize ldtilecfg is special. We don't need to check
whether its predecessor BBs need to insert ldtilecfg for calls.

We reused the flag HasCallBeforeAMX, so that the predecessors won't be
added to CfgNeedInsert.

This case happens only when the entry BB is in a loop. We need to hoist
the first tile config point out of the loop in the future.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D98845
2021-03-20 17:48:59 +08:00
Juneyoung Lee 319d093b87 [CFLGraph] Fix a crash due to missing handling of freeze
https://reviews.llvm.org/D85534#2636321
2021-03-21 02:14:13 +09:00
Carl Ritson 6c9cac5da1 [AMDGPU] Add MDT update missing from D98915 2021-03-20 13:38:58 +09:00
Nemanja Ivanovic ea48bf8649 [PowerPC][NFC] Do not produce i64 constants in 32-bit mode
There are some instances where we produce constants of type MVT::i64
unconditionally in the target DAG combines. This is not actually
valid in 32-bit mode.
2021-03-19 22:54:47 -05:00
Craig Topper d5c1d305b3 [RISCV] Rename WriteShift/ReadShift scheduler classes to WriteShiftImm/ReadShiftImm. Move variable shifts from WriteIALU/ReadIALU to new WriteShiftReg/ReadShiftReg.
Previously only immediate shifts were in WriteShift; register
shifts were grouped with IALU. It seems likely that immediate shifts
would be as fast as or faster than register shifts, and that immediate
shifts wouldn't be any faster than IALU. So if either kind deserved its
own group, it should be register shifts, not immediate shifts.

Rather than try to flip them, let's just add more granularity
and give each kind its own class. I've used new names for both to
make them unambiguous and to force any downstream implementations
to put correct information in their scheduler models.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D98911
2021-03-19 20:39:49 -07:00
Carl Ritson fe5f4c397f [AMDGPU] Rename SIInsertSkips Pass
The pass no longer handles skips. It now removes unnecessary
unconditional branches and lowers early termination branches.
Hence the rename to SILateBranchLowering.

Move code to handle returns to epilog from SIPreEmitPeephole
into SILateBranchLowering. This means SIPreEmitPeephole only
contains optional optimisations, and all required transforms
are in SILateBranchLowering.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D98915
2021-03-20 11:48:04 +09:00
Carl Ritson 5df2af8b0e [AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole
SIRemoveShortExecBranches is an optimisation, so it fits well in the
context of SIPreEmitPeephole.

Test changes relate to early termination from kills which have now
been lowered prior to considering branches for removal.
As these use s_cbranch the execz skips are now retained instead.
Currently either behaviour is valid as kill with EXEC=0 is a nop;
however, if early termination is used differently in future then
the new behaviour is the correct one.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D98917
2021-03-20 11:26:42 +09:00
Carl Ritson b76c09023d [AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles
Add code so duplicate index register changes can be removed from
inside bundles.

Reviewed By: rampitec, foad

Differential Revision: https://reviews.llvm.org/D98940
2021-03-20 10:26:23 +09:00
Anshil Gandhi 697f90ebfa [NFC] [PowerPC] Determine Endianness in PPCTargetMachine
The TargetMachine uses the triple to determine endianness. Just
use that logic rather than replicating it in PPCSubtarget.

Differential revision: https://reviews.llvm.org/D98674
2021-03-19 20:22:16 -05:00
Lang Hames 602e19ed79 [JITLink] Don't issue lookups for empty symbol sets.
Issuing a lookup for an empty symbol set is legal, but can actually result in
unrelated work being done if there was a work queue left over from the previous
lookup. We can avoid doing this unrelated work (reducing stack depth and
interleaving of debugging output) by not issuing these no-op lookups in the
first place.
2021-03-19 16:10:47 -07:00
Christoffer Lernö 528f6f7d61 Add type attributes to LLVM C API
The LLVM C API is missing type attributes, which are needed by attributes
such as sret and byval. This patch adds three missing wrapper
functions.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=48249

https://reviews.llvm.org/D97763
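A hedged usage sketch (assuming the new wrappers follow the existing enum-attribute naming, e.g. LLVMCreateTypeAttribute): attach an sret type attribute to a function's first parameter through the C API.

```
#include "llvm-c/Core.h"

// Hypothetical helper: mark parameter 1 of Fn as sret(RetTy).
static void addSRetAttr(LLVMContextRef Ctx, LLVMValueRef Fn, LLVMTypeRef RetTy) {
  unsigned Kind = LLVMGetEnumAttributeKindForName("sret", 4);
  LLVMAttributeRef Attr = LLVMCreateTypeAttribute(Ctx, Kind, RetTy);
  LLVMAddAttributeAtIndex(Fn, 1, Attr); // index 0 is the return value
}
```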
2021-03-19 19:07:04 -04:00
Arthur Eubanks a17394dc88 [NewPM] Verify LoopAnalysisResults after a loop pass
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those when the checks are enabled (which is the default with
expensive checks on).

Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.

This is a reland of https://reviews.llvm.org/D98820 which was reverted
due to unacceptably large compile time regressions in normal debug
builds.
2021-03-19 14:56:37 -07:00
Jessica Paquette 4773dd5ba9 [GlobalISel] Add G_SBFX + G_UBFX (bitfield extraction opcodes)
There is a bunch of similar bitfield extraction code throughout *ISelDAGToDAG.

E.g, ARMISelDAGToDAG, AArch64ISelDAGToDAG, and AMDGPUISelDAGToDAG all contain
code that matches a bitfield extract from an and + right shift.

Rather than duplicating code in the same way, this adds two opcodes:

- G_UBFX (unsigned bitfield extract)
- G_SBFX (signed bitfield extract)

They work like this

```
%x = G_UBFX %y, %lsb, %width
```

Where `lsb` and `width` are

- The least-significant bit of the extraction
- The width of the extraction

This will extract `width` bits from `%y`, starting at `lsb`. G_UBFX zero-extends
the result, while G_SBFX sign-extends the result.

This should allow us to use the combiner to match the bitfield extraction
patterns rather than duplicating pattern-matching code in each target.

Differential Revision: https://reviews.llvm.org/D98464
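A scalar sketch of the intended semantics (illustrative C++, not MIR; assumes 1 <= width <= 64 and lsb + width <= 64):

```
#include <cstdint>

// G_UBFX: extract `width` bits starting at `lsb`, zero-extended.
static uint64_t ubfx(uint64_t Y, unsigned Lsb, unsigned Width) {
  uint64_t Mask = Width == 64 ? ~0ULL : ((1ULL << Width) - 1);
  return (Y >> Lsb) & Mask;
}

// G_SBFX: same extraction, but sign-extended from bit `width - 1`.
static int64_t sbfx(uint64_t Y, unsigned Lsb, unsigned Width) {
  uint64_t Field = ubfx(Y, Lsb, Width);
  uint64_t SignBit = 1ULL << (Width - 1);
  return static_cast<int64_t>((Field ^ SignBit) - SignBit);
}
```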
2021-03-19 14:37:19 -07:00
Arthur Eubanks a1ab5627f0 Revert "[NewPM] Verify LoopAnalysisResults after a loop pass"
This reverts commit 94c269baf5.

Still causes too large of compile time regression in normal debug
builds. Will put under expensive checks instead.
2021-03-19 14:31:08 -07:00
Arthur Eubanks 94c269baf5 [NewPM] Verify LoopAnalysisResults after a loop pass
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those.

Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.

Only verify MSSA when VerifyMemorySSA is set, since normally it's very expensive.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98820
2021-03-19 13:26:45 -07:00
Craig Topper 1066dcb550 [AArch64] Fix LowerMGATHER to return the chain result for floating point gathers.
Found by adding asserts to LegalizeDAG to make sure custom legalized
results had the right types.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D98968
2021-03-19 11:53:46 -07:00
David Green a2e0312cda [ARM] Tone down the MVE scalarization overhead
The scalarization overhead was set deliberately high for MVE, whilst the
codegen was new. It helps protect us against the negative ramifications
of mixing scalar and vector instructions. This decreases that,
especially for floating point where the cost of extracting/inserting
lane elements can be low. For integer the cost is still fairly high due
to the cross-register-bank copy, but is no longer n^2 in the length of
the vector.

In general, this will decrease the cost of scalarizing floats and long
integer vectors. i64 vectors increase in cost, having a high cost both before
and after this patch. For floats this allows us to start doing things like
vectorizing fdiv instructions, even if they are scalarized.

Differential Revision: https://reviews.llvm.org/D98245
2021-03-19 18:30:11 +00:00
Philip Reames 5698537f81 Update basic deref API to account for possibility of free [NFC]
This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree".

The point of this change is purely to simplify iteration on other pieces on the way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet.

Differential Revision: https://reviews.llvm.org/D98908
2021-03-19 11:17:19 -07:00
Alexey Bataev 14ae0cf0f5 [Cost]Canonicalize the cost for logical or/and reductions.
The generic cost of logical or/and reductions should be the cost of a bitcast
from <ReduxWidth x i1> to iReduxWidth plus a cmp eq|ne on iReduxWidth.

Differential Revision: https://reviews.llvm.org/D97961
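For a concrete sense of the lowering this models, here is a scalar stand-in for an <8 x i1> mask (illustrative only): the OR reduction becomes a bitcast plus `icmp ne 0`, the AND reduction a bitcast plus `icmp eq all-ones`.

```
#include <cstdint>

// Scalar analogue of reducing an <8 x i1> mask already bitcast to i8.
static bool orReduce8(uint8_t MaskBits)  { return MaskBits != 0x00; } // icmp ne
static bool andReduce8(uint8_t MaskBits) { return MaskBits == 0xFF; } // icmp eq
```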
2021-03-19 11:01:58 -07:00
Craig Topper 5d315691c4 [RISCV] Add missing bitcasts to the results of lowerINSERT_SUBVECTOR and lowerEXTRACT_SUBVECTOR when handling mask vectors.
Found by adding asserts to LegalizeDAG to catch incorrect result
types being returned.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98964
2021-03-19 10:54:33 -07:00
Craig Topper 95998b898c [Hexagon] Return an i64 for result 0 from LowerREADCYCLECOUNTER instead of an i32.
As far as I can tell, the node coming in has an i64 result so the
return should have the same type. The HexagonISD node used for
this has a type profile that says the result is i64.

Found while trying to add asserts to LegalizeDAG to catch
result type mismatches.

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D98962
2021-03-19 10:54:33 -07:00
Andrei Elovikov 92205cb27f [NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)"
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D98897
2021-03-19 10:50:12 -07:00
Andrei Elovikov 93a9d2de8f [VPlan] Add plain text (not DOT's digraph) dumps
I foresee two uses for this:
1) It's easier to use these in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
   inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
   LIT tests would become too obscure. I can imagine that we'd want to CHECK
   against VPlan dumps after multiple transformations instead. That would be
   easier with plain text dumps than with DOT format.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96628
2021-03-19 10:50:12 -07:00