Commit Graph

32464 Commits

Author SHA1 Message Date
Benjamin Kramer 8bc0bb9564 Add a conversion from double to bf16
This introduces a new compiler-rt function `__truncdfbf2`.
2022-06-15 12:56:31 +02:00
Benjamin Kramer fb34d531af Promote bf16 to f32 when the target doesn't support it
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:
 - Targets with bf16 conversion instructions can now make fp_to_bf16 legal
 - The software conversion to bf16 can be replaced by a trivial
   implementation under fast math.

Differential Revision: https://reviews.llvm.org/D126953
2022-06-15 12:56:31 +02:00
Keith Walker 94fac097ad [DebugInfo][ARM] Not readonly check for RWPI globals
When compiling for the RWPI relocation model [1], the debug information
is wrong for readonly global variables.

Writable global variables are accessed by the static base register (R9
on ARM) in the RWPI relocation model.  This is being correctly generated

Readonly global variables are not accessed by the static base register
in the RWPI relocation model. This case is incorrectly generating the
same debugging information as for writable global variables.

References:
[1] ARM Read-Write Position Independence: https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#read-write-position-independence-rwpi

Differential Revision: https://reviews.llvm.org/D126361
2022-06-15 11:52:12 +01:00
Simon Pilgrim f096d5926d [DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold
Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code.

Differential Revision: https://reviews.llvm.org/D127772
2022-06-15 11:07:59 +01:00
Ping Deng c06f77ec0d [SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))'
Reviewed By: craig.topper, spatel

Differential Revision: https://reviews.llvm.org/D127474
2022-06-15 05:50:18 +00:00
Paul Robinson 0ce33c2941 [PS5] Default to 'sce' debugger tuning 2022-06-14 15:28:28 -07:00
Serguei Katkov d713f0eab8 Revert "[MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock"
It looks like it causes buildbot failures.
As an example: https://lab.llvm.org/buildbot/#/builders/121/builds/20364

Revert to investigate...

This reverts commit 6bf2791814.
2022-06-14 20:27:21 +07:00
Florian Hahn e5c4308ba1
[InterleavedLoadComb] Rename uses when inserting new uses.
This fixes a crash due to uses needing to be renamed.
2022-06-14 13:15:23 +01:00
Serguei Katkov 6bf2791814 [MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.

If there is already a value it tries (in this order):

to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.

Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
2022-06-14 18:00:34 +07:00
Guillaume Chatelet c0e85f1c3b [NFC][Alignment] Use Align in SafeStack 2022-06-14 10:56:36 +00:00
Guillaume Chatelet 6725d80640 [NFC][Alignment] Use Align in shouldAlignPointerArgs 2022-06-14 10:56:36 +00:00
Denis Antrushin c0e965e222 [Statepoints] FixupStatepoint: Clear isKill flag if COPY is not deleted.
When spilling CSRs, FixupStatepoint pass does simple copy propagation,
trying to find COPY instruction which defines register being spilled
and spill COPY source instead. I.e., if we have CSR $x and found
  $x = COPY $y
we will spill $y instead.
But we may be unable to delete COPY instruction for some reason.
Then, spill will be inserted after it, adding another use of $y.
If COPY instruction was last use of $y (killed it), after insertion of
the spill it is not, so `isKill` flag must be cleared. We failed to do
so and this patch fixes this issue.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D127308
2022-06-14 10:52:32 +03:00
Kazu Hirata a2232da2a5 [CodeGen] Remove addSEHCatchHandler and addSEHCleanupHandler (NFC)
The last uses of these functions are removed on Oct 9, 2015 in commit
14e773500e.
2022-06-13 23:08:49 -07:00
Kazu Hirata 34ff78c5cf [CodeGen] Remove restrictRef (NFC)
The last use was removed on Apr 14, 2017 in commit
4fe9d6c640.
2022-06-13 23:08:48 -07:00
Serguei Katkov 095bf6be28 [Greedy RegAlloc] Fix the handling of split register in last chance re-coloring.
This is a fix for https://github.com/llvm/llvm-project/issues/55827.

When register we are trying to re-color is split the original register (we tried to recover)
has no uses after the split. However in rollback actions we assign back physical register to it.
Later it causes different assertions. One of them is in attached test.

This CL fixes this by avoiding assigning physical register back to register which has no usage
or its live interval now is empty.

Reviewed By: arsenm, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127281
2022-06-14 12:04:17 +07:00
Kazu Hirata 145cc9db2b [CodeGen] Remove futureWeight (NFC)
The last use was removed on Jun 5, 2022 in
commit 5c06f7168f, which itself was
a patch to remove unused code.
2022-06-13 17:10:23 -07:00
Guillaume Chatelet 5a293d21fc [NFC][Alignment] Use getAlign in SelectionDAGBuilder 2022-06-13 15:13:05 +00:00
Kazu Hirata 23d9ca10ae [CodeGen] Remove EvictionTrack (NFC)
The last of getEvictor use was removed on Jun 5, 2022 in commit
5c06f7168f, which was itself a patch to
remove unused code.

Once we remove getEvictor, EvictionTrack becomes a write-only data
structure.  The data in it won't affect compilation, so the entire
class is essentially dead.
2022-06-13 07:21:29 -07:00
Kazu Hirata 246e83e973 [GlobalISel] Remove buildSequence (NFC)
The last use was removed on Jun 27, 2019 in commit
8138996128.
2022-06-13 06:58:36 -07:00
Nikita Popov b9a7dea917 [SelectionDAG] Handle trapping aggregate (PR49839)
Call canTrap() on Constant to account for trapping
ConstantAggregate.
2022-06-13 15:06:53 +02:00
Simon Pilgrim 7d8fd4f5db [DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere
Another issue unearthed by D127115

We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).

D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded.

This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late.

Differential Revision: https://reviews.llvm.org/D127595
2022-06-13 11:48:18 +01:00
Jez Ng d4bcb45db7 [MC][re-land] Omit DWARF unwind info if compact unwind is present where eligible
This reverts commit d941d59783.

Differential Revision: https://reviews.llvm.org/D122258
2022-06-12 17:24:19 -04:00
Simon Pilgrim 1cf9b24da3 [DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This helps with several of the regressions from D125836
2022-06-12 19:25:20 +01:00
Jez Ng d941d59783 Revert "[MC] Omit DWARF unwind info if compact unwind is present where eligible"
This reverts commit ef501bf85d.
2022-06-12 10:47:08 -04:00
Jez Ng ef501bf85d [MC] Omit DWARF unwind info if compact unwind is present where eligible
Previously, omitting unnecessary DWARF unwinds was only done in two
cases:
* For Darwin + aarch64, if no DWARF unwind info is needed for all the
  functions in a TU, then the `__eh_frame` section would be omitted
  entirely. If any one function needed DWARF unwind, then MC would emit
  DWARF unwind entries for all the functions in the TU.
* For watchOS, MC would omit DWARF unwind on a per-function basis, as
  long as compact unwind was available for that function.

This diff makes it so that we omit DWARF unwind on a per-function basis
for Darwin + aarch64 as well. In addition, we introduce the flag
`--emit-dwarf-unwind=` which can toggle between `always`,
`no-compact-unwind` (only emit DWARF when CU cannot be emitted for a
given function), and the target platform `default`.  `no-compact-unwind`
is particularly useful for newer x86_64 platforms: we don't want to omit
DWARF unwind for x86_64 in general due to possible backwards compat
issues, but we should make it possible for people to opt into this
behavior if they are only targeting newer platforms.

**Motivation:** I'm working on adding support for `__eh_frame` to LLD,
but I'm concerned that we would suffer a perf hit. Processing compact
unwind is already expensive, and that's a simpler format than EH frames.
Given that MC currently produces one EH frame entry for every compact
unwind entry, I don't think processing them will be cheap. I tried to do
something clever on LLD's end to drop the unnecessary EH frames at parse
time, but this made the code significantly more complex. So I'm looking
at fixing this at the MC level instead.

**Addendum:** It turns out that there was a latent bug in the X86
backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is
not too surprising given that this combination has not been heretofore
used.

For functions that have unwind info that cannot be encoded with CU, MC
would end up dropping both the compact unwind entry (OK; existing
behavior) as well as the DWARF entries (not OK).  This diff fixes things
so that we emit the DWARF entry, as well as a CU entry with encoding
`UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for
the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry
is necessary, this was the simplest fix. ld64 seems to be able to handle
both the absence and presence of this CU entry. Ultimately ld64 (and
LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there
is no impact to the final binary size.

Reviewed By: davide, lhames

Differential Revision: https://reviews.llvm.org/D122258
2022-06-12 10:03:56 -04:00
Simon Pilgrim 54ae4ca755 [DAG] visitSRL - pull out ShiftVT. NFC. 2022-06-12 14:02:23 +01:00
Simon Pilgrim cf5c63d187 [DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats
Addresses a number of regressions identified in D127115
2022-06-11 21:06:42 +01:00
Simon Pilgrim 44a0cd25df [DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling
Check if we're just replacing one v1x?? vector with another
2022-06-11 19:30:00 +01:00
Simon Pilgrim a71ad6a3c8 [DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector()
Allow scalar_to_vector nodes to be used for the start of a build_vector creation
2022-06-11 15:29:22 +01:00
Simon Pilgrim 693f4db1ec [DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR insertion to remove early-out. NFCI.
Remove the early-out cases so we can more easily add additional folds in the future.
2022-06-11 12:01:13 +01:00
Paul Walker 10d55c4634 [SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST.
Differential Revision: https://reviews.llvm.org/D127322
2022-06-11 10:41:13 +01:00
Kazu Hirata a98965d92f [CodeGen] Use llvm::erase_value (NFC) 2022-06-10 22:59:48 -07:00
Fangrui Song adf4142f76 [MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
2022-06-10 22:50:55 -07:00
Eli Friedman 0ff51d5dde Fix interaction of CFI instructions with MachineOutliner.
1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.

Fixes https://github.com/llvm/llvm-project/issues/55842

Differential Revision: https://reviews.llvm.org/D126930
2022-06-10 13:37:49 -07:00
Guillaume Chatelet 95083fa3b8 [NFC] Remove deadcode 2022-06-10 15:13:42 +00:00
Simon Pilgrim 91adbc3208 [DAG] SimplifyDemandedVectorElts - adding SimplifyMultipleUseDemandedVectorElts handling to ISD::CONCAT_VECTORS
Attempt to look through multiple use operands of ISD::CONCAT_VECTORS nodes

Another minor improvement for D127115
2022-06-10 16:06:43 +01:00
Guillaume Chatelet 38637ee477 [clang] Add support for __builtin_memset_inline
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.

The idea is to get support from the compiler to easily write efficient memory function implementations.

This patch could be split in two:
 - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
 - and another one for the Clang part providing the instrinsic as a builtin.

Differential Revision: https://reviews.llvm.org/D126903
2022-06-10 13:13:59 +00:00
Nikita Popov c10921fa1a [CGP] Also freeze ctlz/cttz operand when despeculating
D125887 changed the ctlz/cttz despeculation transform to insert
a freeze for the introduced branch on zero. While this does fix
the "branch on poison" issue, we may still get in trouble if we
pick a different value for the branch and for the ctz argument
(i.e. non-zero for the branch, but zero for the ctz). To avoid
this, we should use the same frozen value in both positions.

This does cause a regression in RISCV codegen by introducing an
additional sext. The DAG looks like this:

    t0: ch = EntryToken
        t2: i64,ch = CopyFromReg t0, Register:i64 %3
      t4: i64 = AssertSext t2, ValueType:ch:i32
    t23: i64 = freeze t4
          t9: ch = CopyToReg t0, Register:i64 %0, t23
          t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32>
        t18: ch = TokenFactor t9, t16
            t25: i64 = sign_extend_inreg t23, ValueType:ch:i32
          t24: i64 = setcc t25, Constant:i64<0>, seteq:ch
        t28: i64 = and t24, Constant:i64<1>
      t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68>
    t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80>

I don't see a really obvious way to improve this, as we can't push
the freeze past the AssertSext (which may produce poison).

Differential Revision: https://reviews.llvm.org/D126638
2022-06-10 09:46:10 +02:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Simon Pilgrim 7dbfcfa735 [DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one.
This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.
2022-06-09 14:56:14 +01:00
Guillaume Chatelet dc3367970e [SelectionDAG] Handle bzero/memset libcalls globally instead of per target
Differential Revision: https://reviews.llvm.org/D127279
2022-06-09 08:34:55 +00:00
Craig Topper 4bcfc41846 [SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value.
This matches what we do in IR. For the RISC-V test case, this allows
us to use -8 for the AND mask instead of materializing a constant in a register.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127335
2022-06-08 14:55:58 -07:00
Kai Nacke d897a14c2e [SystemZ] Fix check for zero size when lowering memcmp.
During lowering of memcmp/bcmp, the check for a size of 0 is done
in 2 different ways. In rare cases this can lead to a crash in
SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause
is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a
constant int value which is not yet evaluated. When the value is
turned into a SDValue, then the evaluation is done and results in
a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special
case of 0 length to be handled, which results in an assertion.

The fix is to turn the value into a SDValue, so that both functions
use the same check.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D126900
2022-06-08 14:52:13 -04:00
Simon Pilgrim b84c10d4bc [DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT
Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again
2022-06-08 16:16:35 +01:00
Joseph Huber 9e0dbd2a2a [Target] Remove `startswith` for adding `SHF_EXCLUDE` to offload section
Summary:
We use the special section name `.llvm.offloading` to store device
imagees in the host object file. We want these to be stripped by the
linker as they are not used after linking so we use the `SHF_EXCLUDE`
flag to instruct the linker to drop them. We used to do this for all
sections that started with `.llvm.offloading` when we encoded metadata
in the section name itself. Now we embed a special binary containing the
metadata, we should only add the flag on this name specifically.
2022-06-08 09:56:51 -04:00
Paul Walker d88354213c [SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST.
Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.

To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.

There still exists combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.

Depends on D126957.
Fixes: #55114

Differential Revision: https://reviews.llvm.org/D127126
2022-06-08 10:30:07 +01:00
Paul Walker a1121c31d8 [SVE] Fix incorrect code generation for bitcasts of unpacked vector types.
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?

Differential Revision: https://reviews.llvm.org/D126957
2022-06-08 10:30:07 +01:00
Chuanqi Xu 0e10f12844 [NFC] Remove commented cerr debugging loggings
There are some unused cerr debugging loggings in the codes. It is weird
to remain such commented debug helpers in the product.
2022-06-08 15:58:06 +08:00
Kito Cheng 7207373e1e Revert "[SplitKit] Handle early clobber + tied to def correctly"
Revert due to failed on LLVM_ENABLE_EXPENSIVE_CHECKS.

This reverts commit e14d04909d.
2022-06-08 13:05:35 +08:00
Kito Cheng e14d04909d [SplitKit] Handle early clobber + tied to def correctly
Spliter will try to extend a live range into `r` slot for a use operand,
that's works on most situaion, however that not work correctly when the operand
has tied to def, and the def operand is early clobber.

Give an example to demo what's wrong:
  0  %0 = ...
 16  early-clobber %0 = Op %0 (tied-def 0), ...
 32  ... = Op %0

Before extend:
 %0 = [0r, 0d) [16e, 32d)

The point we want to extend is 0d to 16e not 16r in this case, but if
we use 16r here we will extend nothing because that already contained
in [16e, 32d).

This patch add check for detect such case and adjust the extend point.

Detailed explanation for testcase: https://reviews.llvm.org/D126047

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D126048
2022-06-08 11:33:05 +08:00