22a0edd0 introduced a config IsStrictFPEnabled, which controls the
strict floating point mutation (transforming some strict-fp operations
into non-strict in ISel). This patch disables the mutation by default
since we've finished PowerPC strict-fp enablement in backend.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87222
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
This patch makes these operations legal, and add necessary codegen
patterns.
There's still some issue similar to D77033 for conversion from v1i128
type. But normal type tests synced in vector-constrained-fp-intrinsic
are passed successfully.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83654
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom
zone.
Differential Revision: https://reviews.llvm.org/D72392
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.
This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
(since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
anyone interested in improving this further can get the info for their code
Differential revision: https://reviews.llvm.org/D77448
This patch adds handling of constrained FP intrinsics about round,
truncate and extend for PowerPC target, with necessary tests.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D64193
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776
Differential revision: https://reviews.llvm.org/D79283
This patch adds strict-fp intrinsics support for fma, fsqrt, fmaxnum and
fminnum on PowerPC.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D72749
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.
Differential Revision: https://reviews.llvm.org/D72930
Exploit native VSX rounding instruction, x(v|s)r(d|s)pic, which does
rounding using current rounding mode.
According to C standard library, rint may raise INEXACT exception while
nearbyint won't.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D72685
Commit 0f0330a787 legalized these nodes on PPC without consideration of
unsafe math which means that we get inexact exceptions raised for nearbyint.
Since this doesn't conform to the standard, switch this legalization to depend
on unsafe fp math.
VSX provides a full complement of rounding instructions yet we somehow ended up
with some of them legal and others not. This just legalizes all of the FP
rounding nodes and the FP -> int rounding nodes with unsafe math.
Differential revision: https://reviews.llvm.org/D69949
Summary:
If we didn't set the value for hasSideEffects bit in our td file, `llvm-tblgen`
will set it as true for those instructions which has no match pattern.
The instructions `MTLR` and `MFLR` don't set the hasSideEffects flag and don't
have match pattern, so their hasSideEffects flag will be set true by
`llvm-tblgen`.
But in fact, we can use `[LR]` to model the two instructions, so they should not
have SideEffects.
This patch is to modify the hasSideEffects of MTLR and MFLR from 1 to 0.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D71390
The following intrinsics currently carry a rounding mode metadata argument:
llvm.experimental.constrained.minnum
llvm.experimental.constrained.maxnum
llvm.experimental.constrained.ceil
llvm.experimental.constrained.floor
llvm.experimental.constrained.round
llvm.experimental.constrained.trunc
This is not useful since the semantics of those intrinsics do not in any way
depend on the rounding mode. In similar cases, other constrained intrinsics
do not have the rounding mode argument. Remove it here as well.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71218
Summary:
This patch fixes an issue where the PPC MI peephole optimization pass incorrectly remove a vector swap.
Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately feed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered.
This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register.
Here is an example in pseudo-MIR of what happens in the test cased added in this patch:
Before PPC MI peephole optimization:
```
%arg = XVADDDP %0, %1
$f1 = COPY %arg.sub_64
call double rint(double)
%res.first = COPY $f1
%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64
%arg.swapped = XXPERMDI %arg, %arg, 2
$f1 = COPY %arg.swapped.sub_64
call double rint(double)
%res.second = COPY $f1
%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
```
After optimization:
```
; ...
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
; so the pass replaces the swap with a copy:
%vec.res = COPY %vec.res.splat
; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
```
As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat.
Committed for vddvss (Colin Samples). Thanks Colin for the patch!
Differential Revision: https://reviews.llvm.org/D69497
A set of function attributes is required in any function that uses constrained
floating point intrinsics. None of our tests use these attributes.
This patch fixes this.
These tests have been tested against the IR verifier changes in D68233.
Reviewed by: andrew.w.kaylor, cameron.mcinally, uweigand
Approved by: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D67925
llvm-svn: 373761
Summary:
This is brought up in
https://reviews.llvm.org/D64662?id=209923#inline-599490
CFI information are non-relevant to quite some testcases,
we should get rid of checking them when its unecessary.
This patch avoid generating cfi info in testcases that are not
testing prolog/epilog or exception handling.
Reviewers: kbarton, hfinkel, nemanjai, #powerpc
Reviewed By: hfinkel
Subscribers: MaskRay, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67016
llvm-svn: 370505
This patch changes the DAG legalizer to respect the operation actions
set by the target for strict floating-point operations. (Currently, the
legalizer will usually fall back to mutate to the non-strict action
(which is assumed to be legal), and only skip mutation if the strict
operation is marked legal.)
With this patch, if whenever a strict operation is marked as Legal or
Custom, it is passed to the target as usual. Only if it is marked as
Expand will the legalizer attempt to mutate to the non-strict operation.
Note that this will now fail if the non-strict operation is itself
marked as Custom -- the target will have to provide a Custom definition
for the strict operation then as well.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D65226
llvm-svn: 368012
Use the PPC vector min/max instructions for computing the corresponding
operation as these should be faster than the compare/select sequences
we currently emit.
Differential revision: https://reviews.llvm.org/D47332
llvm-svn: 362759
The powerpc64-"nonle" tests are removed. They fail because of a bug that
Drew is currently working on that affects multiple targets.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Hal Finkel, Kevin P. Neal
Approved by: Hal Finkel
Differential Revision: http://reviews.llvm.org/D62388
llvm-svn: 361985