Commit Graph

7 Commits

Author SHA1 Message Date
Matt Arsenault 301162c4fe AMDGPU: Replace i64 add/sub lowering
Use VOP3 add/addc like usual.

This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.

This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.

llvm-svn: 318340
2017-11-15 21:51:43 +00:00
Matt Arsenault aafff87dda AMDGPU: Do not fold clamp instructions when sources are different
Patch by hakzsam (Samuel Pitoiset)

llvm-svn: 314951
2017-10-05 00:13:17 +00:00
Matt Arsenault 6b114d2c50 AMDGPU: Select clamp pattern with v2f16
llvm-svn: 312087
2017-08-30 01:20:17 +00:00
Stanislav Mekhanoshin 79da2a7698 [AMDGPU] Remove getBidirectionalReasonRank
This method inverts the Reason field of a scheduling candidate.
It does right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer less strong reason
such as Weak over RegCritical because Weak > -RegCritical.

The CandReason enum is properly sorted, so just remove artificial
ranking.

Differential Revision: https://reviews.llvm.org/D30557

llvm-svn: 297536
2017-03-11 00:29:27 +00:00
Matt Arsenault 79a45db7f5 AMDGPU: Use clamp with f64
llvm-svn: 295908
2017-02-22 23:53:37 +00:00
Matt Arsenault d5c6515b68 AMDGPU: Fold FP clamp as modifier bit
The manual is unclear on the details of this. It's not
clear to me if denormals are not allowed with clamp,
or if that is only omod. Not allowing denorms for
fp16 or fp64 isn't useful so I also question if that
is really a restriction. Same with whether this is valid
without IEEE mode enabled.

llvm-svn: 295905
2017-02-22 23:27:53 +00:00
Matt Arsenault 2fdf2a1a18 AMDGPU: Redefine clamp node as clamp 0.0-1.0
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.

Also allow using clamp with f16, and use knowledge
of dx10_clamp.

llvm-svn: 295788
2017-02-21 23:35:48 +00:00