Commit Graph

43 Commits

Author SHA1 Message Date
Nicolai Haehnle 28212cc689 AMDGPU: Remove PHI loop condition optimization
Summary:
The optimization to early break out of loops if all threads are dead was
never fully implemented.

But the PHI node analyzing is actually causing a number of problems, so
remove all the extra code for it.

(This does actually regress code quality in a few places because it
 ends up relying more heavily on phi's of i1, which we don't do a
 great job with. However, since it fixes real bugs in the wild, we
 should take this change. I have some prototype changes to improve
 i1 lowering in general -- not just for control flow -- which should
 help recover the code quality, I just need to make those changes
 fit for general consumption. -- Nicolai)

Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>

Reviewers: arsenm, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53359

llvm-svn: 345718
2018-10-31 13:26:48 +00:00
Konstantin Zhuravlyov bb30ef7af4 AMDGPU: Add clamp bit to dot intrinsics
Differential Revision: https://reviews.llvm.org/D49874

llvm-svn: 338470
2018-08-01 01:31:30 +00:00
Farhana Aleen c370d7b33d [AMDGPU] [AMDGPU] Support a fdot2 pattern.
Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)

Author: FarhanaAleen

Reviewed By: rampitec, b-sumner

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D49146

llvm-svn: 337198
2018-07-16 18:19:59 +00:00
Stanislav Mekhanoshin 1a1687f1bb [AMDGPU] Convert rcp to rcp_iflag
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742
2018-06-27 15:33:33 +00:00
Stanislav Mekhanoshin 8fd3c4e431 [AMDGPU] DAG combine to produce V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48099

llvm-svn: 334559
2018-06-12 23:50:37 +00:00
Tom Stellard 1b95fed6f7 AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP
Summary:
We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp
is gone.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47181

llvm-svn: 333153
2018-05-24 05:28:34 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Wei Ding 5676acad9e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

llvm-svn: 315610
2017-10-12 19:37:14 +00:00
Matt Arsenault 71bcbd451f AMDGPU: Start adding tail call support
Handle the sibling call cases.

llvm-svn: 310753
2017-08-11 20:42:08 +00:00
Matt Arsenault b62a4eb524 AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

llvm-svn: 309732
2017-08-01 19:54:18 +00:00
Stanislav Mekhanoshin e3eb42cef6 [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc
This simplification allows to avoid generating v_cndmask_b32
to serialize condition code between compare and use.

Differential Revision: https://reviews.llvm.org/D34300

llvm-svn: 305962
2017-06-21 22:05:06 +00:00
Matt Arsenault 2b1f9aa577 AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Marek Olsak 2d82590f64 AMDGPU: Add new amdgcn.init.exec intrinsics
v2: More tests, bug fixes, cosmetic changes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D31762

llvm-svn: 301677
2017-04-28 20:21:58 +00:00
Matt Arsenault 3e02538a02 AMDGPU: Move trap lowering to DAG
Fixes traps in any block besides the entry block,
and fixes depending on a live-in physical register
by using a virtual register copy.

Also happens to stop emitting a nop in the case
debug trap is not supported.

llvm-svn: 301206
2017-04-24 17:49:13 +00:00
Matt Arsenault 8edfaee7be AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

llvm-svn: 299246
2017-03-31 19:53:03 +00:00
Matt Arsenault 5b20fbb748 AMDGPU: Rename SI_RETURN
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.

llvm-svn: 298452
2017-03-21 22:18:10 +00:00
Matt Arsenault c5b641ac02 AMDGPU: Cleanup control flow intrinsics
Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the type list.

It's surprising this was working before. fdiv.fast had
the wrong number of parameters. The control flow intrinsic
declaration attributes were not being applied, and
their types were inconsistent. The actual IR use types
did not match the declaration, and were closer to the
types used for the patterns. The brcond lowering
was changing the types, so introduce new nodes for those.

llvm-svn: 298119
2017-03-17 20:41:45 +00:00
Matt Arsenault 86e02ce2dc AMDGPU: Fix unnecessary ands when packing f16 vectors
computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.

llvm-svn: 297873
2017-03-15 19:04:26 +00:00
Wei Ding 4d3d4ca1b3 AMDGPU : Replace FMAD with FMA when denormals are enabled.
Differential Revision: http://reviews.llvm.org/D29958

llvm-svn: 296186
2017-02-24 23:00:29 +00:00
Matt Arsenault 1f17c66890 AMDGPU: Add cvt.pkrtz intrinsic
Convert llvm.SI.packf16 test uses

llvm-svn: 295797
2017-02-22 00:27:34 +00:00
Matt Arsenault 2fdf2a1a18 AMDGPU: Redefine clamp node as clamp 0.0-1.0
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.

Also allow using clamp with f16, and use knowledge
of dx10_clamp.

llvm-svn: 295788
2017-02-21 23:35:48 +00:00
Matt Arsenault 824de226a1 AMDGPU: Remove dead node definitions
llvm-svn: 295247
2017-02-15 22:23:04 +00:00
Jan Vesely f170504c41 AMDGPU/R600: Serialize vector trunc stores to private AS
Add DUMMY_CHAIN SDNode to denote stores of interest

Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411

Differential Revision: https://reviews.llvm.org/D27964

llvm-svn: 292651
2017-01-20 21:24:26 +00:00
Matt Arsenault 4165efdc58 AMDGPU: Add replacement export intrinsics
llvm-svn: 292205
2017-01-17 07:26:53 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Tom Stellard 8485fa096e AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.
Patch By: Wei Ding

Summary: This patch fixes the fdiv precision issues.

Reviewers: b-sumner, cfang, wdng, arsenm

Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D26424

llvm-svn: 288879
2016-12-07 02:42:15 +00:00
Matt Arsenault 7bee6ac798 AMDGPU: Refactor exp instructions
Structure the definitions a bit more like the other classes.

The main change here is to split EXP with the done bit set
to a separate opcode, so we can set mayLoad = 1 so that it won't
be reordered before the other exp stores, since this has the special
constraint that if the done bit is set then this should be the last
exp in she shader.

Previously all exp instructions were inferred to have unmodeled
side effects.

llvm-svn: 288695
2016-12-05 20:23:10 +00:00
Matt Arsenault 2712d4a3d8 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Matt Arsenault 03006fd3c4 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Jan Vesely fbcb754e66 AMDGPU: Make CONST_DATA_PTR available to R600
Rename to AMDGPUconstdata_ptr

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19786

llvm-svn: 269474
2016-05-13 20:39:18 +00:00
Tom Stellard 354a43c7bc AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.

32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.

Patch by: Vedran Miletić

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: jvesely, scchan, kanarayan, arsenm

Differential Revision: http://reviews.llvm.org/D17280

llvm-svn: 265170
2016-04-01 18:27:37 +00:00
Matt Arsenault 79963e80b8 AMDGPU: Rename intrinsic to better match instruction name
Also fixes missing f32 test.

llvm-svn: 260780
2016-02-13 01:03:00 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Marek Olsak 8a0f335ad6 AMDGPU/SI: Add support for non-void functions
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).

This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.

v2: don't do this for compute shaders

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16033

llvm-svn: 257621
2016-01-13 17:23:04 +00:00
Matt Arsenault de5fbe9c60 AMDGPU: Pattern match ffbh pattern to instruction.
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.

llvm-svn: 257352
2016-01-11 17:02:00 +00:00
Matt Arsenault d079285e05 AMDGPU: Use generic bitreverse intrinsic
Also fix bug in vector legalization for bitreverse.

llvm-svn: 255512
2015-12-14 17:25:38 +00:00
Tom Stellard 45bb48ea19 R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00
Tom Stellard 1be1aa84ec Revert "AMDGPU: Add core backend files for R600/SI codegen v6"
This reverts commit 4ea70107c5e51230e9e60f0bf58a0f74aa4885ea.

llvm-svn: 160303
2012-07-16 18:19:53 +00:00
Tom Stellard bcce80fa95 AMDGPU: Add core backend files for R600/SI codegen v6
llvm-svn: 160270
2012-07-16 14:17:08 +00:00