Commit Graph

135 Commits

Author SHA1 Message Date
Matt Arsenault a5840c3c39 Codegen support for atomicrmw fadd/fsub
llvm-svn: 351851
2019-01-22 18:36:06 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Marek Olsak c5cec5e1fa AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52944

llvm-svn: 351351
2019-01-16 15:43:53 +00:00
Matt Arsenault 687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Matt Arsenault ab41193312 AMDGPU: Expand atomicrmw nand in IR
llvm-svn: 343559
2018-10-02 03:50:56 +00:00
Matt Arsenault 0da6350dc8 AMDGPU: Remove remnants of old address space mapping
llvm-svn: 341165
2018-08-31 05:49:54 +00:00
Tim Renouf 904343f879 [AMDGPU] Add support for multi-dword s.buffer.load intrinsic
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.

This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.

Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.

This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.

V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
    tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51098

Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
llvm-svn: 340684
2018-08-25 14:53:17 +00:00
Matt Arsenault bb8e64e7f5 AMDGPU: Fix not respecting byval alignment in call frame setup
This was hackily adding in the 4-bytes reserved for the callee's
emergency stack slot. Treat it like a normal stack allocation
so we get the correct alignment padding behavior. This fixes
an inconsistency between the caller and callee.

llvm-svn: 340396
2018-08-22 11:09:45 +00:00
Matt Arsenault 7121bed210 AMDGPU: Custom lower fexp
This will allow the library to just use __builtin_expf directly
without expanding this itself. Note f64 still won't work because
there is no exp instruction for it.

llvm-svn: 339902
2018-08-16 17:07:52 +00:00
Matt Arsenault 6c7ba82900 AMDGPU: Address todo for handling 1/(2 pi)
llvm-svn: 339814
2018-08-15 21:03:55 +00:00
Matt Arsenault c3dc8e65e2 DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

llvm-svn: 338910
2018-08-03 18:27:52 +00:00
Matt Arsenault 4bec7d4261 Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
Reverts r337079 with fix for msan error.

llvm-svn: 337535
2018-07-20 09:05:08 +00:00
Farhana Aleen c370d7b33d [AMDGPU] [AMDGPU] Support a fdot2 pattern.
Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)

Author: FarhanaAleen

Reviewed By: rampitec, b-sumner

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D49146

llvm-svn: 337198
2018-07-16 18:19:59 +00:00
Evgeniy Stepanov 1971ba097d Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
This reverts commit r337021.

WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
    #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
    #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
    #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
    #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
    #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
    #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
    #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
    #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26

[...]

Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
    #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192

llvm-svn: 337079
2018-07-14 01:20:53 +00:00
Matt Arsenault de95077780 AMDGPU: Fix handling of alignment padding in DAG argument lowering
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.

The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.

Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)

llvm-svn: 337021
2018-07-13 16:40:25 +00:00
Tom Stellard 5bfbae5cb1 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851
2018-07-11 20:59:01 +00:00
Tom Stellard c5a154db48 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942
2018-06-28 23:47:12 +00:00
Matt Arsenault 75e7192ba3 AMDGPU: Remove MFI::ABIArgOffset
We have too many mechanisms for tracking the various offsets
used for kernel arguments, so remove one. There's still a lot of
confusion with these because there are two different "implicit"
argument areas located at the beginning and end of the kernarg
segment.

Additionally, the offset was determined based on the memory
size of the split element types. This would break in a future
commit where v3i32 is decomposed into separate i32 pieces.

llvm-svn: 335830
2018-06-28 10:18:55 +00:00
Stanislav Mekhanoshin 1a1687f1bb [AMDGPU] Convert rcp to rcp_iflag
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742
2018-06-27 15:33:33 +00:00
Nicolai Haehnle db6911a6f9 AMDGPU: Remove old-style image intrinsics
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

llvm-svn: 335231
2018-06-21 13:37:45 +00:00
Tom Stellard 264c171f36 AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering
Summary:
The code that handles ISD:Register and ISD::CopyFromReg assumes
the target is amdgcn, so this is broken on r600.  We don't
need this analysis on r600 anyway so we can safely move
it to SITargetLowering.

Reviewers: alex-t, arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46298

llvm-svn: 334607
2018-06-13 15:06:37 +00:00
Stanislav Mekhanoshin 8fd3c4e431 [AMDGPU] DAG combine to produce V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48099

llvm-svn: 334559
2018-06-12 23:50:37 +00:00
Tom Stellard 1b95fed6f7 AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP
Summary:
We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp
is gone.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47181

llvm-svn: 333153
2018-05-24 05:28:34 +00:00
Tom Stellard b12f4dec08 AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering
Summary: This is always false for R600.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47180

llvm-svn: 333016
2018-05-22 19:37:55 +00:00
Matt Arsenault 67a9815a5c AMDGPU: Custom lower v4i16/v4f16 vector operations
Avoids stack access.

Also handle extract hi elt pattern from truncate + shift
to avoid a couple test regressions.

llvm-svn: 332453
2018-05-16 11:47:30 +00:00
Matt Arsenault 762d498808 AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector,
we can use the original source value.

This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.

llvm-svn: 331909
2018-05-09 18:37:39 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Matt Arsenault 4cc0b85276 AMDGPU: Fix build warning about override
llvm-svn: 326713
2018-03-05 16:25:10 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Changpeng Fang 4737e892de AMDGPU/SI: Add d16 support for image intrinsics.
Summary:
  This patch implements d16 support for image load, image store and image sample intrinsics.

Reviewers:
  Matt, Brian.

Differential Revision:
  https://reviews.llvm.org/D3991

llvm-svn: 322903
2018-01-18 22:08:53 +00:00
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Changpeng Fang 44dfa1de3b AMDGPU/SI: Add d16 support for buffer intrinsics.
Differential Revision:
  https://reviews.llvm.org/D38906

Reviewers:
  Matt and Brian.

llvm-svn: 322402
2018-01-12 21:12:19 +00:00
Mark Searles e4f067ebe2 [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed.
Differential Revision: https://reviews.llvm.org/D41377

llvm-svn: 321100
2017-12-19 19:26:23 +00:00
Vedran Miletic ad21f2687d [AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics
AMDGPU backend errors with "unsupported call to function" upon
encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch
adds custom lowering to avoid that error on both R600 and SI.

Reviewers: arsenm, jvesely

Subscribers: tstellar

Differential Revision: https://reviews.llvm.org/D29942

llvm-svn: 319025
2017-11-27 13:26:38 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Marek Olsak 5cec64195c AMDGPU: Lower buffer store and atomic intrinsics manually
Summary:
Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39060

llvm-svn: 317754
2017-11-09 01:52:48 +00:00
Matt Arsenault 6119f80034 AMDGPU: Remove redundant combine
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).

This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.

Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.

llvm-svn: 317526
2017-11-07 00:06:32 +00:00
Matt Arsenault 4f6318fe1b AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32
llvm-svn: 317492
2017-11-06 17:04:37 +00:00
Matt Arsenault 4d70754e3c AMDGPU: Implement isFPExtFoldable
This helps match v_mad_mix* in some cases.

llvm-svn: 315744
2017-10-13 20:18:59 +00:00
Wei Ding 5676acad9e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

llvm-svn: 315610
2017-10-12 19:37:14 +00:00
Matt Arsenault 71bcbd451f AMDGPU: Start adding tail call support
Handle the sibling call cases.

llvm-svn: 310753
2017-08-11 20:42:08 +00:00
Matt Arsenault a176cc5b93 AMDGPU: Don't use report_fatal_error for unsupported call types
llvm-svn: 310004
2017-08-03 23:32:41 +00:00
Matt Arsenault 8623e8d864 AMDGPU: Pass special input registers to functions
llvm-svn: 309998
2017-08-03 23:00:29 +00:00
Matt Arsenault b34635550a AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
2017-07-15 05:52:59 +00:00
David Stuttard 70e8bc1bf3 [AMDGPU] Add intrinsics for tbuffer load and store
Intrinsic already existed for llvm.SI.tbuffer.store

Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*

Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr

Differential Revision: https://reviews.llvm.org/D30687

llvm-svn: 306031
2017-06-22 16:29:22 +00:00
Matt Arsenault e0e68a757e AMDGPU: Cleanup CreateLiveInRegister
llvm-svn: 305748
2017-06-19 21:52:45 +00:00
Stanislav Mekhanoshin a96ec3f360 [AMDGPU] Convert shl (add) into add (shl)
shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
This allows to fold a constant into an address in some cases as
well as to eliminate second shift if the expression is used as
an address and second shift is a result of a GEP.

Differential Revision: https://reviews.llvm.org/D33432

llvm-svn: 303641
2017-05-23 15:59:58 +00:00
Matt Arsenault 2b1f9aa577 AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Matt Arsenault bf5482e4bb AMDGPU: Pull fneg out of extract_vector_elt
This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.

llvm-svn: 302813
2017-05-11 17:26:25 +00:00