Summary:
On FreeBSD11.0 the FileCheck NOT string "1.0" will be matched by
`.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"` at the end of the
file. Add a CHECK for that directive to avoid failing the test.
Reviewers: rampitec, kzhuravl
Reviewed By: rampitec, kzhuravl
Subscribers: emaste, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, krytarowski
Differential Revision: https://reviews.llvm.org/D39306
llvm-svn: 316616
In a case when number of output constraint operands that has matched input operands
doesn't fit to signed char, TargetLowering::ParseConstraints() can try to access
ConstraintOperands (that is std::vector) with negative index.
Reviewers: rampitec, arsenm
Differential Review: https://reviews.llvm.org/D39125
llvm-svn: 316574
This code added in r297930 assumed that it could create
a select with a condition type that is just an integer
bitcast of the selected type. For AMDGPU any vselect is
going to be scalarized (although the vector types are legal),
and all select conditions must be i1 (the same as getSetCCResultType).
This logic doesn't really make sense to me, but there's
never really been a consistent policy in what the select
condition mask type is supposed to be. Try to extend
the logic for skipping the transform for condition types
that aren't setccs. It doesn't seem quite right to me though,
but checking conditions that seem more sensible (like whether the
vselect is going to be expanded) doesn't work since this
seems to depend on that also.
llvm-svn: 316554
This updates the MIRPrinter to include the regclass when printing
virtual register defs, which is already valid syntax for the
parser. That is, given 64 bit %0 and %1 in a "gpr" regbank,
%1(s64) = COPY %0(s64)
would now be written as
%1:gpr(s64) = COPY %0(s64)
While this change alone introduces a bit of redundancy with the
registers block, it allows us to update the tests to be more concise
and understandable and brings us closer to being able to remove the
registers block completely.
Note: We generally only print the class in defs, but there is one
exception. If there are uses without any defs whatsoever, we'll print
the class on all uses. I'm not completely convinced this comes up in
meaningful machine IR, but for now the MIRParser and MachineVerifier
both accept that kind of stuff, so we don't want to have a situation
where we can print something we can't parse.
llvm-svn: 316479
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.
Also allow kill in all shader stages.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D38544
llvm-svn: 316427
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.
llvm-svn: 316346
This converts a large and somewhat arbitrary set of tests to use
update_mir_test_checks. I ran the script on all of the tests I expect
to need to modify for an upcoming mir syntax change and kept the ones
that obviously didn't change the tests in ways that might make it
harder to understand.
llvm-svn: 316137
- Emit NT_AMD_AMDGPU_ISA
- Add assembler parsing for isa version directive
- If isa version directive does not match command line arguments, then return error
Differential Revision: https://reviews.llvm.org/D38748
llvm-svn: 315808
Summary:
The interpolation mode workaround ensures that at least one
interpolation mode is enabled in PSInputAddr. It does not also check
PSInputEna on the basis that the user might enable bits in that
depending on run-time state.
However, for amdpal os type, the user does not enable some bits after
compilation based on run-time states; the register values being
generated here are the final ones set in the hardware. Therefore, apply
the workaround to PSInputAddr and PSInputEnable together. (The case
where a bit is set in PSInputAddr but not in PSInputEnable is where the
frontend set up an input arg for a particular interpolation mode, but
nothing uses that input arg. Really we should have an earlier pass that
removes such an arg.)
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37758
llvm-svn: 315591
- Move PAL metadata definitions to AMDGPUMetadata
- Make naming consistent with HSA metadata
Differential Revision: https://reviews.llvm.org/D38745
llvm-svn: 315523
- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
- Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
- Introduce HSAMD namespace
- Other minor name changes in function and test names
llvm-svn: 315522
opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass.
The heuristic in the custom selector for brcond deferred the
branch uniformity check to the pattern, which would fail.
llvm-svn: 315360
This patch adds a post-linking pass which replaces the function pointer of enqueued
block kernel with a global variable (runtime handle) and adds
runtime-handle attribute to the enqueued block kernel.
In LLVM CodeGen the runtime-handle metadata will be translated to
RuntimeHandle metadata in code object. Runtime allocates a global buffer
for each kernel with RuntimeHandel metadata and saves the kernel address
required for the AQL packet into the buffer. __enqueue_kernel function
in device library knows that the invoke function pointer in the block
literal is actually runtime handle and loads the kernel address from it
and puts it into AQL packet for dispatching.
This cannot be done in FE since FE cannot create a unique global variable
with external linkage across LLVM modules. The global variable with internal
linkage does not work since optimization passes will try to replace loads
of the global variable with its initialization value.
Differential Revision: https://reviews.llvm.org/D38610
llvm-svn: 315352
Summary:
See https://llvm.org/PR33743 for more details
It seems that for non-power of 2 vector sizes, the algorithm can produce
non-matching sizes for input and result causing an assert.
This usually isn't a problem as the isAnyExtend check will weed these out, but
in some cases (most often with lots of undefined values for the mask indices) it
can pass this check for non power of 2 vectors.
Adding in an extra check that ensures that bit size will match for the result
and input (as required)
Subscribers: nhaehnle
Differential Revision: https://reviews.llvm.org/D35241
llvm-svn: 315307
Summary:
Atomic buffer operations do not work (and trap on gfx9) when the
components are unaligned, even if their sum is aligned.
Previously, we generated an offset of 4156 without an SGPR by
splitting it as 4095 + 61 (immediate + inline constant). The
highest offset for which we can do this correctly is 4156 = 4092 + 64.
Fixes dEQP-GLES31.functional.ssbo.atomic.*
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37850
llvm-svn: 315302
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.
Passes OpenCL conformance test_integer_ops quick_[u]long_math
Differential Revision: https://reviews.llvm.org/D38607
llvm-svn: 315081
The inliner performs some kind of dead code elimination as it goes,
but there are cases that are not really caught by it. We might
at some point consider teaching the inliner about them, but it
is OK for now to run GlobalOpt + GlobalDCE in tandem as their
benefits generally outweight the cost, making the whole pipeline
faster.
This fixes PR34652.
Differential Revision: https://reviews.llvm.org/D38154
llvm-svn: 314997
Summary:
For the amdpal OS type:
We write an AMDGPU_PAL_METADATA record in the .note section in the ELF
(or as an assembler directive). It contains key=value pairs of 32 bit
ints. It is a merge of metadata from codegen of the shaders, and
metadata provided by the frontend as _amdgpu_pal_metadata IR metadata.
Where both sources have a key=value with the same key, the two values
are ORed together.
This .note record is part of the amdpal ABI and will be documented in
docs/AMDGPUUsage.rst in a future commit.
Eventually the amdpal OS type will stop generating the .AMDGPU.config
section once the frontend has safely moved over to using the .note
records above instead of .AMDGPU.config.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37753
llvm-svn: 314829