This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.
llvm-svn: 308226
Summary:
Previously, CodeGen checked first src operand type to determine if omod is supported by instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but desn't support omod.
Changed .td files to check if dst operand instead of src operand.
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D35350
llvm-svn: 308179
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.
Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.
t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.
Differential Revision: https://reviews.llvm.org/D34726
llvm-svn: 308039
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.
Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.
Differential Revision: https://reviews.llvm.org/D35335
llvm-svn: 307976
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.
Differential Revision: https://reviews.llvm.org/D35218
llvm-svn: 307848
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12!12 = !{float 2.500000e+00}
Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
llvm-svn: 307308
Summary:
During remat, some subranges might end up having invalid segments which caused problems for later
coalescing.
Added in a check to remove segments that are invalidated as part of the remat.
See http://llvm.org/PR33524
Subscribers: MatzeB, qcolombet
Differential Revision: https://reviews.llvm.org/D34391
llvm-svn: 307247
This is a short-term fix for PR33650 aimed to get the modules build bots green again.
Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
macros to try to locally specialize a global template for a global type. That's
not how C++ works.
Instead, we now centrally define how to format vectors of fundamental types and
of string (std::string and StringRef). We use flow formatting for the former
cases, since that's the obvious right thing to do; in the latter case, it's
less clear what the right choice is, but flow formatting is really bad for some
cases (due to very long strings), so we pick block formatting. (Many of the
cases that were using flow formatting for strings are improved by this change.)
Other than the flow -> block formatting change for some vectors of strings,
this should result in no functionality change.
Differential Revision: https://reviews.llvm.org/D34907
Corresponding updates to clang, clang-tools-extra, and lld to follow.
llvm-svn: 306878
Given no NaNs and no signed zeroes it folds:
(fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X))
(fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X)
Differential Revision: https://reviews.llvm.org/D34579
llvm-svn: 306592
That is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.
Differential Revision: https://reviews.llvm.org/D34723
llvm-svn: 306499
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.
Differential Revision: https://reviews.llvm.org/D34545
llvm-svn: 306446
Apparently this replacement can really be substituting the
same as the original register. Avoid restarting the loop
when there's been no change in the register uses.
llvm-svn: 306441
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.
Differential Revision: https://reviews.llvm.org/D34500
llvm-svn: 306439
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.
Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl
Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye
Differential Revision: https://reviews.llvm.org/D34626
llvm-svn: 306413
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.
This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).
Reviewers: arsenm
Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D33319
llvm-svn: 306375
Remove invalid shortcut in fixupKills(): A register needs to be marked
live even when we are not adding a kill flag. This is because a
partially live register must not get a kill flags, but it still needs to
be fully marked live when walking backwards.
llvm-svn: 306352
Fixes bug 33597.
Use of substituteRegister in the tied operand case messes
up the register use iterator, causing some uses to be left
unprocessed.
llvm-svn: 306333
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.
Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.
llvm-svn: 306264
Intrinsic already existed for llvm.SI.tbuffer.store
Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*
Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr
Differential Revision: https://reviews.llvm.org/D30687
llvm-svn: 306031
Summary:
Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA.
There are no real instructions for them, but there are pseudo instructions.
Reviewers: arsenm, vpykhtin, cfang
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34403
llvm-svn: 305999
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026
Reviewers: vpykhtin, rampitec, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34241
llvm-svn: 305986
If one of the arguments of adde/sube is zero we can fold another
add/sub into it.
Differential Revision: https://reviews.llvm.org/D34374
llvm-svn: 305964
This simplification allows to avoid generating v_cndmask_b32
to serialize condition code between compare and use.
Differential Revision: https://reviews.llvm.org/D34300
llvm-svn: 305962
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.
Differential Revison: https://reviews.llvm.org/D34291
llvm-svn: 305840
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.
llvm-svn: 305821
This does some improvements/cleanup to the recently introduced
scavengeRegisterBackwards() functionality:
- Rewrite findSurvivorBackwards algorithm to use the existing
LiveRegUnit::accumulateBackward() code. This also avoids the Available
and Candidates bitset and just need 1 LiveRegUnit instance
(= 1 bitset).
- Pick registers in allocation order instead of register number order.
llvm-svn: 305817
The offset may not be an inline immediate, so this needs
to be materialized into a register. The post-RA run of
SIShrinkInstructions is able to fold it later if it can.
llvm-svn: 305761
It adds it for the target after inlining but before SROA where
we can get most out of it.
Differential Revision: https://reviews.llvm.org/D34366
llvm-svn: 305759
Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse
to place spills as the very first instruciton of a basic block and thus
artifically increase pressure (test in
test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 305625
Revert because of reports of some PPC input starting to spill when it
was predicted that it wouldn't and no spillslot was reserved.
This reverts commit r305516.
llvm-svn: 305566
Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64
problems reported in the stage2 build last time, which I cannot
reproduce right now.
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 305516
Summary:
Alloca promotion pass not dealing with non-canonical input
Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical)
Also added some test cases in non-canonical form to check that it no longer crashes
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D31710
llvm-svn: 305079
The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes.
Differential Revision: https://reviews.llvm.org/D33783
llvm-svn: 304998
If -simplify-mir option is passed then MIRPrinter will not print such fields.
This change also required some lit test cases in CodeGen directory to be changed.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D32304
llvm-svn: 304779
If a tied source operand was undef, it would be replaced but not
update the other tied operand, which would end up using different
virtual registers.
llvm-svn: 304747
Fixes bug #33302. Pass did not account that Src1 of max instruction
can be an immediate.
Differential Revision: https://reviews.llvm.org/D33884
llvm-svn: 304696
Remove dependency of SDWA pass on SIShrinkInstructions.
The goal is to move SDWA even higher in the stack to avoid second run
of MachineLICM, MachineCSE and SIFoldOperands.
Also added handling to preserve original src modifiers.
Differential Revision: https://reviews.llvm.org/D33860
llvm-svn: 304665
Summary:
These are mostly legal, but will probably need special lowering for some
cases.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D33791
llvm-svn: 304628
SIFoldOperands can commute operands even if no folding was done.
This change is to preserve IR is no folding was done.
Differential Revision: https://reviews.llvm.org/D33802
llvm-svn: 304625
-enable-si-insert-waitcnts=1 becomes the default
-enable-si-insert-waitcnts=0 to use old pass
Differential Revision: https://reviews.llvm.org/D33730
llvm-svn: 304551
- new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it
- fix handling of PERMUTE ops
- fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass )
- add new test
Differential Revision: https://reviews.llvm.org/D33114
llvm-svn: 304311
An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.
Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.
Differential Revision: https://reviews.llvm.org/D33583
llvm-svn: 304219
[AMDGPU] add intrinsic for s_getpc
Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.
Patch by Tim Corringham
llvm-svn: 304031
Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.
llvm-svn: 303921
This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045
Differential Revision: https://reviews.llvm.org/D33451
llvm-svn: 303691
Summary:
Promoting Alloca to Vector and Promoting Alloca to LDS are two independent handling of Alloca and should not affect each other.
As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage
related checking out and replace it after the calling convention checking.
Reviewer:
arsenm
Differential Revision:
http://reviews.llvm.org/D33139
llvm-svn: 303684
Perform DAG combine:
and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
Where nb is a number of trailing zeroes in mask.
It replaces two instructions with two and BFE is generally a more
expensive one. However this is only done if we are selecting a byte
or word at an aligned boundary which results in a proper SDWA
operand pattern. It is only done if SDWA is supported.
TODO: improve SDWA pass to actually convert this pattern. It is not
done now because we have an immediate in the instruction, which has
be moved into a VGPR.
Differential Revision: https://reviews.llvm.org/D33455
llvm-svn: 303681
shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
This allows to fold a constant into an address in some cases as
well as to eliminate second shift if the expression is used as
an address and second shift is a result of a GEP.
Differential Revision: https://reviews.llvm.org/D33432
llvm-svn: 303641
Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
shl (ext x) => zext (shl x)
Differential Revision: https://reviews.llvm.org/D33367
llvm-svn: 303569
pruneSubRegValues() needs to remove subregister ranges starting at
instructions that later get removed by eraseInstrs(). It missed to check
one case in which eraseInstrs() would remove an instruction.
Fixes http://llvm.org/PR32688
llvm-svn: 303396
Summary:
There should be no intesection between SDWA operands and potential MIs. E.g.:
```
v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0
v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0
v_add_u32 v3, v4, v2
```
In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it.
Reviewers: vpykhtin, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D32804
llvm-svn: 303347
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.
llvm-svn: 303308
Avoids instructions to pack a vector when the source is really
a scalar being broadcast.
Also be smarter and look for per-component fneg.
Doesn't yet handle scalar from upper half of register
or other swizzles.
llvm-svn: 303291
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This is causes small improvements dealing with intrinsics
lowered to stores.
Test notes:
* Many testcases overwrite store addresses multiple times and needed
minor changes, mainly making stores volatile to prevent the
optimization from optimizing the test away.
* Many X86 test cases optimized out instructions associated with
associated with va_start.
* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
dependencies to check and can probably be removed and potentially
replaced with another test.
Reviewers: rnk, john.brawn
Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33206
llvm-svn: 303198
Summary:
We should not change volatile loads/stores in promoting alloca to vector.
Reviewers:
arsenm
Differential Revision:
http://reviews.llvm.org/D33107
llvm-svn: 302943
We don't use it and it was removed in gfx9, and the encoding
bit repurposed.
Additionally actually using it requires changing the output register
class, which wasn't done anyway.
llvm-svn: 302814
- MIParser: If the successor list is not specified successors will be
added based on basic block operands in the block and possible
fallthrough.
- MIRPrinter: Adds a new `simplify-mir` option, with that option set:
Skip printing of block successor lists in cases where the
parser is guaranteed to reconstruct it. This means we still print the
list if some successor cannot be determined (happens for example for
jump tables), if the successor order changes or branch probabilities
being unequal.
Differential Revision: https://reviews.llvm.org/D31262
llvm-svn: 302289
This reverts commit r300732. This breaks a few tests.
I think the problem is related to adding more uses of
the condition that don't yet exist at this point.
llvm-svn: 301242
In call sequence setups, there may not be a frame index base
and the pointer is a constant offset from the frame
pointer / scratch wave offset register.
llvm-svn: 301230
Merges equivalent initializations of M0 and hoists them into a common
dominator block. Technically the same code can be used with any
register, physical or virtual.
Differential Revision: https://reviews.llvm.org/D32279
llvm-svn: 301228
Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0.
This is fine for most targets. However for amdgcn target, the size of pointer in address space 0
depends on triple environment. For amdgiz environment, it is 64 bit but for other environment it is
32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target
triple environment. Therefore a hook is need in target lowering for getting the fence operand type.
This patch has no effect on targets other than amdgcn.
Differential Revision: https://reviews.llvm.org/D32186
llvm-svn: 301215
Fixes traps in any block besides the entry block,
and fixes depending on a live-in physical register
by using a virtual register copy.
Also happens to stop emitting a nop in the case
debug trap is not supported.
llvm-svn: 301206
Summary:
Fix a compiler bug when the lane select happens to end up in a VGPR.
Clarify the semantic of the corresponding intrinsic to be that of
the corresponding GLSL: the lane select must be uniform across a
wave front, otherwise results are undefined.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D32343
llvm-svn: 301197
SI_MASKED_UNREACHABLE does not have machine instruction encoding.
It needs special handling in AMDGPUAsmPrinter::EmitInstruction like some
other pseudo instructions.
This patch fixes compilation failure of RadeonRays.
Differential Revision: https://reviews.llvm.org/D32364
llvm-svn: 301025
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.
However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.
This patch fixes that.
Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.
AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.
Differential Revision: https://reviews.llvm.org/D32021
llvm-svn: 300864
This is inserted directly in the text section. The relocation
for the function ends up resolving to the beginning of the
amd_kernel_code_t header rather than the actual function
entry point.
Also skip some of the comments for initialization
that only makes sense for kernels.
llvm-svn: 300736
The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.
This produces nicer to read structurizer output.
This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.
llvm-svn: 300732
Avoid looping through program to determine register counts.
This avoids needing to look at regmask operands.
Also fixes some counting errors with flat_scr when there
are no stack objects.
llvm-svn: 300482
If a kernel's pointer argument is known to be readonly
set access qualifier accordingly. This allows RT not to
flush caches before dispatches.
Differential Revision: https://reviews.llvm.org/D32091
llvm-svn: 300362
In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.
Differential Revision: https://reviews.llvm.org/D31993
llvm-svn: 300227
If workgroup size is known inform llvm about range returned by local
id and local size queries.
Differential Revision: https://reviews.llvm.org/D31804
llvm-svn: 300102
Increase threshold to unroll a loop which contains an "if" statement
whose condition defined by a PHI belonging to the loop. This may help
to eliminate if region and potentially even PHI itself, saving on
both divergence and registers used for the PHI.
Add a small bonus for each of such "if" statements.
Differential Revision: https://reviews.llvm.org/D31693
llvm-svn: 299779
Summary:
Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled.
With this change order of passes will not change.
Reviewers: arsenm, vpykhtin, rampitec
Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D31705
llvm-svn: 299757
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.
Differential Revision: https://reviews.llvm.org/D30596
llvm-svn: 299723
This is possible in ways that are not compiler bugs,
so stop asserting on them.
This emits an extra error when emitting objects when it
can't encode the new pseudo, but I'm not sure that matters.
llvm-svn: 299712
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guarantied
to come to the same point at the same time.
Differential Revision: https://reviews.llvm.org/D31731
llvm-svn: 299659
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.
Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.
llvm-svn: 299246
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.
llvm-svn: 299202
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.
Differential Revision: https://reviews.llvm.org/D31489
llvm-svn: 299108
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.
Differential Revision: https://reviews.llvm.org/D31412
llvm-svn: 298948
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.
Differential Revision: https://reviews.llvm.org/D31429
llvm-svn: 298935
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.
amdgiz is for non-OpenCL environment where generic address space is 0.
amdgizcl is for OpenCL environment where generic address space is 0.
Differential Revision: https://reviews.llvm.org/D31211
llvm-svn: 298758
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.
llvm-svn: 298737
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.
llvm-svn: 298729
- Rename runtime metadata -> code object metadata
- Make metadata not flow
- Switch enums to use ScalarEnumerationTraits
- Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
- Introduce in-memory representation for attributes
- Code object metadata streamer
- Create metadata for isa and printf during EmitStartOfAsmFile
- Create metadata for kernel during EmitFunctionBodyStart
- Finalize and emit metadata to .note during EmitEndOfAsmFile
- Other minor improvements/bug fixes
Differential Revision: https://reviews.llvm.org/D29948
llvm-svn: 298552
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.
llvm-svn: 298452
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases leading to crashs
when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
block.
This fixes http://llvm.org/PR32353
llvm-svn: 298448
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.
This fixes PR23277.
Differential Revision: https://reviews.llvm.org/D28494
llvm-svn: 298430