Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.
I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).
This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.
It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)
Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. highest regular SGPR being used s29 and VCC used would lead to number
of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15898
llvm-svn: 257073
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.
Follow up to D15310
llvm-svn: 257055
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.
llvm-svn: 257046
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.
llvm-svn: 257039
Unlike my comment in 257022 said, it turns out we do handle constant vectors in the statepoint lowering, but only because SelectionDAG doesn't actually produce constants for them. Add a couple of tests which show this working.
Also, add a triple to the same test file to hopefully fix a failing bot.
It turns out we do han
llvm-svn: 257025
Currently, we try to split vectors of pointers back into their component pointer elements during rewrite-statepoints-for-gc. This is less than ideal since presumably the vectorizer chose to vectorize for a reason. :) It's also been a source of bugs - in particular, the relocation logic as currently implemented was recently discovered to be wrong.
The alternate approach is to allow gc.relocates of vector-of-pointer type and update the backend to handle them. That's what this patch tries to do. This won't actually enable vector-of-pointers in practice - there are some RS4GC changes needed - but the lowering is standalone and testable so it makes sense to separate.
Note that there are some known cases around vector constants which this patch does not handle. Once this is in, I'll send another patch with individual fixes and test cases.
Differential Revision: http://reviews.llvm.org/D15632
llvm-svn: 257022
We need to know whether or not a given basic block is in a loop for the analysis
to be correct.
Loop information may be incomplete on irreducible CFGs, therefore we may
generate incorrect code if we use it in those situations.
This fixes PR25988.
llvm-svn: 257012
Summary:
This is admittedly something that you could only run into by manually
playing around with shader assembly because the SITypeWriter pass is
skipped for compute.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15902
llvm-svn: 256980
FindIDom() can fail in two different ways - it can either return nullptr or the
block itself, depending on the circumstances. Some users of FindIDom() check
one error condition, while others check the other.
Change it to always return nullptr on failure.
This fixes PR26004.
Differential Revision: http://reviews.llvm.org/D15847
llvm-svn: 256955
The first instruction in a block is what the rend() iterator points to, so
if it moves, we need to re-evaluate rend() so that we continue to iterate
through the rest of the instructions.
llvm-svn: 256953
Summary:
In buildSchedGraph(), when adding memory dependencies for loads, move
the call to adjustChainDeps() after the call to
addChainDependency(AliasChain) to handle the case where
addChainDependency(AliasChain) ends up not adding a dependency and
instead putting the SU on the RejectMemNodes list. The call to
adjustChainDeps() must be done after the call to addChainDependency() in
order to process the SU added to the RejectMemNodes list to create
memory dependencies for it.
Reviewers: hfinkel, atrick, jonpa, resistor
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D15927
llvm-svn: 256950
In general, disabling comments in the output reduces the chances of a
CHECK line accidentally matching a comment instead of its intended text.
llvm-svn: 256946
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.
This relies on the assumption that addresses can't occupy more than half the
address space, which isn't possible in C because it wouldn't be possible to
represent the difference between the start of the object and one-past-the-end
in a ptrdiff_t.
Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.
Differential Revision: http://reviews.llvm.org/D15544
llvm-svn: 256890
Due to the SGPR init bug, every program claims to use the same number
of SGPRs anyway, so there's no point in trying to shift those registers
down from their initial spot of reservation.
Add a test that uses VGPR spilling and blocks most SGPRs from being used for
the scratch resource register. Previously, this would run into an assertion.
Differential Revision: http://reviews.llvm.org/D15724
llvm-svn: 256870
The red zone consists of 128 bytes beyond the stack pointer so that the
allocation of objects in leaf functions doesn't require decrementing
rsp. In r255656, we introduced an optimization that would cheaply
materialize certain constants via push/pop. Push decrements the stack
pointer and stores it's result at what is now the top of the stack.
However, this means that using push/pop would encroach on the red zone.
PR26023 gives an example where this corrupts an object in the red zone.
llvm-svn: 256808
Unfortunately this fix had the effect of exposing the
-verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for
which I disabled it for now.
Two testcases also have additional pushq/popq where the corrected code
cannot prove that %rax is dead any longer. Looking at the examples, this
could potentially be fixed by improving computeRegisterLiveness() to check
the live-in lists of the successors blocks when reaching the end of a
block.
This fixes http://llvm.org/PR25951.
llvm-svn: 256799
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.
The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15869
llvm-svn: 256794
Summary:
Fix the CLR state numbering to generate correct tables, and update the lit
test to verify them.
The CLR numbering assigns one state number to each catchpad and
cleanuppad.
It also computes two tree-like relations over states:
1) Each state has a "HandlerParentState", which is the state of the next
outer handler enclosing this state's handler (same as nearest ancestor
per the ParentPad linkage on EH pads, but skipping over catchswitches).
2) Each state has a "TryParentState", which:
a) for a catchpad that's not the last handler on its catchswitch, is
the state of the next catchpad on that catchswitch.
b) for all other pads, is the state of the pad whose try region is the
next outer try region enclosing this state's try region. The "try
regions are not present as such in the IR, but will be inferred
based on the placement of invokes and pads which reach each other
by exceptional exits.
Catchswitches do not get their own states, but each gets mapped to the
state of its first catchpad.
Table generation requires each state's "unwind dest" state to have a lower
state number than the given state.
Since HandlerParentState can be computed as a function of a pad's
ParentPad, and TryParentState can be computed as a function of its unwind
dest and the TryParentStates of its children, the CLR state numbering
algorithm first computes HandlerParentState in a top-down pass, then
computes TryParentState in a bottom-up pass.
Also reword some comments/names in the CLR EH table generation to make the
distinction between the different kinds of "parent" clear.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D15325
llvm-svn: 256760
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings.
There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.
Reviewers: joerg, aaron.ballman
Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15848
llvm-svn: 256707
Summary:
Add a pass to update catchrets when their successors get cloned; the
existing pass doesn't catch these because it walks the funclet whose
blocks are being cloned but the catchret is in a child funclet.
Also update the test for removing incoming PHI values; when the
predecessor is a catchret, the relevant color is the catchret's parentPad,
not its block's color.
Reviewers: andrew.w.kaylor, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15840
llvm-svn: 256689
LLVM's targets need to know if stack pointer adjustments occur after the
prologue. This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.
Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction. There is an interesting corner case:
inline assembly.
The vast majority of inline assembly which will perform a push or pop is
done so to pair up with pushf or popf as appropriate. Unfortunately,
this inline assembly doesn't mark the stack pointer as clobbered
because, well, it isn't. The stack pointer is decremented and then
immediately incremented. Because of this, LLVM was changed in r256456
to conservatively assume that inline assembly contain a sequence of
stack operations. This is unfortunate because the vast majority of
inline assembly will not end up manipulating the stack pointer in any
way at all.
Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.
llvm-svn: 256685
Not folding these cases tends to avoid partial register updates:
sqrtss (%eax), %xmm0
Has a partial update of %xmm0, while
movss (%eax), %xmm0
sqrtss %xmm0, %xmm0
Has a clobber of the high lanes immediately before the partial update,
avoiding a potential stall.
Given this, we only want to fold when optimizing for size.
This is consistent with the patterns we already have for some of
the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl()
Differential Revision: http://reviews.llvm.org/D15741
llvm-svn: 256671