Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`. `id` gets propagated to the ID field
in the generated StackMap section. If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.
This change brings statepoints one step closer to patchpoints. With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.
PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`. This can be made more sophisticated
later.
Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9546
llvm-svn: 237214
x86 Windows uses the '_' prefix for all global symbols, and this was
mistakenly being applied to frameescape labels, which are not externally
visible global symbols. They use the private global prefix 'L'.
The *right* way to fix this is probably to stop masquerading this label
as an ExternalSymbol and create a new SDNode type. These labels are not
"external", and we know they will be resolved by assembly time. Having a
custom SDNode type would allow us to do better X86 address mode
matching, so it's probably worth doing eventually.
llvm-svn: 236123
X86 backend.
The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.
llvm-svn: 235483
MSDN's x64 software conventions page says that this is one of the fixed
list of legal epilogues:
https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx
Presumably this is how the unwinder distinguishes epilogue jumps from
in-function control flow.
Also normalize the way we place "## TAILCALL" comments on such jumps.
llvm-svn: 227611
derived classes.
Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.
*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.
llvm-svn: 227113
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation. This occurs when Constants have
the same bit pattern but have different types.
Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.
This fixes PR22188.
llvm-svn: 225597
Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness.
llvm-svn: 225252
The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT.
llvm-svn: 225242
This is the second patch in a small series. This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints. It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code. I will be submitting the SelectionDAG parts within the next 24-48 hours. Since those pieces are by far the most complicated, I wanted to minimize the size of that patch. That patch will include the tests which exercise the functionality in this patch. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots. The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes. The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT. (Doing so would break relocation semantics for collectors which wish to relocate roots.)
The implementation of STATEPOINT is closely modeled on PATCHPOINT. Eventually, much of the code in this patch will be removed. The long term plan is to merge the functionality provided by statepoints and patchpoints. Merging their implementations in the backend is likely to be a good starting point.
Reviewed by: atrick, ributzka
llvm-svn: 223085
For a call to not return in to the stackmap shadow, the shadow must end with the call.
To do this, we must insert any required nops *before* the call, and not after it.
llvm-svn: 220728
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.
A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.
llvm-svn: 220710
Every target we support has support for assembly that looks like
a = b - c
.long a
What is special about MachO is that the above combination suppresses the
production of a relocation.
With this change we avoid producing the intermediary labels when they don't
add any value.
llvm-svn: 220256
lowering.
This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.
Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.
llvm-svn: 218430
pool data being loaded into a vector register.
The comments take the form of:
# ymm0 = [a,b,c,d,...]
# xmm1 = <x,y,z...>
The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.
My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.
Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.
llvm-svn: 218377
attempt didn't work out so well. It looks like it will be much better
for introducing extra logic to find a shuffle mask if the finding logic
is totally separate. This also makes it easy to sink the opcode logic
completely out of the routine so we don't re-dispatch across it.
Still no functionality changed.
llvm-svn: 218363
asm. This can be somewhat expensive and there is no reason to do it
outside of tests or debugging sessions. I'm also likely to make it
significantly more expensive to support more styles of shuffles.
llvm-svn: 218362
from the MachineInstr into the caller which is already doing a switch
over the instruction.
This will make it more clear how to compute different operands to feed
the comment selection for example.
Also, in a drive-by-fix, don't append an empty comment string (which is
a no-op ultimately).
No functionality changed.
llvm-svn: 218361
vector shuffles.
This is just the beginning by hoisting it into its own function and
making use of early exit to dramatically simplify the flow of the
function. I'm going to be incrementally refactoring this until it is
a bit less magical how this applies to other instructions, and I can
teach it how to dig a shuffle mask out of a register. Then I plan to
hook it up to VPERMD so we get our mask comments for it.
No functionality changed yet.
llvm-svn: 218357
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.
A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.
The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.
llvm-svn: 218301
trick that I missed.
VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.
However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.
There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.
llvm-svn: 218300
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.
Similarily, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.
This second point matches the first issue identified in
http://llvm.org/bugs/show_bug.cgi?id=17281
llvm-svn: 216980
When the last instruction prior to a function epilogue is a call, we
need to emit a nop so that the return address is not in the epilogue IP
range. This is consistent with MSVC's behavior, and may be a workaround
for a bug in the Win64 unwinder.
Differential Revision: http://reviews.llvm.org/D4751
Patch by Vadim Chugunov!
llvm-svn: 214775
instructions which happen to have a constant mask.
Currently, this only handles a very narrow set of cases, but those
happen to be the cases that I care about for testing shuffles sanely.
This is a bit trickier than other shuffle instructions because we're
decoding constants out of the constant pool. The current MC layer makes
it completely impossible to inspect a constant pool entry, so we have to
do it at the MI level and attach the comment to the streamer on its way
out. So no joy for disassembling, but it does make test cases and asm
dumps *much* nicer.
Sorry for no test cases, but it didn't really seem that valuable to go
trolling through existing old test cases and updating them. I'll have
lots of testing of this in the upcoming patch for SSSE3 emission in the
new vector shuffle lowering code paths.
llvm-svn: 213986
This patch minimizes the number of nops that must be emitted on X86 to satisfy
stackmap shadow constraints.
To minimize the number of nops inserted, the X86AsmPrinter now records the
size of the most recent stackmap's shadow in the StackMapShadowTracker class,
and tracks the number of instruction bytes emitted since the that stackmap
instruction was encountered. Padding is emitted (if it is required at all)
immediately before the next stackmap/patchpoint instruction, or at the end of
the basic block.
This optimization should reduce code-size and improve performance for people
using the llvm stackmap intrinsic on X86.
<rdar://problem/14959522>
llvm-svn: 213892
Rename the routines to reflect the reality that they are more related to call
frame information than to Win64 EH. Although EH is implemented in an intertwined
manner by augmenting with an exception handler and an associated parameter, the
majority of these routines emit information required to unwind the frames. This
also helps identify that these routines are generic for most windows platforms
(they apply equally to nearly all architectures except x86) although the
encoding of the information is architecture dependent.
Unwinding data is emitted via EmitWinCFI* and exception handling information via
EmitWinEH*.
llvm-svn: 211994
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211691
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211399