Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
affects all targets that use frame pseudo instructions and touched many
files although the changes are uniform.
- Access to frame properties are implemented using special instructions
rather than calls getOperand(N).getImm(). For X86 and ARM such
replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
instruction. These involve proper instruction initialization and
methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
frame parts initialized inside frame instruction pair and outside it.
The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
This makes the values a little more consistent between similar instruction and reduces the values some. This results in better grouping in the isel table saving a few bytes.
llvm-svn: 298043
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.
It fixes PR28755.
Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>
Differential Revision: https://reviews.llvm.org/D25088
llvm-svn: 287875
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
llvm-svn: 286385
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.
Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.
Differential Revision: https://reviews.llvm.org/D25812
llvm-svn: 286384
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.
llvm-svn: 269949
This patch moves the expansion of WIN_ALLOCA pseudo-instructions
into a separate pass that walks the CFG and lowers the instructions
based on a conservative estimate of the offset between the stack
pointer and the lowest accessed stack address.
The goal is to reduce binary size and run-time costs by removing
calls to _chkstk. While it doesn't fix all the code quality problems
with inalloca calls, it's an incremental improvement for PR27076.
Differential Revision: http://reviews.llvm.org/D20263
llvm-svn: 269828
This didn't cause a bug because the order of the patterns ensured that the 64-bit instructions with 32-bit immediates were selected first.
llvm-svn: 268212
The callseq_end node must be glued with the TLS calls, otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI,
the pseudo uses the symbol address at this point not RDI and the
lowering will do the right thing.
llvm-svn: 267797
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with 8-bit immediate is only three bytes.
Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.
Differential Revision: http://reviews.llvm.org/D18374
llvm-svn: 264440
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).
Differential Revision: http://reviews.llvm.org/D18246
llvm-svn: 264375
cmpxchg[8|16]b uses RBX as one of its argument.
In other words, using this instruction clobbers RBX as it is defined to hold one
the input. When the backend uses dynamically allocated stack, RBX is used as a
reserved register for the base pointer.
Reserved registers have special semantic that only the target understands and
enforces, because of that, the register allocator don’t use them, but also,
don’t try to make sure they are used properly (remember it does not know how
they are supposed to be used).
Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.
To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that save rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
comparison.
- The other is the virtual register holding the save of the value of RBX as the
base pointer. This saving is done as part of isel (i.e., we emit a copy from
rbx).
cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp
Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferes that appears after the pseudo gets expanded are properly modeled
before that expansion.
This fixes PR26883.
llvm-svn: 263325
This is long-standing dirtiness, as acknowledged by r77582:
The current trick is to select it into a merge_values with
the first definition being an implicit_def. The proper solution is
to add new ISD opcodes for the no-output variant.
Doing this before selection will let us combine away some constructs.
Differential Revision: http://reviews.llvm.org/D17659
llvm-svn: 262244
TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent
shrink-wrapping from pushing prologue/epilogue past them (which result
in TLS variables being accessed before the stack frame is set up), we
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!
llvm-svn: 261387
The red zone consists of 128 bytes beyond the stack pointer so that the
allocation of objects in leaf functions doesn't require decrementing
rsp. In r255656, we introduced an optimization that would cheaply
materialize certain constants via push/pop. Push decrements the stack
pointer and stores it's result at what is now the top of the stack.
However, this means that using push/pop would encroach on the red zone.
PR26023 gives an example where this corrupts an object in the red zone.
llvm-svn: 256808
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.
This is a follow-up to r255656.
Differential Revision: http://reviews.llvm.org/D15549
llvm-svn: 255936
"movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes.
This commit makes LLVM use the latter when optimizing for size.
Differential Revision: http://reviews.llvm.org/D14971
llvm-svn: 255656
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
When target has MMX registers, configure MVT::f128 in FR128RegClass,
with TypeSoftenFloat action, and custom actions for some opcodes.
Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
Fix infinite loop when f128 type can have
VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.
Differential Revision: http://reviews.llvm.org/D11438
llvm-svn: 255558
This adds the EH_RESTORE x86 pseudo instr, which is responsible for
restoring the stack pointers: EBP and ESP, and ESI if stack realignment
is involved. We only need this on 32-bit x86, because on x64 the runtime
restores CSRs for us.
Previously we had to keep the CATCHRET instruction around during SEH so
that we could convince X86FrameLowering to restore our frame pointers.
Now we can split these instructions earlier.
This was confusing, because we had a return instruction which wasn't
really a return and was ultimately going to be removed by
X86FrameLowering. This change also simplifies X86FrameLowering, which
really shouldn't be building new MBBs.
No observable functional change currently, but with the new register
mask stuff in D14407, CATCHRET will become a register allocator barrier,
and our existing tests rely on us having reasonable register allocation
around SEH.
llvm-svn: 252266
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.
This patch adds the missing EFLAGS definition.
llvm-svn: 250438
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.
This patch adds the missing EFLAGS definition.
Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.
Reviewers: rsmith, rtrieu
Subscribers: llvm-commits, reames, morisset
Differential Revision: http://reviews.llvm.org/D13680
llvm-svn: 250135
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of. Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.
llvm-svn: 249052
This makes catchret look more like a branch, and less like a weird use
of BlockAddress. It also lets us get away from
llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label
arithmetic.
llvm-svn: 247936
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.
This small example now compiles and executes successfully on win32:
extern "C" int printf(const char *, ...) noexcept;
struct Dtor {
~Dtor() { printf("~Dtor\n"); }
};
void has_cleanup() {
Dtor o;
throw 42;
}
int main() {
try {
has_cleanup();
} catch (int) {
printf("caught it\n");
}
}
Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.
llvm-svn: 247219
Summary:
32-bit funclets have short prologues that allocate enough stack for the
largest call in the whole function. The runtime saves CSRs for the
funclet. It doesn't restore CSRs after we finally transfer control back
to the parent funciton via a CATCHRET, but that's a separate issue.
32-bit funclets also have to adjust the incoming EBP value, which is
what llvm.x86.seh.recoverframe does in the old model.
64-bit funclets need to spill CSRs as normal. For simplicity, this just
spills the same set of CSRs as the parent function, rather than trying
to compute different CSR sets for the parent function and each funclet.
64-bit funclets also allocate enough stack space for the largest
outgoing call frame, like 32-bit.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12546
llvm-svn: 247092
We can now run 32-bit programs with empty catch bodies. The next step
is to change PEI so that we get funclet prologues and epilogues.
llvm-svn: 246235
Summary: The casts from String to PatFrag weren't needed if we instead provided an SDNode. This fix was suggested by @pete in D11382.
Subscribers: pete, llvm-commits
Differential Revision: http://reviews.llvm.org/D11788
llvm-svn: 244167
Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations.
Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11382
llvm-svn: 244128
Before this we were producing a TargetExternalSymbol from a MCSymbol.
That meant extracting the symbol name and fetching the symbol again
down the pipeline.
This patch adds a DAG.getMCSymbol that lets the MCSymbol pass unchanged on the
DAG.
Doing so removes the need for MO_NOPREFIX and fixes the root cause of pr23900,
allowing r240130 to be committed again.
llvm-svn: 240300