- MBB address is only valid as an immediate value in Small & Static
code/relocation models. On other models, LEA is needed to load IP address of
the restore MBB.
- A minor fix of MBB in MC lowering is added as well to enable target
relocation flag being propagated into MC.
llvm-svn: 166084
- Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to
vector element-wise widening) to reduce DAG combiner and its overhead added
in X86 backend.
llvm-svn: 166036
For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8
bytes are to be passed in the low-order bits ("right-adjusted") of the
doubleword register or memory slot assigned to them. A previous patch
addressed this for aggregates passed in registers. However, small
aggregates passed in the overflow portion of the parameter save area are
still being passed left-adjusted.
The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the
caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on
the callee side. The main fix on the callee side simply extends
existing logic for 1- and 2-byte objects to 1- through 7-byte objects,
and correcting a constant left over from 32-bit code. There is also a
fix to a bogus calculation of the offset to the following argument in
the parameter save area.
On the caller side, again a constant left over from 32-bit code is
fixed. Additionally, some code for 1, 2, and 4-byte objects is
duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only. The
LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to
handle both ABIs, and I propose to separate this into two functions in a
future patch, at which time the duplication can be removed.
The patch adds a new test (structsinmem.ll) to demonstrate correct
passing of structures of all seven sizes. Eight dummy parameters are
used to force these structures to be in the overflow portion of the
parameter save area.
As a side effect, this corrects the case when aggregates passed in
registers are saved into the first eight doublewords of the parameter
save area: Previously they were stored left-justified, and now are
properly stored right-justified. This requires changing the expected
output of existing test case structsinregs.ll.
llvm-svn: 166022
Stack is formed improperly for long structures passed as byval arguments for
EABI mode.
If we took AAPCS reference, we can found the next statements:
A: "If the argument requires double-word alignment (8-byte), the NCRN (Next
Core Register Number) is rounded up to the next even register number." (5.5
Parameter Passing, Stage C, C.3).
B: "The alignment of an aggregate shall be the alignment of its most-aligned
component." (4.3 Composite Types, 4.3.1 Aggregates).
So if we have structure with doubles (9 double fields) and 3 Core unused
registers (r1, r2, r3): caller should use r2 and r3 registers only.
Currently r1,r2,r3 set is used, but it is invalid.
Callee VA routine should also use r2 and r3 regs only. All is ok here. This
behaviour is guessed by rounding up SP address with ADD+BFC operations.
Fix:
Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and
8 byte alignment, we waste odd registers then.
P.S.:
I also improved LDRB_POST_IMM regression test. Since ldrb instruction will
not generated by current regression test after this patch.
llvm-svn: 166018
Original message:
The attached is the fix to radar://11663049. The optimization can be outlined by following rules:
(select (x != c), e, c) -> select (x != c), e, x),
(select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.
The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.
While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.
The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".
Original message since r165661:
My previous change has a bug: I negated the condition code of a CMOV, and go ahead creating a new CMOV using the *ORIGINAL* condition code.
llvm-svn: 166017
- Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also
used as a light-weight replacement of setjmp/longjmp which are used to
implementation continuation, user-level threading, and etc. The support added
in this patch ONLY addresses this usage and is NOT intended to support SjLj
exception handling as zero-cost DWARF exception handling is used by default
in X86.
llvm-svn: 165989
Convert the internal representation of the Attributes class into a pointer to an
opaque object that's uniqued by and stored in the LLVMContext object. The
Attributes class then becomes a thin wrapper around this opaque
object. Eventually, the internal representation will be expanded to include
attributes that represent code generation options, etc.
llvm-svn: 165917
X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this
level is often not a good idea because it's too late for many optimizations to
kick in. This solution doesn't add any extensions (truncs are free) and tries
to avoid introducing partial register stalls by filtering direct copyfromregs.
I'm seeing a ~10% speedup on reading a random .png file with libpng15 via
graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.
llvm-svn: 165868
the interface between the front-end and the MC layer when parsing inline
assembly. Unfortunately, this is too deep into the parsing stack. Specifically,
we're unable to handle target-independent assembly (i.e., assembly directives,
labels, etc.). Note the MatchAndEmitInstruction() isn't the correct
abstraction either. I'll be exposing target-independent hooks shortly, so this
is really just a cleanup.
llvm-svn: 165858
local frame causes problem.
For example:
void f(StructToPass s) {
g(&s, sizeof(s));
}
will cause problem with tail-call since part of s is passed via registers and
saved in f's local frame. When g tries to access s, part of s may be corrupted
since f's local frame is popped out before the tail-call.
The current fix is to disable tail-call if getVarArgsRegSaveSize is not 0 for
the caller. This is a conservative approach, if we can prove the address of
s or part of s is not taken and passed to g, it should be okay to perform
tail-call.
rdar://12442472
llvm-svn: 165853
The backend already pattern matches to form VBSL when it can. We may want to
teach it to use the vbsl intrinsics at some point to prevent machine licm from
mucking with this, but using the Expand is completely correct.
http://llvm.org/bugs/show_bug.cgi?id=13831http://llvm.org/bugs/show_bug.cgi?id=13961
Patch by Peter Couperus <peter.couperus@st.com>.
llvm-svn: 165845
For function calls on the 64-bit PowerPC SVR4 target, each parameter
is mapped to as many doublewords in the parameter save area as
necessary to hold the parameter. The first 13 non-varargs
floating-point values are passed in registers; any additional
floating-point parameters are passed in the parameter save area. A
single-precision floating-point parameter (32 bits) must be mapped to
the second (rightmost, low-order) word of its assigned doubleword
slot.
Currently LLVM violates this ABI requirement by mapping such a
parameter to the first (leftmost, high-order) word of its assigned
doubleword slot. This is internally self-consistent but will not
interoperate correctly with libraries compiled with an ABI-compliant
compiler.
This patch corrects the problem by adjusting the parameter addressing
on both sides of the calling convention.
llvm-svn: 165714
Note: [D]M{T,F}CP2 is just a recommended encoding. Vendors often provide a
custom CP2 that interprets instructions differently and may wish to add their
own instructions that use this opcode. We should ensure that this is easy to
do. I will probably add a 'has custom CP{0-3}' subtarget flag to make this
easy: We want to avoid the GCC situation where every MIPS vendor makes a custom
fork that breaks every other MIPS CPU and so can't be merged upstream.
llvm-svn: 165711
Original message:
The attached is the fix to radar://11663049. The optimization can be outlined by following rules:
(select (x != c), e, c) -> select (x != c), e, x),
(select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.
The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.
While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.
The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".
llvm-svn: 165661
the compiler makes use of GPR0. However, there are two flavors of
GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0). The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.
This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.
llvm-svn: 165658
the Altivec extensions were introduced. Its use is optional, and
allows the compiler to communicate to the operating system which
vector registers should be saved and restored during a context switch.
In practice, this information is ignored by the various operating
systems using the SVR4 ABI; the kernel saves and restores the entire
register state. Setting the VRSAVE register is no longer performed by
the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux
systems. It seems best to avoid this logic within LLVM as well.
This patch avoids generating code to update and restore VRSAVE for the
PowerPC SVR4 ABIs (32- and 64-bit). The code remains in place for the
Darwin ABI.
llvm-svn: 165656
- Due to the current matching vector elements constraints in
ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
v2f32) is scalarized. Add a customized v2f32 widening to convert it
into a target-specific X86ISD::VFPROUND to work around this
constraints.
llvm-svn: 165631
- Due to the current matching vector elements constraints in ISD::FP_EXTEND,
rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
to convert it into a target-specific X86ISD::VFPEXT to work around this
constraints. This patch also reverts a previous attempt to fix this issue by
recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
reduces the overhead of supporting non-power-2 vector FP extend.
llvm-svn: 165625
SDNode for LDRB_POST_IMM is invalid: number of registers added to SDNode fewer
that described in .td.
7 ops is needed, but SDNode with only 6 is created.
In more details:
In ARMInstrInfo.td, in multiclass AI2_ldridx, in definition _POST_IMM, offset
operand is defined as am2offset_imm. am2offset_imm is complex parameter type,
and actually it consists from dummy register and imm itself. As I understood
trick with dummy reg was made for AsmParser. In ARMISelLowering.cpp, this dummy
register was not added to SDNode, and it cause crash in Peephole Optimizer pass.
The problem fixed by setting up additional dummy reg when emitting
LDRB_POST_IMM instruction.
llvm-svn: 165617
SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack
objects and byval parameters. So loading byval parameters from stack may be
inserted *before* it will be stored, since these operations are treated as
independent.
Fix:
Currently ARMTargetLowering::LowerFormalArguments saves byval registers with
FixedStack MachinePointerInfo. To fix the problem we need to store byval
registers with MachinePointerInfo referenced to first the "byval" parameter.
Also commit adds two new fields to the InputArg structure: Function's argument
index and InputArg's part offset in bytes relative to the start position of
Function's argument. E.g.: If function's argument is 128 bit width and it was
splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index,
but different offset values.
llvm-svn: 165616
Allows the new machine model to be used for NumMicroOps and OutputLatency.
Allows the HazardRecognizer to be disabled along with itineraries.
llvm-svn: 165603
This patch provides initial implementation of load address
macro instruction for Mips. We have implemented two kinds
of expansions with their variations depending on the size
of immediate operand:
1) load address with immediate value directly:
* la d,j => addiu d,$zero,j (for -32768 <= j <= 65535)
* la d,j => lui d,hi16(j)
ori d,d,lo16(j) (for any other 32 bit value of j)
2) load load address with register offset value
* la d,j(s) => addiu d,s,j (for -32768 <= j <= 65535)
* la d,j(s) => lui d,hi16(j) (for any other 32 bit value of j)
ori d,d,lo16(j)
addu d,d,s
This patch does not cover the case when the address is loaded
from the value of the label or function.
Contributer: Vladimir Medic
llvm-svn: 165561
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
llvm-svn: 165488
Vector compare using altivec 'vcmpxxx' instructions have as third argument
a vector register instead of CR one, different from integer and float-point
compares. This leads to a failure in code generation, where 'SelectSETCC'
expects a DAG with a CR register and gets vector register instead.
This patch changes the behavior by just returning a DAG with the
vector compare instruction based on the type. The patch also adds a testcase
for all vector types llvm defines.
It also included a fix on signed 5-bits predicates printing, where
signed values were not handled correctly as signed (char are unsigned by
default for PowerPC). This generates 'vspltisw' (vector splat)
instruction with SIM out of range.
llvm-svn: 165419
into separate versions for the Darwin and 64-bit SVR4 ABIs. This will
facilitate doing more major surgery on the 64-bit SVR4 ABI in the near future.
llvm-svn: 165336
Make sure functions located in user specified text sections (via the
section attribute) are located together with the default text sections.
Otherwise, for large object files, the relocations for call instructions
are more likely to be out of range. This becomes even more likely in the
presence of LTO.
rdar://12402636
llvm-svn: 165254
a) frame setup instructions define the prologue
b) we shouldn't change our location mid-stream
Add a test to make sure that the stack adjustment stays within
the prologue.
llvm-svn: 165250
- Add 'HwEncoding' for X86 registers and call getEncodingValue() to
retrieve their encoding values.
- This's the first step to adopt new scheme. Furthur revising is onging.
llvm-svn: 165241
"Instruction 'foo' has no tokens" errors during llvm-tblgen
-gen-asm-matcher attempts. At this time, the added
tokens are "#comment" style rather than the actual mnemonic. This will
be revisited once the rest of the base asmparser bits get straightened
out for ppc64-elf-linux.
llvm-svn: 165237
macro instruction (li) in the assembler.
We have identified three possible expansions depending on
the size of immediate operand:
1) for 0 ≤ j ≤ 65535.
li d,j =>
ori d,$zero,j
2) for −32768 ≤ j < 0.
li d,j =>
addiu d,$zero,j
3) for any other value of j that is representable as a 32-bit integer.
li d,j =>
lui d,hi16(j)
ori d,d,lo16(j)
All of the above have been implemented in ths patch.
Contributer: Vladimir Medic
llvm-svn: 165199
.set option
The patch implements following options
at - lets the assembler use the $at register for macros,
but generates warnings if the source program uses $at
noat - let source programs use $at without issuingwarnings.
noreorder - prevents the assembler from reordering machine
language instructions.
nomacro - causes the assembler to print a warning whenever
an assembler operation generates more than one
machine language instruction.
macro - lets the assembler generate multiple machine instructions
from a single assembler instruction
reorder - lets the assembler reorder machine language
instructions to improve performance
The above variants are parsed and their boolean values set or unset.
The code to actually use them will come later.
Following options are not implemented yet:
nomips16
nomicromips
move
nomove
Contributer: Vladimir Medic
llvm-svn: 165194
in the Intel syntax.
The MC layer supports emitting in the Intel syntax, but this would require the
inline assembly MachineInstr to be lowered to an MCInst before emission. This
is potential future work, but for now emitting directly from the MachineInstr
suffices.
llvm-svn: 165173
for the number of bytes in a particular instruction
to using
const MCInstrDesc &Desc = MCII.get(TmpInst.getOpcode());
Desc.getSize()
This is necessary with the advent of 16 bit instructions with
mips16 and micromips. It is also puts Mips in compliance with
the other targets for getting instruction size.
llvm-svn: 165171
Enable the pass by default for targets that request it, and change the
-enable-early-ifcvt to the opposite -disable-early-ifcvt.
There are still some x86 regressions when enabling early if-conversion
because of the missing machine models. Disable the pass for x86 until
machine models are added.
llvm-svn: 165075
X86DAGToDAGISel::PreprocessISelDAG(), isel is moving load inside
callseq_start / callseq_end so it can be folded into a call. This can
create a cycle in the DAG when the call is glued to a copytoreg. We
have been lucky this hasn't caused too many issues because the pre-ra
scheduler has special handling of call sequences. However, it has
caused a crash in a specific tailcall case.
rdar://12393897
llvm-svn: 165072
If the code is generated as assembler, this transformation does not occur assuming that it will occur later in the assembler.
This code was originally called from MipsAsmPrinter.cpp and we needed to check for OutStreamer.hasRawTextSupport(). This was not a good place for it and has been moved to MCTargetDesc/MipsMCCodeEmitter.cpp where both direct object and the assembler use it it automagically.
The test cases have been checked in for a number of weeks now.
llvm-svn: 165067
of operand is specific to MS-style inline assembly and should not be generated
when parsing normal assembly.
The purpose of the wildcard operands are to allow the AsmParser to match
multiple instructions (i.e., MCInsts) to a given ms-style asm statement. For
the time being the matcher just returns the first match. This patch only
implements wildcard matches for memory operands. Support for register
wildcards will be added in the near future.
llvm-svn: 165057
This adds 'elf' as a recognized target triple environment value and overrides the default generated object format on Windows platforms if that value is present. This patch also enables MCJIT tests on Windows using the new environment value.
llvm-svn: 165030
map constraints and MCInst operands to inline asm operands. This replaces the
getMCInstOperandNum() function.
The logic to determine the constraints are not in place, so we still default to
a register constraint (i.e., "r"). Also, we no longer build the MCInst but
rather return just the opcode to get the MCInstrDesc.
llvm-svn: 164979
The target backend can support data-in-code load commands even when
the assembler doesn't, or vice-versa. Allow targets to opt-in for
direct-to-object.
PR13973.
llvm-svn: 164974
2. As part of this, added assembly format FEXT_RI16_SP_explicit_ins and
moved other lines for FEXT_RI16 formats to be in the right place in the code.
3. Added mayLoad and mayStore assignements for the load/store instructions added and for ones already there that did not have this assignment.
4. Another patch will deal with the problem of load/store byte/halfword to the stack. This is a particular Mips16 problem.
llvm-svn: 164811
If the offset is more than 24-bits, it won't fit in a scattered
relocation offset field, so we fall back to using a non-scattered
relocation.
rdar://12358909
llvm-svn: 164724
- Instead of embedding 'lock' into each mnemonic of atomic
instructions except 'xchg', we teach X86 assembly printer to output 'lock'
prefix similar to or consistent with code emitter.
llvm-svn: 164659
Provide interface in TargetLowering to set or get the minimum number of basic
blocks whereby jump tables are generated for switch statements rather than an
if sequence.
getMinimumJumpTableEntries() defaults to 4.
setMinimumJumpTableEntries() allows target configuration.
This patch changes the default for the Hexagon architecture to 5
as it improves performance on some benchmarks.
llvm-svn: 164628
When a BL/BLX references a symbol in the same translation unit that is
out of range, use an external relocation. The linker will use this to
generate a branch island rather than a direct reference, allowing the
relocation to resolve correctly.
rdar://12359919
llvm-svn: 164615
Even out-of-line jump tables can be in the code section, so mark them
as data-regions for those targets which support the directives.
rdar://12362871&12362974
llvm-svn: 164571
This patch fixes load/store instructions to handle less common cases
like "asr #32", "rrx" properly throughout the MC layer.
Patch by Chris Lidbury.
llvm-svn: 164455
The expression based expansion too often results in IR level optimizations
splitting the intermediate values into separate basic blocks, preventing
the formation of the VBSL instruction as the code author intended. In
particular, LICM would often hoist part of the computation out of a loop.
rdar://11011471
llvm-svn: 164340
- Rewrite/merge pseudo-atomic instruction emitters to address the
following issue:
* Reduce one unnecessary load in spin-loop
previously the spin-loop looks like
thisMBB:
newMBB:
ld t1 = [bitinstr.addr]
op t2 = t1, [bitinstr.val]
not t3 = t2 (if Invert)
mov EAX = t1
lcs dest = [bitinstr.addr], t3 [EAX is implicit]
bz newMBB
fallthrough -->nextMBB
the 'ld' at the beginning of newMBB should be lift out of the loop
as lcs (or CMPXCHG on x86) will load the current memory value into
EAX. This loop is refined as:
thisMBB:
EAX = LOAD [MI.addr]
mainMBB:
t1 = OP [MI.val], EAX
LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
JNE mainMBB
sinkMBB:
* Remove immopc as, so far, all pseudo-atomic instructions has
all-register form only, there is no immedidate operand.
* Remove unnecessary attributes/modifiers in pseudo-atomic instruction
td
* Fix issues in PR13458
- Add comprehensive tests on atomic ops on various data types.
NOTE: Some of them are turned off due to missing functionality.
- Revise tests due to the new spin-loop generated.
llvm-svn: 164281
- Merge the processing of LOAD_ADD with other atomic load-arith
operations
- Separate the logic getting target constant for atomic-load-op and add
an optimization for atomic-load-add on i16 with negative value
- Optimize a minor case for atomic-fetch-add i16 with negative operand. Test
case is revised.
llvm-svn: 164243
lib/Target/PowerPC/PPCISelLowering.{h,cpp}
Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4.
Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4.
Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4.
Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4.
Rename LowerCall_SVR4 to LowerCall_32SVR4.
Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4.
test/CodeGen/PowerPC/structsinregs.ll
New test.
llvm-svn: 164228
aligned address. Based on patch by David Peixotto.
Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment
hints. rdar://12090772, rdar://12238782
llvm-svn: 164089
It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast,
and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics.
rdar://11897677
llvm-svn: 163995
use load/store fragments defined in TargetSelectionDAG.td in place of them.
Unaligned loads/stores are either expanded or lowered to target-specific nodes,
so instruction selection should see only aligned load/store nodes.
No changes in functionality.
llvm-svn: 163960
This models the A9 processor at the level of instruction operands, as
opposed to the itinerary, which models each operation at the level of
pipeline stages.
The two primary motivations are:
1) Allow MachineScheduler to model A9 as an out-of-order processor. It
can now distinguish between hazards that force interlocking vs.
buffered resources.
2) Reduce long-term maintenance by allowing the itinerary and target
hooks to eventually be removed. Note that almost all of the complexity
in the new model exists to model instruction variants, which the
itinerary cannot handle. Instead the scheduler previously relied on
processor-specific target hooks which are incomplete and buggy.
llvm-svn: 163921
This patch introduces a possibility for Hexagon MI scheduler
to perform some target specific post- processing on the scheduling
DAG prior to scheduling.
llvm-svn: 163903
* wrap code blocks in \code ... \endcode;
* refer to parameter names in paragraphs correctly (\arg is not what most
people want -- it starts a new paragraph);
* use \param instead of \arg to document parameters in order to be consistent
with the rest of the codebase.
llvm-svn: 163902
- Enhance the fix to PR12312 to support wider integer, such as 256-bit
integer. If more than 1 fully evaluated vectors are found, POR them
first followed by the final PTEST.
llvm-svn: 163832
Add a PatFrag to match X86tcret using 6 fixed registers or less. This
avoids folding loads into TCRETURNmi64 using 7 or more volatile
registers.
<rdar://problem/12282281>
llvm-svn: 163819
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.
This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.
<rdar://problem/12282281>
llvm-svn: 163761
- BlockAddress has no support of BA + offset form and there is no way to
propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
support BA + offset addressing.
llvm-svn: 163743
nonvolatile condition register fields across calls under the SVR4 ABIs.
* With the 64-bit ABI, the save location is at a fixed offset of 8 from
the stack pointer. The frame pointer cannot be used to access this
portion of the stack frame since the distance from the frame pointer may
change with alloca calls.
* With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas. This is an optional slot, so it must only be created
if any of CR2, CR3, and CR4 were modified.
* For both ABIs, save/restore logic is generated only if one of the
nonvolatile CR fields were modified.
I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h. Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.
Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.
Patch by William J. Schmidt!
llvm-svn: 163713
Sub-register lane masks are bitmasks that can be used to determine if
two sub-registers of a virtual register will overlap. For example, ARM's
ssub0 and ssub1 sub-register indices don't overlap each other, but both
overlap dsub0 and qsub0.
The lane masks will be accurate on most targets, but on targets that use
sub-register indexes in an irregular way, the masks may conservatively
report that two sub-register indices overlap when the eventually
allocated physregs don't.
Irregular register banks also mean that the bits in a lane mask can't be
mapped onto register units, but the concept is similar.
llvm-svn: 163630
The Hexagon target decided to use a lot of functionality from the
target-independent scheduler. That's fine, and other targets should be
able to do the same. This reorg and API update makes that easy.
For the record, ScheduleDAGMI was not meant to be subclassed. Instead,
new scheduling algorithms should be able to implement
MachineSchedStrategy and be done. But if need be, it's nice to be
able to extend ScheduleDAGMI, so I also made that easier. The target
scheduler is somewhat more apt to break that way though.
llvm-svn: 163580
The ARM backend can eliminate cmp instructions by reusing flags from a
nearby sub instruction with similar arguments.
Don't do that if the sub is predicated - the flags are not written
unconditionally.
<rdar://problem/12263428>
llvm-svn: 163535
- If a boolean value is generated from CMOV and tested as boolean value,
simplify the use of test result by referencing the original condition.
RDRAND intrinisc is one of such cases.
llvm-svn: 163516
For some reason .lcomm uses byte alignment and .comm log2 alignment so we can't
use the same setting for both. Fix this by reintroducing the LCOMM enum.
I verified this against mingw's gcc.
llvm-svn: 163420
- Darwin lied about not supporting .lcomm and turned it into zerofill in the
asm parser. Push the zerofill-conversion down into macho-specific code.
- This makes the tri-state LCOMMType enum superfluous, there are no targets
without .lcomm.
- Do proper error reporting when trying to use .lcomm with alignment on a target
that doesn't support it.
- .comm and .lcomm alignment was parsed in bytes on COFF, should be power of 2.
- Fixes PR13755 (.lcomm crashes on ELF).
llvm-svn: 163395
gas accepts this and it seems to be common enough to be worth supporting. This
doesn't affect the parsing of reg operands outside of .cfi directives.
llvm-svn: 163390
The assembler can alias one instruction into another based
on the operands. For example the jump instruction "J" takes
and immediate operand, but if the operand is a register the
assembler will change it into a jump register "JR" instruction.
These changes are in the instruction td file.
Test cases included
Contributer: Vladimir Medic
llvm-svn: 163368
Actually these are just stubs for parsing the directives.
Semantic support will come later.
Test cases included
Contributer: Vladimir Medic
llvm-svn: 163364
If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base.
llvm-svn: 163304
assembler such as shifts greater than 32. In the case
of direct object, the code gen needs to do this lowering
since the assembler is not involved.
With the advent of the llvm-mc assembler, it also needs
to do the same lowering.
This patch makes that specific lowering code accessible
to both the direct object output and the assembler.
This patch does not affect generated output.
llvm-svn: 163287
Now that it is possible to dynamically tie MachineInstr operands,
predicated instructions are possible in SSA form:
%vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, %opt:%noreg
%vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, %pred:12, pred:%CPSR
Becomes a predicated SUBri with a tied imp-use:
SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0>
This means that any instruction that is safe to move can be folded into
a MOVCC, and the *CC pseudo-instructions are no longer needed.
The test case changes reflect that Thumb2SizeReduce recognizes the
predicated instructions. It didn't understand the pseudos.
llvm-svn: 163274
Previous patch accidentally decided it couldn't convert a VFP to a
NEON instruction after it had already destroyed the old one. Not a
good move.
llvm-svn: 163230
subreg_hireg of register pair Rp.
* lib/Target/Hexagon/HexagonPeephole.cpp(PeepholeDoubleRegsMap): New
DenseMap similar to PeepholeMap that additionally records subreg info
too.
(runOnMachineFunction): Record information in PeepholeDoubleRegsMap
and copy propagate the high sub-reg of Rp0 in Rp1 = lsr(Rp0, #32) to
the instruction Rx = COPY Rp1:logreg_subreg.
* test/CodeGen/Hexagon/remove_lsr.ll: New test.
llvm-svn: 163214
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
llvm-svn: 163150
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 163136
MatchInstructionImpl() function.
These values are used by the ConvertToMCInst() function to index into the
ConversionTable. The values are also needed to call the GetMCInstOperandNum()
function.
llvm-svn: 163101
For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to CF duting codegen prepare.
llvm-svn: 163093
NEON domain conversion was too heavy-handed with its widened
registers, which could have stripped existing instructions of their
dependency, leaving them vulnerable to scheduling errors.
llvm-svn: 163070
output chain is correctly setup.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://11457792
llvm-svn: 163036
- In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as
well as PSHUFB will zero elements with negative indices.
Patch by Sriram Murali <sriram.murali@intel.com>
llvm-svn: 163018
on the size of the extraction and its position in the 64 bit word.
This patch allows support of the dext transformations with mips64 direct
object output.
0 <= msb < 32 0 <= lsb < 32 0 <= pos < 32 1 <= size <= 32
DINS
The field is entirely contained in the right-most word of the doubleword
32 <= msb < 64 0 <= lsb < 32 0 <= pos < 32 2 <= size <= 64
DINSM
The field straddles the words of the doubleword
32 <= msb < 64 32 <= lsb < 64 32 <= pos < 64 1 <= size <= 32
DINSU
The field is entirely contained in the left-most word of the doubleword
llvm-svn: 163010
The assembly string for the VMOVPQIto64rr instruction incorrectly lacked the 'v'
prefix, resulting in mis-assembly of the vanilla movd instruction.
llvm-svn: 162963
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
enabled.
As the penalty of inter-mixing SSE and AVX instructions, we need
prevent SSE legacy insn from being generated except explicitly
specified through some intrinsics. For patterns supported by both
SSE and AVX, so far, we force AVX insn will be tried first relying on
AddedComplexity or position in td file. It's error-prone and
introduces bugs accidentally.
'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
by AVX, we need this predicate to force VEX encoding or SSE legacy
encoding only.
For insns not inherited by AVX, we still use the previous predicates,
i.e. 'HasSSEx'. So far, these insns fall into the following
categories:
* SSE insns with MMX operands
* SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
CRC, and etc.)
* SSE4A insns.
* MMX insns.
* x87 insns added by SSE.
2 test cases are modified:
- test/CodeGen/X86/fast-isel-x86-64.ll
AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
selected by fast-isel due to complicated pattern and fast-isel
fallback to materialize it from constant pool.
- test/CodeGen/X86/widen_load-1.ll
AVX code generation is different from SSE one after fixing SSE/AVX
inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
'vmovaps'.
llvm-svn: 162919
[Tobias von Koch] What's happening here is that the CR6SET/CR6UNSET is breaking the chain of register copies glued to the function call (BL_SVR4 node). The scheduler then moves other instructions in between those and the function call, which isn't good!
Right. That's the case where there is no chain of register copies before the call, so InFlag == 0... Attached is a new revision of the patch which should fix this for good.
llvm-svn: 162916
- The root cause is that target constant materialization in X86 fast-isel
creates a PC-rel addressing which may overflow 32-bit range in non-Small code
model if .rodata section is allocated too far away from code segment in
MCJIT, which uses Large code model so far.
- Follow the similar logic to fix non-Small code model in fast-isel by skipping
non-Small code model.
llvm-svn: 162881
Ordered memory operations are more constrained than volatile loads and
stores because they must be ordered with respect to all other memory
operations.
llvm-svn: 162861
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.
Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields. GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function. DWARF information is used for everything
else. We need the extra 8 bytes of pad so the size field is found in
the right place.
As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest. IBM's proprietary OSes do
make use of the full traceback table facility.
Patch by Bill Schmidt.
llvm-svn: 162854
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
llvm-svn: 162841
I have tested the fix, but have not been successfull in generating
a robust unit test. This can only be exposed through particular
register assignments.
llvm-svn: 162821