Enable the pass by default for targets that request it, and change the
-enable-early-ifcvt to the opposite -disable-early-ifcvt.
There are still some x86 regressions when enabling early if-conversion
because of the missing machine models. Disable the pass for x86 until
machine models are added.
llvm-svn: 165075
X86DAGToDAGISel::PreprocessISelDAG(), isel is moving load inside
callseq_start / callseq_end so it can be folded into a call. This can
create a cycle in the DAG when the call is glued to a copytoreg. We
have been lucky this hasn't caused too many issues because the pre-ra
scheduler has special handling of call sequences. However, it has
caused a crash in a specific tailcall case.
rdar://12393897
llvm-svn: 165072
- Update maximal stack alignment when stack arguments are prepared before a
call.
- Test cases are enhanced to show it's not a Win32 specific issue but a generic
one.
llvm-svn: 164946
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work. We only handle these cases:
1. Consecutive stores where the values and consecutive loads. For example:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
2. Consecutive stores where the values are constants. Foe example:
q->a = 4;
q->b = 5;
llvm-svn: 164910
buildbots. Original commit message:
A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work. We only handle these cases:
1. Consecutive stores where the values and consecutive loads. For example:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
2. Consecutive stores where the values are constants. Foe example:
q->a = 4;
q->b = 5;
llvm-svn: 164890
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work. We only handle these cases:
1. Consecutive stores where the values and consecutive loads. For example:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
2. Consecutive stores where the values are constants. Foe example:
q->a = 4;
q->b = 5;
llvm-svn: 164885
Even out-of-line jump tables can be in the code section, so mark them
as data-regions for those targets which support the directives.
rdar://12362871&12362974
llvm-svn: 164571
- Rewrite/merge pseudo-atomic instruction emitters to address the
following issue:
* Reduce one unnecessary load in spin-loop
previously the spin-loop looks like
thisMBB:
newMBB:
ld t1 = [bitinstr.addr]
op t2 = t1, [bitinstr.val]
not t3 = t2 (if Invert)
mov EAX = t1
lcs dest = [bitinstr.addr], t3 [EAX is implicit]
bz newMBB
fallthrough -->nextMBB
the 'ld' at the beginning of newMBB should be lift out of the loop
as lcs (or CMPXCHG on x86) will load the current memory value into
EAX. This loop is refined as:
thisMBB:
EAX = LOAD [MI.addr]
mainMBB:
t1 = OP [MI.val], EAX
LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
JNE mainMBB
sinkMBB:
* Remove immopc as, so far, all pseudo-atomic instructions has
all-register form only, there is no immedidate operand.
* Remove unnecessary attributes/modifiers in pseudo-atomic instruction
td
* Fix issues in PR13458
- Add comprehensive tests on atomic ops on various data types.
NOTE: Some of them are turned off due to missing functionality.
- Revise tests due to the new spin-loop generated.
llvm-svn: 164281
- Merge the processing of LOAD_ADD with other atomic load-arith
operations
- Separate the logic getting target constant for atomic-load-op and add
an optimization for atomic-load-add on i16 with negative value
- Optimize a minor case for atomic-fetch-add i16 with negative operand. Test
case is revised.
llvm-svn: 164243
It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast,
and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics.
rdar://11897677
llvm-svn: 163995
- Enhance the fix to PR12312 to support wider integer, such as 256-bit
integer. If more than 1 fully evaluated vectors are found, POR them
first followed by the final PTEST.
llvm-svn: 163832
- Find a legal vector type before casting and extracting element from it.
- As the new vector type may have more than 2 elements, build the final
hi/lo pair by BFS pairing them from bottom to top.
llvm-svn: 163830
Add a PatFrag to match X86tcret using 6 fixed registers or less. This
avoids folding loads into TCRETURNmi64 using 7 or more volatile
registers.
<rdar://problem/12282281>
llvm-svn: 163819
by xoring the high-bit. This fails if the source operand is a vector because we need to negate
each of the elements in the vector.
Fix rdar://12281066 PR13813.
llvm-svn: 163802
are within the lifetime zone. Sometime legitimate usages of allocas are
hoisted outside of the lifetime zone. For example, GEPS may calculate the
address of a member of an allocated struct. This commit makes sure that
we only check (abort regions or assert) for instructions that read and write
memory using stack frames directly. Notice that by allowing legitimate
usages outside the lifetime zone we also stop checking for instructions
which use derivatives of allocas. We will catch less bugs in user code
and in the compiler itself.
llvm-svn: 163791
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.
This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.
<rdar://problem/12282281>
llvm-svn: 163761
- BlockAddress has no support of BA + offset form and there is no way to
propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
support BA + offset addressing.
llvm-svn: 163743
The input program may contain intructions which are not inside lifetime
markers. This can happen due to a bug in the compiler or due to a bug in
user code (for example, returning a reference to a local variable).
This commit adds checks that all of the instructions in the function and
invalidates lifetime ranges which do not contain all of the instructions.
llvm-svn: 163678
- If a boolean value is generated from CMOV and tested as boolean value,
simplify the use of test result by referencing the original condition.
RDRAND intrinisc is one of such cases.
llvm-svn: 163516
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.
When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:
%CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
%vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>
We can assign %vreg11 to %ECX, overlapping the live range of %CL.
llvm-svn: 163336
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
llvm-svn: 163150
This reverts commit 5dd9e214fb92847e947f9edab170f9b4e52b908f.
Thanks to Duncan for explaining how this should have been done.
Conflicts:
test/CodeGen/X86/vec_select.ll
llvm-svn: 163064
output chain is correctly setup.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://11457792
llvm-svn: 163036
- In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as
well as PSHUFB will zero elements with negative indices.
Patch by Sriram Murali <sriram.murali@intel.com>
llvm-svn: 163018
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
enabled.
As the penalty of inter-mixing SSE and AVX instructions, we need
prevent SSE legacy insn from being generated except explicitly
specified through some intrinsics. For patterns supported by both
SSE and AVX, so far, we force AVX insn will be tried first relying on
AddedComplexity or position in td file. It's error-prone and
introduces bugs accidentally.
'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
by AVX, we need this predicate to force VEX encoding or SSE legacy
encoding only.
For insns not inherited by AVX, we still use the previous predicates,
i.e. 'HasSSEx'. So far, these insns fall into the following
categories:
* SSE insns with MMX operands
* SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
CRC, and etc.)
* SSE4A insns.
* MMX insns.
* x87 insns added by SSE.
2 test cases are modified:
- test/CodeGen/X86/fast-isel-x86-64.ll
AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
selected by fast-isel due to complicated pattern and fast-isel
fallback to materialize it from constant pool.
- test/CodeGen/X86/widen_load-1.ll
AVX code generation is different from SSE one after fixing SSE/AVX
inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
'vmovaps'.
llvm-svn: 162919
- The root cause is that target constant materialization in X86 fast-isel
creates a PC-rel addressing which may overflow 32-bit range in non-Small code
model if .rodata section is allocated too far away from code segment in
MCJIT, which uses Large code model so far.
- Follow the similar logic to fix non-Small code model in fast-isel by skipping
non-Small code model.
llvm-svn: 162881
- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
X86ISD::OR node has only its flag result being used as a boolean value and
all its leaves are extracted from the same vector, it could be folded into an
X86ISD::PTEST node.
llvm-svn: 162735
this allows for better code generation.
Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.
For example:
movaps %xmm0, %xmm1
movsd LC(%rip), %xmm0
minsd %xmm1, %xmm0
becomes:
minsd LC(%rip), %xmm0
llvm-svn: 162187
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.
Before:
xorl %esi, %edi
testb %dil, %dil
setne %al
ret
After:
xorb %dil, %sil
setne %al
ret
rdar://12081007
llvm-svn: 162160
I really need to find a way to automate this, but I can't come up with a regex
that has no false positives while handling tricky cases like custom check
prefixes.
llvm-svn: 162097
around. That's not how we do things. Besides, the commit message tells us that
it is covered by the GCC test suite.
------------------------------------------------------------------------
r127497 | zwarich | 2011-03-11 13:51:56 -0800 (Fri, 11 Mar 2011) | 3 lines
Fix the GCC test suite issue exposed by r127477, which was caused by stack
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actual fail before this fix.
------------------------------------------------------------------------
llvm-svn: 161985
- FP_EXTEND only support extending from vectors with matching elements.
This results in the scalarization of extending to v2f64 from v2f32,
which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.
llvm-svn: 161894
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.
rdar://11973998
llvm-svn: 161852
It is still possible to if-convert if the tail block has extra
predecessors, but the tail phis must be rewritten instead of being
removed.
llvm-svn: 161781
- FCMOV only supports a subset of X86 conditions. Skip boolean
simplification if X86 condition is not valid for FCMOV.
- add a minimal test case for PR13577.
llvm-svn: 161732
FeatureFastUAMem for Nehalem, Westmere and Sandy Bridge.
FeatureFastUAMem is already on if we pass in nehalem or westmere as a command
argument.
rdar: 7252306
llvm-svn: 161717
- if a boolean test (X86ISD::CMP or X86ISD:SUB) checks a boolean value
generated from X86ISD::SETCC, try to simplify the boolean value
generation and checking by reusing the original EFLAGS with proper
condition code
- add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places
consuming EFLAGS
part of patches fixing PR12312
llvm-svn: 161687
When replacing Old with New, it can happen that New is already a
successor. Add the old and new edge weights instead of creating a
duplicate edge.
llvm-svn: 161653
This makes it possible to speed up def_iterator by stopping at the first
use. This makes def_empty() and getUniqueVRegDef() much faster when
there are many uses.
In a +Asserts build, LiveVariables is 100x faster in one case because
getVRegDef() has an assertion that would scan to the end of a
def_iterator chain.
Spill weight calculation is significantly faster (300x in one case)
because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP)
which calls def_empty(%RIP).
llvm-svn: 161634
Use a more conventional doubly linked list where the Prev pointers form
a cycle. This means it is no longer necessary to adjust the Prev
pointers when reallocating the VRegInfo array.
The test changes are required because the register allocation hint is
using the use-list order to break ties.
llvm-svn: 161633
We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.
Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled
by peephole now.
rdar://11873276
llvm-svn: 161462
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.
As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.
Fixes PR13265.
llvm-svn: 161409
If the result of a common subexpression is used at all uses of the candidate
expression, CSE should not increase the live range of the common subexpression.
rdar://11393714 and rdar://11819721
llvm-svn: 161396
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.
llvm-svn: 161282
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin. I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.
llvm-svn: 161262
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
llvm-svn: 161232
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.
rdar://10554090 and rdar://11873276
llvm-svn: 161207
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
This patch is a rework of r160919 and was tested on clang self-host on my local
machine.
rdar://10554090 and rdar://11873276
llvm-svn: 161152
One motivating example is to sink an instruction from a basic block which has
two successors: one outside the loop, the other inside the loop. We should try
to sink the instruction outside the loop.
rdar://11980766
llvm-svn: 161062
We branch to the successor with higher edge weight first.
Convert from
je LBB4_8 --> to outer loop
jmp LBB4_14 --> to inner loop
to
jne LBB4_14
jmp LBB4_8
PR12750
rdar: 11393714
llvm-svn: 161018
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
rdar://10554090 and rdar://11873276
llvm-svn: 160919
It is possible that an instruction can use and update EFLAGS.
When checking the safety, we should check the usage of EFLAGS first before
declaring it is safe to optimize due to the update.
llvm-svn: 160912
These idempotent sub-register indices don't do anything --- They simply
map XMM registers to themselves. They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.
This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.
The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.
llvm-svn: 160816
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.
This also fixed rdar://11830760 where xor is expected instead of movi0.
llvm-svn: 160749
struct s {
double x1;
float x2;
};
__attribute__((regparm(3))) struct s f(int a, int b, int c);
void g(void) {
f(41, 42, 43);
}
We need to be able to represent passing the address of s to f (sret) in a
register (inreg). Turns out that all that is needed is to not mark them as
mutually incompatible.
llvm-svn: 160695