The expression based expansion too often results in IR level optimizations
splitting the intermediate values into separate basic blocks, preventing
the formation of the VBSL instruction as the code author intended. In
particular, LICM would often hoist part of the computation out of a loop.
rdar://11011471
llvm-svn: 164340
A PHI can't create interference on its own. If two live ranges interfere
at a PHI, they must also interfere when leaving one of the PHI
predecessors.
llvm-svn: 164330
- Rewrite/merge pseudo-atomic instruction emitters to address the
following issue:
* Reduce one unnecessary load in spin-loop
previously the spin-loop looks like
thisMBB:
newMBB:
ld t1 = [bitinstr.addr]
op t2 = t1, [bitinstr.val]
not t3 = t2 (if Invert)
mov EAX = t1
lcs dest = [bitinstr.addr], t3 [EAX is implicit]
bz newMBB
fallthrough -->nextMBB
the 'ld' at the beginning of newMBB should be lift out of the loop
as lcs (or CMPXCHG on x86) will load the current memory value into
EAX. This loop is refined as:
thisMBB:
EAX = LOAD [MI.addr]
mainMBB:
t1 = OP [MI.val], EAX
LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
JNE mainMBB
sinkMBB:
* Remove immopc as, so far, all pseudo-atomic instructions has
all-register form only, there is no immedidate operand.
* Remove unnecessary attributes/modifiers in pseudo-atomic instruction
td
* Fix issues in PR13458
- Add comprehensive tests on atomic ops on various data types.
NOTE: Some of them are turned off due to missing functionality.
- Revise tests due to the new spin-loop generated.
llvm-svn: 164281
A common coalescing conflict in vector code is lane insertion:
%dst = FOO
%src = BAR
%dst:ssub0 = COPY %src
The live range of %src interferes with the ssub0 lane of %dst, but that
lane is never read after %src would have clobbered it. That makes it
safe to merge the live ranges and eliminate the COPY:
%dst = FOO
%dst:ssub0 = BAR
This patch teaches the new coalescer to resolve conflicts where dead
vector lanes would be clobbered, at least as long as the clobbered
vector lanes don't escape the basic block.
llvm-svn: 164250
- Merge the processing of LOAD_ADD with other atomic load-arith
operations
- Separate the logic getting target constant for atomic-load-op and add
an optimization for atomic-load-add on i16 with negative value
- Optimize a minor case for atomic-fetch-add i16 with negative operand. Test
case is revised.
llvm-svn: 164243
lib/Target/PowerPC/PPCISelLowering.{h,cpp}
Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4.
Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4.
Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4.
Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4.
Rename LowerCall_SVR4 to LowerCall_32SVR4.
Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4.
test/CodeGen/PowerPC/structsinregs.ll
New test.
llvm-svn: 164228
aligned address. Based on patch by David Peixotto.
Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment
hints. rdar://12090772, rdar://12238782
llvm-svn: 164089
Add LIS::pruneValue() and extendToIndices(). These two functions are
used by the register coalescer when merging two live ranges requires
more than a trivial value mapping as supported by LiveInterval::join().
The pruneValue() function can remove the part of a value number that is
going to conflict in join(). Afterwards, extendToIndices can restore the
live range, using any new dominating value numbers and updating the SSA
form.
Use this complex value mapping to support merging a register into a
vector lane that has a conflicting value, but the clobbered lane is
undef.
llvm-svn: 164074
It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast,
and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics.
rdar://11897677
llvm-svn: 163995
- Enhance the fix to PR12312 to support wider integer, such as 256-bit
integer. If more than 1 fully evaluated vectors are found, POR them
first followed by the final PTEST.
llvm-svn: 163832
- Find a legal vector type before casting and extracting element from it.
- As the new vector type may have more than 2 elements, build the final
hi/lo pair by BFS pairing them from bottom to top.
llvm-svn: 163830
Add a PatFrag to match X86tcret using 6 fixed registers or less. This
avoids folding loads into TCRETURNmi64 using 7 or more volatile
registers.
<rdar://problem/12282281>
llvm-svn: 163819
by xoring the high-bit. This fails if the source operand is a vector because we need to negate
each of the elements in the vector.
Fix rdar://12281066 PR13813.
llvm-svn: 163802
are within the lifetime zone. Sometime legitimate usages of allocas are
hoisted outside of the lifetime zone. For example, GEPS may calculate the
address of a member of an allocated struct. This commit makes sure that
we only check (abort regions or assert) for instructions that read and write
memory using stack frames directly. Notice that by allowing legitimate
usages outside the lifetime zone we also stop checking for instructions
which use derivatives of allocas. We will catch less bugs in user code
and in the compiler itself.
llvm-svn: 163791
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.
This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.
<rdar://problem/12282281>
llvm-svn: 163761
- BlockAddress has no support of BA + offset form and there is no way to
propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
support BA + offset addressing.
llvm-svn: 163743
nonvolatile condition register fields across calls under the SVR4 ABIs.
* With the 64-bit ABI, the save location is at a fixed offset of 8 from
the stack pointer. The frame pointer cannot be used to access this
portion of the stack frame since the distance from the frame pointer may
change with alloca calls.
* With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas. This is an optional slot, so it must only be created
if any of CR2, CR3, and CR4 were modified.
* For both ABIs, save/restore logic is generated only if one of the
nonvolatile CR fields were modified.
I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h. Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.
Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.
Patch by William J. Schmidt!
llvm-svn: 163713
SelectionDAG::getConstantFP(double Val, EVT VT, bool isTarget);
should not be used when Val is not a simple constant (as the comment in
SelectionDAG.h indicates). This patch avoids using this function
when folding an unknown constant through a bitcast, where it cannot be
guaranteed that Val will be a simple constant.
llvm-svn: 163703
The input program may contain intructions which are not inside lifetime
markers. This can happen due to a bug in the compiler or due to a bug in
user code (for example, returning a reference to a local variable).
This commit adds checks that all of the instructions in the function and
invalidates lifetime ranges which do not contain all of the instructions.
llvm-svn: 163678
The ARM backend can eliminate cmp instructions by reusing flags from a
nearby sub instruction with similar arguments.
Don't do that if the sub is predicated - the flags are not written
unconditionally.
<rdar://problem/12263428>
llvm-svn: 163535
- If a boolean value is generated from CMOV and tested as boolean value,
simplify the use of test result by referencing the original condition.
RDRAND intrinisc is one of such cases.
llvm-svn: 163516
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.
When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:
%CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
%vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>
We can assign %vreg11 to %ECX, overlapping the live range of %CL.
llvm-svn: 163336
If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base.
llvm-svn: 163304
Now that it is possible to dynamically tie MachineInstr operands,
predicated instructions are possible in SSA form:
%vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, %opt:%noreg
%vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, %pred:12, pred:%CPSR
Becomes a predicated SUBri with a tied imp-use:
SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0>
This means that any instruction that is safe to move can be folded into
a MOVCC, and the *CC pseudo-instructions are no longer needed.
The test case changes reflect that Thumb2SizeReduce recognizes the
predicated instructions. It didn't understand the pseudos.
llvm-svn: 163274
Previous patch accidentally decided it couldn't convert a VFP to a
NEON instruction after it had already destroyed the old one. Not a
good move.
llvm-svn: 163230
subreg_hireg of register pair Rp.
* lib/Target/Hexagon/HexagonPeephole.cpp(PeepholeDoubleRegsMap): New
DenseMap similar to PeepholeMap that additionally records subreg info
too.
(runOnMachineFunction): Record information in PeepholeDoubleRegsMap
and copy propagate the high sub-reg of Rp0 in Rp1 = lsr(Rp0, #32) to
the instruction Rx = COPY Rp1:logreg_subreg.
* test/CodeGen/Hexagon/remove_lsr.ll: New test.
llvm-svn: 163214
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
llvm-svn: 163150
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 163136
For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to CF duting codegen prepare.
llvm-svn: 163093
This reverts commit 5dd9e214fb92847e947f9edab170f9b4e52b908f.
Thanks to Duncan for explaining how this should have been done.
Conflicts:
test/CodeGen/X86/vec_select.ll
llvm-svn: 163064
output chain is correctly setup.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://11457792
llvm-svn: 163036
- In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as
well as PSHUFB will zero elements with negative indices.
Patch by Sriram Murali <sriram.murali@intel.com>
llvm-svn: 163018