Load/store instructions with a fixup that is relative to a function marked as
Thumb don't use the low bit to distinguish Thumb vs. non-Thumb the way
interworking branches do, so don't set it when resolving those fixups.
rdar://10348687.
llvm-svn: 148366
When set, this bit indicates that a register is completely defined by
the value of its sub-registers.
Use the CoveredBySubRegs property to infer which super-registers are
call-preserved given a list of callee-saved registers. For example, the
ARM registers D8-D15 are callee-saved. This now automatically implies
that Q4-Q7 are call-preserved.
Conversely, Win64 callees save XMM6-XMM15, but the corresponding
YMM6-YMM15 registers are not call-preserved because they are not fully
defined by their sub-registers.
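For illustration, a rough sketch of the inference in plain C++ (RegInfo and
inferCallPreserved are hypothetical names, not the actual TableGen backend
interfaces):
#include <set>
#include <string>
#include <vector>
struct RegInfo {
  std::string Name;
  std::vector<std::string> SubRegs; // immediate sub-registers
  bool CoveredBySubRegs;            // fully defined by its sub-registers?
};
// A super-register is call-preserved if it is covered by its sub-registers
// and every sub-register is callee-saved or already inferred as preserved.
std::set<std::string> inferCallPreserved(const std::vector<RegInfo> &Regs,
                                         std::set<std::string> Preserved) {
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (const RegInfo &R : Regs) {
      if (!R.CoveredBySubRegs || R.SubRegs.empty() || Preserved.count(R.Name))
        continue;
      bool AllPreserved = true;
      for (const std::string &S : R.SubRegs)
        AllPreserved &= Preserved.count(S) != 0;
      if (AllPreserved) {
        Preserved.insert(R.Name); // e.g. Q4 from {D8, D9}
        Changed = true;
      }
    }
  }
  return Preserved;
}
With the ARM callee-saved set D8-D15 this marks Q4-Q7 as preserved, while the
Win64 YMM registers stay unpreserved because CoveredBySubRegs is false for them.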
llvm-svn: 148363
In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can later be handled by the DAGCombiner.
However, in some cases on AVX the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which
the DAGCombiner currently does not handle. In this patch I added a check that we only extract from the XMM (lower 128-bit) part.
llvm-svn: 148298
(This time I believe I've checked all the -Wreturn-type warnings from GCC & added the couple of llvm_unreachables necessary to silence them. If I've missed any, I'll happily fix them as soon as I know about them)
llvm-svn: 148262
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type legalizer sign-extends the i1 type that
LLVM IR uses for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
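A tiny standalone check (plain C++, not the actual SimplifyDemandedBits code;
sext_inreg_i1 is an illustrative helper) of why the plain shift is enough when
only the MSB is demanded:
#include <cassert>
#include <cstdint>
// sign_extend_inreg from i1: broadcast bit 0 to all 32 bits.
static uint32_t sext_inreg_i1(uint32_t x) {
  return (x & 1) ? 0xFFFFFFFFu : 0x0u;
}
int main() {
  for (uint32_t x = 0; x < 16; ++x) {
    uint32_t full = sext_inreg_i1(x); // full sign extension
    uint32_t shl  = x << 31;          // just the shift
    // The blend only looks at the MSB, and that bit is identical.
    assert((full >> 31) == (shl >> 31));
  }
  return 0;
}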
llvm-svn: 148225
live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b pair was optimized to a cbnz instruction even though the CPSR
def is live-in to a successor:
cbnz r6, LBB89_12
...
LBB89_12:
ble LBB89_1
The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be live-outs, so don't mark their last use as a kill if they are.
2) The ARM constant island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.
rdar://10676853
llvm-svn: 148168
The QQ and QQQQ registers are not 'real', they are pseudo-registers used
to model some vld and vst instructions.
This makes the call clobber lists longer, but I intend to get rid of
those soon.
llvm-svn: 148151
The registers are placed into the saved-registers list in reverse order,
which is why the original loop was written to iterate backwards.
llvm-svn: 148064
lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&& "Op 1 of shuffle should not be undef"' failed.
Added a test.
llvm-svn: 148044
Restore the (obviously wrong) behavior from before r147938 without relying on
undefined behavior. Add a fat FIXME note.
This should fix nightly tester failures.
llvm-svn: 148030
In AT&T-style asm syntax the memory operand size is derived from the suffix attached to the mnemonic, while in Intel-style asm syntax it is part of the memory operand itself (e.g. 'movl $1, (%eax)' vs. 'mov dword ptr [eax], 1'), hence a predicate method check is required to select the appropriate instruction.
llvm-svn: 148006
same pattern. We already had this pattern in a few places, but others
tried to make a rough approximation of an actual DAG structure. As not
everywhere went to this trouble, nothing could rely on it being done.
In fact, I've checked all references to these node Ids, and the ones
that use the topo-sort properties are actually satisfied with
a strict weak ordering. The requirement appears to be that Use >= Def.
I've added a big blurb of comments to this bit of the transform to
clarify why the order is so important for the next reader of the code.
I'm starting with this change as it is very small, and trivially
reverted if something breaks or the >= above really does need to be >.
If that proves the case, we can hide the problem by reverting this
patch, but the problem exists elsewhere as well, and so a more
comprehensive solution will be needed.
llvm-svn: 148001
hoped this would revive one of the llvm-gcc selfhost build bots, but it
didn't, so it doesn't appear that my transform is the culprit.
If anyone else is seeing failures, please let me know!
llvm-svn: 147957
strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd like both to know whether my
patch is the culprit and to get the bots back to green.
llvm-svn: 147945
mask+shift pairs at the beginning of the ISD::AND case block, and then
hoist the final pattern into a helper function, simplifying and
reflowing it appropriately. This should have no observable behavior
change, but several simplifications fell out of this such as directly
computing the new mask constant, etc.
llvm-svn: 147939
extracts and scaled addressing modes into its own helper function. No
functionality changed here, just hoisting and layout fixes falling out
of that hoisting.
llvm-svn: 147937
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:
unsigned x = my_accelerator_table[input >> 11];
Here we have some lookup table that we look into using the high bits of
'input'. Each entry in the table is 4 bytes wide, which means this
implicitly gets turned into (once lowered out of a GEP):
*(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the scaled-index (shift
left) form which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.
In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
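For reference, a small standalone check (plain C++, not LLVM code) of the
equivalence the new matching relies on:
#include <cassert>
#include <cstdint>
int main() {
  for (uint32_t input = 0; input < (1u << 20); ++input) {
    uint32_t scaled    = (input >> 11) << 2;  // shr + scale-by-4 addressing form
    uint32_t canonical = (input >> 9) & ~3u;  // canonical shr + mask form
    assert(scaled == canonical);
  }
  return 0;
}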
llvm-svn: 147936
Allow LDRD to be formed from pairs with different LDR encodings. This was the original intention of the pass. Somewhere along the way, the LDR opcodes were refined which broke the optimization. We really don't care what the original opcodes are as long as they both map to the same LDRD and the immediate still fits.
Fixes rdar://10435045 ARMLoadStoreOptimization cannot handle mixed LDRi8/LDRi12
llvm-svn: 147922
This function runs after all constant islands have been placed, and may
shrink some instructions to their 2-byte forms. This can actually cause
some constant pool entries to move out of range because of growing
alignment padding.
Treat instructions that may be shrunk the same as inline asm - they
erode the known alignment bits.
Also reinstate an old assertion in verify(). It is correct now that
basic block offsets include alignments.
Add a single large test case that will hopefully exercise many parts of
the constant island pass.
<rdar://problem/10670199>
llvm-svn: 147885
As the comment around 7746 says, it's better to use the x87 extended precision
here than SSE. And the generic code doesn't know how to do that. It also regains
the speed lost for the uint64_to_float.c testcase.
<rdar://problem/10669858>
llvm-svn: 147869
of several newly un-defaulted switches. This also helps optimizers
(including LLVM's) recognize that every case is covered, and we should
assume as much.
llvm-svn: 147861
On Thumb, the displacement computation hardware uses the address of the
current instruction rounded down to a multiple of 4. Include this
rounding in the UserOffset we compute for each instruction.
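In code, the rounding amounts to something like this (a sketch with an
illustrative helper name, not the full UserOffset computation):
// The Thumb displacement hardware bases PC-relative addresses on the
// current instruction address rounded down to a multiple of 4.
static unsigned roundedInstrAddr(unsigned InstrAddr) {
  return InstrAddr & ~3u;
}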
When inline asm is present, the instruction alignment may not be known.
Constrain the maximum displacement instead in that case.
This makes it possible for CreateNewWater() and OffsetIsInRange() to
agree about the valid displacements. When they disagree, infinite
looping happens.
As always, test cases for this stuff are insane.
<rdar://problem/10660175>
llvm-svn: 147825
this subtraction will at worst result in small negative numbers, which
become very large positive numbers on assignment and are thus caught by
the <= 4 check on the next line. The > 0 check was clearly intended to catch
these as negative numbers.
Spotted by inspection, and impossible to trigger given the shift widths
that can be used.
llvm-svn: 147773
file error checking. Use that to error on an unfinished cfi_startproc.
The error is not nice, but is already better than a segmentation fault.
llvm-svn: 147717
This eliminates a lot of constant pool entries for -O0 builds of code
with many global variable accesses.
This speeds up -O0 codegen of consumer-typeset by 2x because the
constant island pass no longer has to look at thousands of constant pool
entries.
<rdar://problem/10629774>
llvm-svn: 147712
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)
Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
llvm-svn: 147601
This small bit of ASM code is sufficient to do what the old algorithm did:
movq %rax, %xmm0
punpckldq (c0), %xmm0 // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U }
subpd (c1), %xmm0 // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 }
#ifdef __SSE3__
haddpd %xmm0, %xmm0
#else
pshufd $0x4e, %xmm0, %xmm1
addpd %xmm1, %xmm0
#endif
It's arguably faster. One caveat: the 'haddpd' instruction isn't very fast on
all processors.
<rdar://problem/7719814>
llvm-svn: 147593
Now that canRealignStack() understands frozen reserved registers, it is
safe to use it for aligned spill instructions.
It will only return true if the registers reserved at the beginning of
register allocation allow for dynamic stack realignment.
<rdar://problem/10625436>
llvm-svn: 147579
Once register allocation has started the reserved registers are frozen.
Fix the ARM canRealignStack() hook to respect the frozen register state.
Now the hook returns false if register allocation was started with frame
pointer elimination enabled.
It also returns false if register allocation started without a reserved
base pointer, and stack realignment would require a base pointer. This
bug was breaking oggenc on armv6.
No test case; an upcoming patch will use this functionality to realign
the stack for spill slots when possible.
llvm-svn: 147578
versions derive from them.
- JALR64 is not needed since N64 does not emit jal.
- Add template parameter to BranchLink that sets the rt field.
- Fix the set of temporary registers for O32 and N64.
llvm-svn: 147518
(x > y) ? x : y
=>
(x >= y) ? x : y
So for something like
(x - y) > 0 ? (x - y) : 0
It will be
(x - y) >= 0 ? (x - y) : 0
This makes it possible to test the sign bit and eliminate a comparison against
zero, e.g.
subl %esi, %edi
testl %edi, %edi
movl $0, %eax
cmovgl %edi, %eax
=>
xorl %eax, %eax
subl %esi, %edi
cmovsl %eax, %edi
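At the source level, the kind of code that produces this pattern looks roughly
like the following (clampedDiff is a hypothetical example function):
// Clamp a difference at zero; 'd > 0' and 'd >= 0' select the same result,
// so the compare can be turned into a sign-bit test of the subtraction.
int clampedDiff(int x, int y) {
  int d = x - y;
  return d > 0 ? d : 0;
}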
rdar://10633221
llvm-svn: 147512
This patch caused a miscompilation of oggenc because a frame pointer was
suddenly needed halfway through register allocation.
<rdar://problem/10625436>
llvm-svn: 147487
If anybody has strong feelings about 'default: assert(0 && "blah")' vs
'default: llvm_unreachable("blah")', feel free to regularize the instances of
each in this file.
llvm-svn: 147459
Implement encoder methods getJumpTargetOpValue and getBranchTargetOpValue
for jmptarget and brtarget Mips tablegen operand types in the code emitter
for old-style JIT. Rename the pc relative relocation for branches - new
name is Mips::reloc_mips_pc16.
Patch by Sasa Stankovic
llvm-svn: 147382
1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test).
2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this.
llvm-svn: 147359
LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type.
We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]
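A standalone check of the arithmetic being compared here for the i8 case, using
the GCC/Clang __builtin_clz builtin (this is an illustration, not the lowering
code itself):
#include <cassert>
int main() {
  for (unsigned x = 1; x < 256; ++x) {            // i8 values, zero excluded
    unsigned idx      = 31 - __builtin_clz(x);    // what BSR computes
    unsigned direct   = idx ^ 7;                  // the "perfect xor" for i8
    unsigned promoted = (idx ^ 31) - 24;          // i32 xor, then fix up to i8
    assert(direct == promoted && direct == 7 - idx);
  }
  return 0;
}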
llvm-svn: 147251
'bsf' instructions here.
This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.
llvm-svn: 147246
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:
(sizeof(x)*8 - 1) ^ __builtin_clz(x)
Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.
The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.
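Written as a standalone function (msb_index is a hypothetical name), the idiom
being matched is:
// Index of the most significant set bit; exactly what 'bsr' computes.
// Assumes x != 0 (__builtin_clz is undefined for zero).
unsigned msb_index(unsigned x) {
  return (sizeof(x) * 8 - 1) ^ __builtin_clz(x);
}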
Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.
These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improved on in this
type of code.
llvm-svn: 147244
ARM targets with NEON units have access to aligned vector loads and
stores that are potentially faster than unaligned operations.
Add support for spilling the callee-saved NEON registers to an aligned
stack area using 16-byte aligned NEON loads and store.
This feature is off by default, controlled by an -align-neon-spills
command line option.
llvm-svn: 147211
My change r146949 added register clobbers to the eh_sjlj_dispatchsetup pseudo
instruction, but on Thumb1 some of those registers cannot be used. This
caused massive failures on the testsuite when compiling for Thumb1. While
fixing that, I noticed that the eh_sjlj_setjmp instruction has a "nofp"
variant, and I realized that dispatchsetup needs the same thing, so I have
added that as well.
llvm-svn: 147204
The value from the operands isn't right yet, but we weren't encoding it at
all previously. The parser needs to twiddle the values when building the
instruction.
Partial for: rdar://10558523
llvm-svn: 147170
Rather than require the symbol to be explicitly an argument of the directive,
allow it to look ahead and grab the symbol from the next non-whitespace
line.
rdar://10611140
llvm-svn: 147100
instruction supported by mips32r2, and add a pattern which replaces bswap with
a ROTR and WSBH pair.
WSBW is removed since it is not an instruction the current architectures
support.
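A quick standalone check, modeling the instruction semantics in plain C++ with
the GCC/Clang __builtin_bswap32 builtin, that a ROTR by 16 of WSBH really is a
32-bit byte swap:
#include <cassert>
#include <cstdint>
// Model of WSBH: swap the two bytes within each 16-bit halfword.
static uint32_t wsbh(uint32_t x) {
  return ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
}
static uint32_t rotr16(uint32_t x) { return (x >> 16) | (x << 16); }
int main() {
  const uint32_t tests[] = {0x01234567u, 0xDEADBEEFu, 0u, 0xFFFFFFFFu};
  for (uint32_t x : tests)
    assert(rotr16(wsbh(x)) == __builtin_bswap32(x));
  return 0;
}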
llvm-svn: 147015
Use the spill slot alignment as well as the local variable alignment to
determine when the stack needs to be realigned. This works now that the
ARM target can always realign the stack by using a base pointer.
Still respect the ARMBaseRegisterInfo::canRealignStack() function
vetoing a realigned stack. Don't use aligned spill code in that case.
llvm-svn: 146997
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now; there is more to be done to make real-world code using
these constructs get optimized and codegen'ed properly on X86.
The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' that transforms the bsr result, the index of the most significant
one bit, into the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.
llvm-svn: 146974
We used to rely on the *eh_sjlj_setjmp instructions to mark that a function
with setjmp/longjmp exception handling clobbers all the registers. But with
the recent reorganization of ARM EH, those eh_sjlj_setjmp instructions are
expanded away earlier, before PEI can see them to determine what registers to
save and restore. Mark the dispatchsetup instruction in the same way, since
that instruction cannot be expanded early. This also more accurately reflects
when the registers are clobbered.
llvm-svn: 146949
"mov r1, r2, lsl #0" should assemble as "mov r1, r2" even though it's
not strictly legal UAL syntax. It's a common extension and the friendly
thing to do.
rdar://10604663
llvm-svn: 146937
This change reduces the number of instructions generated.
For example,
(load (add (sub $n0, $n1), (MipsLo got(s))))
results in the following sequence of instructions:
1. sub $n2, $n0, $n1
2. lw got(s)($n2)
Previously, three instructions were needed.
1. sub $n2, $n0, $n1
2. addiu $n3, $n2, got(s)
3. lw 0($n3)
llvm-svn: 146888
Use information computed while inferring new register classes to emit
accurate, table-driven implementations of getMatchingSuperRegClass().
Delete the old manual, error-prone implementations in the targets.
llvm-svn: 146873
This adjustment is already included in the block offsets computed by
BasicBlockInfo, and adjusting again here can cause the pass to loop.
When CreateNewWater splits a basic block, OffsetIsInRange would reject
the new CPE on the next pass because of the too conservative alignment
adjustment. This caused the block to be split again, and so on.
llvm-svn: 146751
The command line option should be removed, but not until the feature has
gotten a lot of testing. The ARMConstantIslandPass tends to have subtle
bugs that only show up after a while.
llvm-svn: 146739
the compact unwind claiming that one register was saved before another, which
isn't all that great in general. Process them in the natural order. Reverse the
list only when necessary for the algorithm.
llvm-svn: 146612
An aligned constant pool entry may require extra alignment padding where
the new water is created. Take that into account when computing the offset.
Also consider the alignment of other constant pool entries when
splitting a basic block. Alignment padding may make it necessary to
move the split point higher.
llvm-svn: 146609
r0 = mov #0
r0 = moveq #1
Then the second instruction has an implicit data dependency on the first
instruction. Sadly I have yet to come up with a small test case that
demonstrates the post-ra scheduler taking advantage of this.
llvm-svn: 146583
Work in progress. Parsing for non-writeback, single spaced register lists
works now. The rest have the representations better factored, but still
need more to be able to parse properly.
llvm-svn: 146579
When 'cmp rn, #imm' doesn't match due to the immediate not being representable,
but 'cmn rn, #-imm' does match, use the latter in place of the former, as
it's equivalent (for example, 'cmp r0, #-1' becomes 'cmn r0, #1').
rdar://10552389
llvm-svn: 146567
to finalize MI bundles (i.e. add the BUNDLE instruction and compute the register
def and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of the MachineBasicBlock and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
prevent IT blocks from being broken apart.
llvm-svn: 146542
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
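As a sketch of what the conservatively correct expansion means, in plain C++
with hypothetical helpers rather than ISD nodes:
#include <cstdint>
// Fully defined count-leading-zeros: must also handle a zero input.
static uint32_t ctlz(uint32_t x) {
  return x ? (uint32_t)__builtin_clz(x) : 32u;
}
// Zero-undef variant: the caller promises x != 0, so any expansion that is
// correct for nonzero inputs is acceptable, including just using ctlz().
static uint32_t ctlz_zero_undef(uint32_t x) {
  return ctlz(x);
}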
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
llvm-svn: 146466
Constant pool entries with different alignment may cause more alignment
padding to be inserted. Compute the amount of padding needed, and try to
pick the location that requires the least amount of padding.
Also take the extra padding into account when the water is above the
use.
llvm-svn: 146458
subdirectories to traverse into.
- Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories cannot automatically trigger a rerun of the
llvm-build tool. This is particularly a pain when switching back and forth
between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This also implicitly eliminates the FIXME.
llvm-svn: 146436
These modifiers simply select either the low or high D subregister of a Neon
Q register. I've also removed the unimplemented 'p' modifier, which turns out
to be a bit different than the comment here suggests and as far as I can tell
was only intended for internal use in Apple's version of gcc.
llvm-svn: 146417