Allows returning a PPC::Predicate from a function with a no-predicate
value possible. Preparatory patch for fast-isel on PPC64 ELF. No
behavioral change intended.
llvm-svn: 183841
I've been comparing the object file output of LLVM's integrated
assembler against the external assembler on PowerPC, and one
area where differences still remain are in DWARF sections.
In particular, the GNU assembler generates .debug_frame and
.debug_line sections using a code alignment factor of 4, since
all PowerPC instructions have size 4 and must be aligned to a
multiple of 4. However, current MC code hard-codes a code
alignment factor of 1.
This patch changes this by adding a "minimum instruction alignment"
data element to MCAsmInfo and using this as code alignment factor.
This requires passing a MCContext into MCDwarfLineAddr::Encode
and MCDwarfLineAddr::EncodeAdvanceLoc. Note that one caller,
MCDwarfLineAddr::Write, didn't actually have that information
available. However, it turns out that this routine is in fact
never used in the whole code base, so the patch simply removes
it. If it turns out to be needed again at a later time, it
could be re-added with an updated interface.
llvm-svn: 183834
The pass emits a call to sqrt that has attribute "read-none". This call will be
converted to an ISD::FSQRT node during DAG construction, which will turn into
a mips native sqrt instruction.
llvm-svn: 183802
Sign- and zero-extension folding was slightly incorrect because it wasn't checking that the shift on extensions was zero. Further, I recently added AND rd, rn, #255 as a form of 8-bit zero extension, and failed to add the folding code for it.
This patch fixes both issues.
This patch fixes both, and the test should remain the same:
test/CodeGen/ARM/fast-isel-fold.ll
llvm-svn: 183794
Negative zero is returned by the primary expression parser as INT32_MIN, so all that the method needs to do is to accept this value.
Behavior already present for Thumb2.
llvm-svn: 183734
- Don't use assert(0), or tests may pass or fail according to assertions.
- For now, The tests are marked as XFAIL for win32 hosts.
FIXME: Could we avoid XFAIL to specify triple in the RUN lines?
llvm-svn: 183728
Some ARM CPUs only support ARM mode (ancient v4 ones, for example) and some
only support Thumb mode (M-class ones currently). This makes sure such CPUs
default to the correct mode and makes the AsmParser diagnose an attempt to
switch modes incorrectly.
rdar://14024354
llvm-svn: 183710
Previously LEA64_32r went through virtually the entire backend thinking it was
using 32-bit registers until its blissful illusions were cruelly snatched away
by MCInstLower and 64-bit equivalents were substituted at the last minute.
This patch makes it behave normally, and take 64-bit registers as sources all
the way through. Previous uses (for 32-bit arithmetic) are accommodated via
SUBREG_TO_REG instructions which make the types and classes agree properly.
llvm-svn: 183693
A plain "sc" without argument is supposed to be treated like "sc 0"
by the assembler. This patch adds a corresponding alias.
Problem reported by Joerg Sonnenberger.
llvm-svn: 183687
The extended branch mnemonics are supposed to use an implied CR0
if there is no explicit condition register specified. This patch
adds extra variants of the mnemonics to this effect.
Problem reported by Joerg Sonnenberger.
llvm-svn: 183686
the Mips16 port. A few of the psuedos could either take signed
or unsigned arguments and I did not distinguish the case and improperly
rejected some valid cases that the assembler had previously accepted
when they were pure pseudos that expanded as assembly instructions.
llvm-svn: 183633
Changes to ARM unwind opcode assembler:
* Fix multiple .save or .vsave directives. Besides, the
order is preserved now.
* For the directives which will generate multiple opcodes,
such as ".save {r0-r11}", the order of the unwind opcode
is fixed now, i.e. the registers with less encoding value
are popped first.
* Fix the $sp offset calculation. Now, we can use the
.setfp, .pad, .save, and .vsave directives at any order.
Changes to test cases:
* Add test cases to check the order of multiple opcodes
for the .save directive.
* Fix the incorrect $sp offset in the test case. The
stack pointer offset specified in the test case was
incorrect. (Changed test cases: ehabi-mc-section.ll and
ehabi-mc.ll)
* The opcode to restore $sp are slightly reordered. The
behavior are not changed, and the new output is same
as the output of GNU as. (Changed test cases:
eh-directive-pad.s and eh-directive-setfp.s)
llvm-svn: 183627
The register classes when emitting loads weren't quite restricting enough, leading to MI verification failure on the result register.
These are new failures that weren't there the first time I tried enabling ARM FastISel for new targets.
llvm-svn: 183624
Handle the case when the disassembler table can't tell
the difference between some encodings of QADD and CPS.
Add some necessary safe guards in CPS decoding as well.
llvm-svn: 183610
This is using a hint from AMD APP OpenCL Programming Guide with
empirically tweaked parameters.
I used Unigine Heaven 3.0 to determine best parameters on my system
(i7 2600/Radeon 6950/Kernel 3.9.4) the benchmark :
it went from 38.8 average fps to 39.6, which is ~3% gain.
(Lightmark 2008.2 gain is much more marginal: from 537 to 539)
There is no lit test provided as the parameter were determined
empirically and it it would be nearly impossiblet to find a test
program that check for optimal behavior.
llvm-svn: 183593
On PPC32, [su]div,rem on i64 types are transformed into runtime library
function calls. As a result, they are not allowed in counter-based loops (the
counter-loops verification pass caught this error; this change fixes PR16169).
llvm-svn: 183581
We weren't computing structure size correctly and we were relying on
the original alloca instruction to compute the offset, which isn't
always reliable.
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183568
My recent ARM FastISel patch exposed this bug:
http://llvm.org/bugs/show_bug.cgi?id=16178
The root cause is that it can't select integer sext/zext pre-ARMv6 and
asserts out.
The current integer sext/zext code doesn't handle other cases gracefully
either, so this patch makes it handle all sext and zext from i1/i8/i16
to i8/i16/i32, with and without ARMv6, both in Thumb and ARM mode. This
should fix the bug as well as make FastISel faster because it bails to
SelectionDAG less often. See fastisel-ext.patch for this.
fastisel-ext-tests.patch changes current tests to always use reg-imm AND
for 8-bit zext instead of UXTB. This simplifies code since it is
supported on ARMv4t and later, and at least on A15 both should perform
exactly the same (both have exec 1 uop 1, type I).
2013-05-31-char-shift-crash.ll is a bitcode version of the above bug
16178 repro.
fast-isel-ext.ll tests all sext/zext combinations that ARM FastISel
should now handle.
Note that my ARM FastISel enabling patch was reverted due to a separate
failure when dealing with MCJIT, I'll fix this second failure and then
turn FastISel on again for non-iOS ARM targets.
I've tested "make check-all" on my x86 box, and "lnt test-suite" on A15
hardware.
llvm-svn: 183551
Add earlyclobber constaints to prevent input register being allocated as
the output register because, according to Intel spec [1], "If any pair
of the index, mask, or destination registers are the same, this
instruction results a UD fault."
---
[1] http://software.intel.com/sites/default/files/319433-014.pdf
llvm-svn: 183327
Add some generic SchedWrites and assign resources for Swift and Cortex A9.
Reapply of r183257. (Removed empty InstRW for division on swift)
llvm-svn: 183319
An instruction with less than 3 inputs is trivially a fast immediate shift.
Reapply of 183256, should not have caused the tablegen segfault on linux either.
llvm-svn: 183314
In ELF (as in MachO), not all relocations point to symbols. Represent this
properly by using a symbol_iterator instead of a SymbolRef. Update llvm-readobj
ELF's dumper to handle relocatios without symbols.
llvm-svn: 183284
The ARM backend did not expect LDRBi12 to hold a constant pool operand.
Allow for LLVM to deal with the instruction similar to how it deals with
LDRi12.
This fixes PR16215.
llvm-svn: 183238
The CopyToReg nodes will sometimes try to copy a value from a VGPR to an
SGPR. This kind of copy is not possible, so we need to detect
VGPR->SGPR copies and do something else. The current strategy is to
replace these copies with VGPR->VGPR copies and hope that all the users
of CopyToReg can accept VGPRs as arguments.
llvm-svn: 183132
The MOV64ri64i32 instruction required hacky MCInst lowering because it
was allocated as setting a GR64, but the eventual instruction ("movl")
only set a GR32. This converts it into a so-called "MOV32ri64" which
still accepts a (appropriate) 64-bit immediate but defines a GR32.
This is then converted to the full GR64 by a SUBREG_TO_REG operation,
thus keeping everyone happy.
This fixes a typo in the opcode field of the original patch, which
should make the legact JIT work again (& adds test for that problem).
llvm-svn: 183068
NOTE: If this broke your out-of-tree backend, in *RegisterInfo.td, change
the instances of SubRegIndex that have a comps template arg to use the
ComposedSubRegIndex class instead.
In TableGen land, this adds Size and Offset attributes to SubRegIndex,
and the ComposedSubRegIndex class, for which the Size and Offset are
computed by TableGen. This also adds an accessor in MCRegisterInfo, and
Size/Offsets for the X86 and ARM subreg indices.
llvm-svn: 183020
These instructions are deprecated oddities, but we still need to be able to
disassemble (and reassemble) them if and when they're encountered.
Patch by Amaury de la Vieuville.
llvm-svn: 183011
The disassembly of VEXT instructions was too lax in the bits checked. This
fixes the case where the instruction affects Q-registers but a misaligned lane
was specified (should be UNDEFINED).
Patch by Amaury de la Vieuville
llvm-svn: 183003
Unlike most -- hopefully "all other", but I'm still checking -- memory
instructions we support, LOAD REVERSED and STORE REVERSED may access
the memory location several times. This means that they are not suitable
for volatile loads and stores.
This patch is a prerequisite for better atomic load and store support.
The same principle applies there: almost all memory instructions we
support are inherently atomic ("block concurrent"), but LOAD REVERSED
and STORE REVERSED are exceptions.
Other instructions continue to allow volatile operands. I will add
positive "allows volatile" tests at the same time as the "allows atomic
load or store" tests.
llvm-svn: 183002
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.
The test for this commit is not breaking the existing tests.
llvm-svn: 182998
The MOV64ri64i32 instruction required hacky MCInst lowering because it was
allocated as setting a GR64, but the eventual instruction ("movl") only set a
GR32. This converts it into a so-called "MOV32ri64" which still accepts a
(appropriate) 64-bit immediate but defines a GR32. This is then converted to
the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.
llvm-svn: 182991
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
smaller and partial register updates can sometimes be avoided.
Until recently, this sequence was a barrier to rematerialization though. That
should now be fixed so it's an appropriate time to make the change.
llvm-svn: 182928
32-bit writes on amd64 zero out the high bits of the corresponding 64-bit
register. LLVM makes use of this for zero-extension, but until now relied on
custom MCLowering and other code to fixup instructions. Now we have proper
handling of sub-registers, this can be done by creating SUBREG_TO_REG
instructions at selection-time.
Should be no change in functionality.
llvm-svn: 182921
The code to distinguish between unaligned and aligned addresses was
already there, so this is mostly just a switch-on-and-test process.
llvm-svn: 182920
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.
Patch by Xiaoyi Guo!
This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.
llvm-svn: 182885
This corrects a problem where x86 instructions that implicitly define/use both
an A-register (RAX, EAX, ..) and EFLAGS were declared as only defining/using
EFLAGS, because the outer "let Defs/Uses = [EFLAGS]" in the various multiclasses
overrides the "let Defs/Uses = [areg]" in BinOpAI.
The instructions deriving from BinOpAI were moved out of the "let Defs", and a
BinOpAI_FF class was created, for instructions that implicitly define and use
EFLAGS and the A-register (SBC, ADC).
llvm-svn: 182883
FastISel was only enabled for iOS ARM and Thumb2, this patch enables it
for ARM (not Thumb2) on Linux and NaCl.
Thumb2 support needs a bit more work, mainly around register class
restrictions.
The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.
The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch, it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.
The test changes are straightforward, similar to:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.
I ran all of test-suite on A15 hardware with --optimize-option=-O0 and
all the tests pass.
llvm-svn: 182877
Tidy up three places where the register class for ARM and Thumb wasn't
restrictive enough:
- No PC dest for reg-reg add/orr/sub.
- No PC dest for shifts.
- No PC or SP for Thumb2 reg-imm add.
I encountered this while combining FastISel with
-verify-machineinstrs. These instructions defined registers whose
classes weren't restrictive enough, and the uses failed
verification. They're also undefined in the ISA, or would produce code
that FastISel wouldn't want. This doesn't fix the register class
narrowing issue (where uses should restrict definitions), and isn't
thorough, but it's a small step in the right direction.
llvm-svn: 182863
This patch solves the problem of numeric register values not being accepted:
../set_alias.s:1:11: error: expected valid expression after comma
.set r4,$4
^
The parsing of .set directive is changed and handling of symbols in code
as well to enable this feature.
The test example is added.
Patch by Vladimir Medic
llvm-svn: 182807
This patch adds support for the CRJ and CGRJ instructions. Support for
the immediate forms will be a separate patch.
The architecture has a large number of comparison instructions. I think
it's generally better to concentrate on using the "best" comparison
instruction first and foremost, then only use something like CRJ if
CR really was the natual choice of comparison instruction. The patch
therefore opportunistically converts separate CR and BRC instructions
into a single CRJ while emitting instructions in ISelLowering.
llvm-svn: 182764
When -ffast-math is in effect (on Linux, at least), clang defines
__FINITE_MATH_ONLY__ > 0 when including <math.h>. This causes the
preprocessor to include <bits/math-finite.h>, which renames the sqrt functions.
For instance, "sqrt" is renamed as "__sqrt_finite".
This patch adds the 3 new names in such a way that they will be treated
as equivalent to their respective original names.
llvm-svn: 182739
isConsecutiveLS is a slightly more general form of
SelectionDAG::isConsecutiveLoad. Aside from also handling stores, it also does
not assume equality of the chain operands is necessary. In the case of the PPC
backend, this chain condition is checked in a more general way by the
surrounding code.
Mostly, this part of the refactoring in preparation for supporting optimized
unaligned stores.
llvm-svn: 182723
When expanding unaligned Altivec loads, we use the decremented offset trick to
prevent page faults. Unfortunately, if we have a sequence of consecutive
unaligned loads, this leads to suboptimal code generation because the 'extra'
load from the first unaligned load can be combined with the base load from the
second (but only if the decremented offset trick is not used for the first).
Search up and down the chain, through loads and token factors, looking for
consecutive loads, and if one is found, don't use the offset reduction trick.
These duplicate loads are later combined to yield the desired sequence (in the
future, we might want a more-powerful chain search, but that will require some
changes to allow the combiner routines to access the AA object).
This should complete the initial implementation of the optimized unaligned
Altivec load expansion. There is some refactoring that should be done, but
that will happen when the unaligned store expansion is added.
llvm-svn: 182719
The lvsl permutation control instruction is a function only of the alignment of
the pointer operand (relative to the 16-byte natural alignment of Altivec
vectors). As a result, multiple lvsl intrinsics where the operands differ by a
multiple of 16 can be combined.
llvm-svn: 182708
Altivec only directly supports aligned loads, but the loads have a strange
property: If given an unaligned address, they truncate the address to the next
lower aligned address, and load from there. This property, along with an extra
load and some special-purpose permutation-control instructions that generate
the appropriate permutations from the original unaligned address, allow
efficient lowering of aligned loads. This code uses the trick explained in the
Apple Velocity Engine optimization overview document to prevent the needed
extra load from possibly causing a page fault if the original address happens
to be aligned.
As noted in the FIXMEs, there are several additional optimizations that can be
performed to reduce the cost of these loads even more. These will be
implemented in future commits.
llvm-svn: 182691
Previously, an invalid instruction like:
foo %r1, %r0
would generate the rather odd error message:
....: error: unknown token in expression
foo %r1, %r0
^
We now get the more informative:
....: error: invalid instruction
foo %r1, %r0
^
The same would happen if an address were used where a register was expected.
We now get "invalid operand for instruction" instead.
llvm-svn: 182644
The idea is to make sure that:
(1) "register expected" is restricted to cases where ParseRegister()
is called and the token obviously isn't a register.
(2) "invalid register" is restricted to cases where a register-like "%..."
sequence is found, but the "..." makes no sense.
(3) the generic "invalid operand for instruction" is used in cases where
the wrong register type is used (GPR instead of FPR, etc.).
(4) the new "invalid register pair" is used if the register has the right type,
but is not a valid register pair.
Testing of (1)-(3) is now restricted to regs-bad.s. It uses a representative
instruction for each register class to make sure that only registers from
that class are accepted.
(4) is tested by both regs-bad.s (which checks all invalid register pairs)
and insn-bad.s (which tests one invalid pair for each instruction that
requires a pair).
While there, I changed "Number" to "Num" for consistency with the
operand class.
llvm-svn: 182643
There was exactly one caller using this API right, the others were relying on
specific behavior of the default implementation. Since it's too hard to use it
right just remove it and standardize on the default behavior.
Defines away PR16132.
llvm-svn: 182636
This patch builds on some existing code to do CFG reconstruction from
a disassembled binary:
- MCModule represents the binary, and has a list of MCAtoms.
- MCAtom represents either disassembled instructions (MCTextAtom), or
contiguous data (MCDataAtom), and covers a specific range of addresses.
- MCBasicBlock and MCFunction form the reconstructed CFG. An MCBB is
backed by an MCTextAtom, and has the usual successors/predecessors.
- MCObjectDisassembler creates a module from an ObjectFile using a
disassembler. It first builds an atom for each section. It can also
construct the CFG, and this splits the text atoms into basic blocks.
MCModule and MCAtom were only sketched out; MCFunction and MCBB were
implemented under the experimental "-cfg" llvm-objdump -macho option.
This cleans them up for further use; llvm-objdump -d -cfg now generates
graphviz files for each function found in the binary.
In the future, MCObjectDisassembler may be the right place to do
"intelligent" disassembly: for example, handling constant islands is just
a matter of splitting the atom, using information that may be available
in the ObjectFile. Also, better initial atom formation than just using
sections is possible using symbols (and things like Mach-O's
function_starts load command).
This brings two minor regressions in llvm-objdump -macho -cfg:
- The printing of a relocation's referenced symbol.
- An annotation on loop BBs, i.e., which are their own successor.
Relocation printing is replaced by the MCSymbolizer; the basic CFG
annotation will be superseded by more related functionality.
llvm-svn: 182628
This is a basic first step towards symbolization of disassembled
instructions. This used to be done using externally provided (C API)
callbacks. This patch introduces:
- the MCSymbolizer class, that mimics the same functions that were used
in the X86 and ARM disassemblers to symbolize immediate operands and
to annotate loads based off PC (for things like c string literals).
- the MCExternalSymbolizer class, which implements the old C API.
- the MCRelocationInfo class, which provides a way for targets to
translate relocations (either object::RelocationRef, or disassembler
C API VariantKinds) to MCExprs.
- the MCObjectSymbolizer class, which does symbolization using what it
finds in an object::ObjectFile. This makes simple symbolization (with
no fancy relocation stuff) work for all object formats!
- x86-64 Mach-O and ELF MCRelocationInfos.
- A basic ARM Mach-O MCRelocationInfo, that provides just enough to
support the C API VariantKinds.
Most of what works in otool (the only user of the old symbolization API
that I know of) for x86-64 symbolic disassembly (-tvV) works, namely:
- symbol references: call _foo; jmp 15 <_foo+50>
- relocations: call _foo-_bar; call _foo-4
- __cf?string: leaq 193(%rip), %rax ## literal pool for "hello"
Stub support is the main missing part (because libObject doesn't know,
among other things, about mach-o indirect symbols).
As for the MCSymbolizer API, instead of relying on the disassemblers
to call the tryAdding* methods, maybe this could be done automagically
using InstrInfo? For instance, even though PC-relative LEAs are used
to get the address of string literals in a typical Mach-O file, a MOV
would be used in an ELF file. And right now, the explicit symbolization
only recognizes PC-relative LEAs. InstrInfo should have already have
most of what is needed to know what to symbolize, so this can
definitely be improved.
I'd also like to remove object::RelocationRef::getValueString (it seems
only used by relocation printing in objdump), as simply printing the
created MCExpr is definitely enough (and cleaner than string concats).
llvm-svn: 182625
Now that there is no longer any distinction between symbolLo
and symbolHi operands in either printing, encoding, or parsing,
the operand types can be removed in favor of simply using
s16imm.
This completes the patch series to decouple lo/hi operand part
processing from the particular instruction whose operand it is.
No change in code generation expected from this patch.
llvm-svn: 182618
When targeting the Darwin assembler, we need to generate markers ha16() and
lo16() to designate the high and low parts of a (symbolic) immediate. This
is necessary not just for plain symbols, but also for certain symbolic
expression, typically along the lines of ha16(A - B). The latter doesn't
work when simply using VariantKind flags on the symbol reference.
This is why the current back-end uses hacks (explicitly called out as such
via multiple FIXMEs) in the symbolLo/symbolHi print methods.
This patch uses target-defined MCExpr codes to represent the Darwin
ha16/lo16 constructs, following along the lines of the equivalent solution
used by the ARM back end to handle their :upper16: / :lower16: markers.
This allows us to get rid of special handling both in the symbolLo/symbolHi
print method and in the common code MCExpr::print routine. Instead, the
ha16 / lo16 markers are printed simply in a custom print routine for the
target MCExpr types. (As a result, the symbolLo/symbolHi print methods
can now replaced by a single printS16ImmOperand routine that also handles
symbolic operands.)
The patch also provides a EvaluateAsRelocatableImpl routine to handle
ha16/lo16 constructs. This is not actually used at the moment by any
in-tree code, but is provided as it makes merging into David Fang's
out-of-tree Mach-O object writer simpler.
Since there is no longer any need to treat VK_PPC_GAS_HA16 and
VK_PPC_DARWIN_HA16 differently, they are merged into a single
VK_PPC_ADDR16_HA (and likewise for the _LO16 types).
llvm-svn: 182616
This implements the @llvm.readcyclecounter intrinsic as the specific
MRC instruction specified in the ARM manuals for CPUs with the Power
Management extensions.
Older CPUs had slightly different methods which may also have to be
implemented eventually, but this should cover all v7 cases.
rdar://problem/13939186
llvm-svn: 182603
Performance monitors, including a basic cycle counter, are an official
extension in the ARMv7 specification. This adds support for enabling and
disabling them, orthogonally from CPU selection.
rdar://problem/13939186
llvm-svn: 182602
The error was:
error: non-constant-expression cannot be narrowed from type 'long long' to 'long' in initializer list [-Wc++11-narrowing]
MI.getOperand(6).getImm() & 0x1F,
llvm-svn: 182584
Using PatLeaf rather than ImmLeaf when defining immediate predicates
prevents simple patterns using those predicates from being recognized
for fast instruction selection. This patch replaces the immSExt16
PatLeaf predicate with two ImmLeaf predicates, imm32SExt16 and
imm64SExt16, allowing a few more patterns to be recognized (ADDI,
ADDIC, MULLI, ADDI8, and ADDIC8). Using the new predicates does not
help for LI, LI8, SUBFIC, and SUBFIC8 because these are rejected for
other reasons, but I see no reason to retain the PatLeaf predicate.
No functional change intended, and thus no test cases yet. This is
preliminary work for enabling fast-isel support for PowerPC. When
that support is ready, we'll be able to test this function.
llvm-svn: 182510
Addresses a review comment from Ulrich Weigand. No functional change intended.
I'm not sure whether the old TODO that this patch touches still holds,
but that's something we'd get to when adding a targetted scheduling
description.
llvm-svn: 182474
The original version of the pass could underestimate the length of a backward
branch in cases like:
alignment to N bytes or more
...
relaxable branch A
...
foo: (aligned to M<N bytes)
...
bar: (aligned to N bytes)
...
relaxable branch B to foo
We don't add any misalignment gap for "bar" because N bytes of alignment
had already been reached earlier in the function. In this case, assuming
that A is relaxed can push "foo" closer to "bar", and make B appear to be
in range. Similar problems can occur for forward branches.
I don't think it's possible to create blocks with mixed alignments as
things stand, not least because we haven't yet defined getPrefLoopAlignment()
for SystemZ (that would need benchmarking). So I don't think we can test
this yet.
Thanks to Rafael Espíndola for spotting the bug.
llvm-svn: 182460
Although I had added some support for the BDZ/BDNZ branches into the selector
(in r158204), I had not correctly adjusted the condition at the top of the
loop. As a result, these branches were still essentially unsupported.
This fixes PR16086. Unfortunately, any test case would be very large (because
it would need to force the loop backedge to exceed the range of the 16-bit
immediate).
llvm-svn: 182385
pic calls. These need to be there so we don't try and use helper
functions when we call those.
As part of this, make sure that we properly exclude helper functions in pic
mode when indirect calls are involved.
llvm-svn: 182343
By default, a teq instruction is inserted after integer divide. No divide-by-zero
checks are performed if option "-mnocheck-zero-division" is used.
llvm-svn: 182306
Now that the preheader insertion logic in LoopSimplify is externally exposed,
use it, and remove the copy-and-pasted version.
No functionality change intended.
llvm-svn: 182300
As the pairing of this instruction form with the bdnz/bdz branches is now
enforced by the verification pass, make it clear from the name that these
are used only for counter-based loops.
No functionality change intended.
llvm-svn: 182296
When asserts are enabled, this adds a verification pass for PPC counter-loop
formation. Unfortunately, without sacrificing code quality, there is no better
way of forming counter-based loops except at the (late) IR level. This means
that we need to recognize, at the IR level, anything which might turn into a
function call (or indirect branch). Because this is currently a finite set of
things, and because SelectionDAG lowering is basic-block local, this can be
done. Nevertheless, it is fragile, and failure results in a miscompile. This
verification pass checks that all (reachable) counter-based branches are
dominated by a loop mtctr instruction, and that no instructions in between
clobber the counter register. If these conditions are not satisfied, then an
ICE will be triggered.
In short, this is to help us sleep better at night.
llvm-svn: 182295
R600TextureIntrinsicsReplacer.cpp:232: warning: the address of ‘ArgsType’ will always evaluate as ‘true’
This doesn't have any effect on the output as a vararg intrinsic behaves the
same way as a non-vararg one.
llvm-svn: 182293