We update until we hit a fixpoint. This is probably slow but also
slightly simplifies the code. It should also fix the occasional
invalid domtrees observed when building with expensive checking.
I couldn't find a case where this had a measurable slowdown, but
if someone finds a pathological case where it does we may have
to find a cleverer way of updating dominators here.
Thanks to Duncan for the test case.
llvm-svn: 163091
Most of the code guarded with ANDROIDEABI are not
ARM-specific, and having no relation with arm-eabi.
Thus, it will be more natural to call this
environment "Android" instead of "ANDROIDEABI".
Note: We are not using ANDROID because several projects
are using "-DANDROID" as the conditional compilation
flag.
llvm-svn: 163087
NEON domain conversion was too heavy-handed with its widened
registers, which could have stripped existing instructions of their
dependency, leaving them vulnerable to scheduling errors.
llvm-svn: 163070
This reverts commit 5dd9e214fb92847e947f9edab170f9b4e52b908f.
Thanks to Duncan for explaining how this should have been done.
Conflicts:
test/CodeGen/X86/vec_select.ll
llvm-svn: 163064
output chain is correctly setup.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://11457792
llvm-svn: 163036
Manage tied operands entirely internally to MachineInstr. This makes it
possible to change the representation of tied operands, as I will do
shortly.
The constraint that tied uses and defs must be in the same order was too
restrictive.
llvm-svn: 163021
- In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as
well as PSHUFB will zero elements with negative indices.
Patch by Sriram Murali <sriram.murali@intel.com>
llvm-svn: 163018
on the size of the extraction and its position in the 64 bit word.
This patch allows support of the dext transformations with mips64 direct
object output.
0 <= msb < 32 0 <= lsb < 32 0 <= pos < 32 1 <= size <= 32
DINS
The field is entirely contained in the right-most word of the doubleword
32 <= msb < 64 0 <= lsb < 32 0 <= pos < 32 2 <= size <= 64
DINSM
The field straddles the words of the doubleword
32 <= msb < 64 32 <= lsb < 64 32 <= pos < 64 1 <= size <= 32
DINSU
The field is entirely contained in the left-most word of the doubleword
llvm-svn: 163010
- Overloading operator<< for raw_ostream and pointers is dangerous, it alters
the behavior of code that includes the header.
- Remove unused ID.
- Use LLVM's byte swapping helpers instead of a hand-coded.
- Make ReadProfilingData work directly on a pointer.
No functionality change.
llvm-svn: 162992
The assembly string for the VMOVPQIto64rr instruction incorrectly lacked the 'v'
prefix, resulting in mis-assembly of the vanilla movd instruction.
llvm-svn: 162963
because it does not support CMOV of vectors. To implement this efficientlyi, we broadcast the condition bit and use a sequence of NAND-OR
to select between the two operands. This is the same sequence we use for targets that don't have vector BLENDs (like SSE2).
rdar://12201387
llvm-svn: 162926
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
enabled.
As the penalty of inter-mixing SSE and AVX instructions, we need
prevent SSE legacy insn from being generated except explicitly
specified through some intrinsics. For patterns supported by both
SSE and AVX, so far, we force AVX insn will be tried first relying on
AddedComplexity or position in td file. It's error-prone and
introduces bugs accidentally.
'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
by AVX, we need this predicate to force VEX encoding or SSE legacy
encoding only.
For insns not inherited by AVX, we still use the previous predicates,
i.e. 'HasSSEx'. So far, these insns fall into the following
categories:
* SSE insns with MMX operands
* SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
CRC, and etc.)
* SSE4A insns.
* MMX insns.
* x87 insns added by SSE.
2 test cases are modified:
- test/CodeGen/X86/fast-isel-x86-64.ll
AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
selected by fast-isel due to complicated pattern and fast-isel
fallback to materialize it from constant pool.
- test/CodeGen/X86/widen_load-1.ll
AVX code generation is different from SSE one after fixing SSE/AVX
inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
'vmovaps'.
llvm-svn: 162919
[Tobias von Koch] What's happening here is that the CR6SET/CR6UNSET is breaking the chain of register copies glued to the function call (BL_SVR4 node). The scheduler then moves other instructions in between those and the function call, which isn't good!
Right. That's the case where there is no chain of register copies before the call, so InFlag == 0... Attached is a new revision of the patch which should fix this for good.
llvm-svn: 162916
The old PHI updating code in loop-rotate was replaced with SSAUpdater a while
ago, it has no problems with comples PHIs. What had to be fixed is detecting
whether a loop was already rotated and updating dominators when multiple exits
were present.
This change increases overall code size a bit, mostly due to additional loop
unrolling opportunities. Passes test-suite and selfhost with -verify-dom-info.
Fixes PR7447.
Thanks to Andy for the input on the domtree updating code.
llvm-svn: 162912
When a MachineInstr is constructed, its implicit operands are added
first, then the explicit operands are inserted before the implicits.
MCInstrDesc has oprand flags like early clobber and operand ties that
apply to the explicit operands.
Don't look at those flags when the implicit operands are first added in
the explicit operands's positions.
llvm-svn: 162910
- The root cause is that target constant materialization in X86 fast-isel
creates a PC-rel addressing which may overflow 32-bit range in non-Small code
model if .rodata section is allocated too far away from code segment in
MCJIT, which uses Large code model so far.
- Follow the similar logic to fix non-Small code model in fast-isel by skipping
non-Small code model.
llvm-svn: 162881
When there are multiple tied use-def pairs on an inline asm instruction,
the tied uses must appear in the same order as the defs.
It is possible to write an LLVM IR inline asm instruction that breaks
this constraint, but there is no reason for a front end to emit the
operands out of order.
The gnu inline asm syntax specifies tied operands as a single read/write
constraint "+r", so ouf of order operands are not possible.
llvm-svn: 162878
Tombstones and full hash collisions are rare, mark the "empty"
and "no collision" paths as likely. The bug in simplifycfg
that prevented the hints from being picked during selfhost
up was fixed recently :)
llvm-svn: 162874
For normal instructions, isTied() is set automatically by addOperand(),
based on MCInstrDesc, but inline asm has tied operands outside the
descriptor.
llvm-svn: 162869
Ordered memory operations are more constrained than volatile loads and
stores because they must be ordered with respect to all other memory
operations.
llvm-svn: 162861
It is technically allowed to move a normal load across a volatile load,
but probably not a good idea.
It is not allowed to move a load across an atomic load with
Ordering > Monotonic, and we model those with MOVolatile as well.
I recently removed the mayStore flag from atomic load instructions, so
they don't need a pseudo-opcode. This patch makes up for the difference.
llvm-svn: 162857
This lets the user run the program from a different directory and still have the
.gcda files show up in the correct place.
<rdar://problem/12179524>
llvm-svn: 162855
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.
Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields. GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function. DWARF information is used for everything
else. We need the extra 8 bytes of pad so the size field is found in
the right place.
As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest. IBM's proprietary OSes do
make use of the full traceback table facility.
Patch by Bill Schmidt.
llvm-svn: 162854
The operands on an INLINEASM machine instruction are divided into groups
headed by immediate flag operands. Verify this structure.
Extract verifyTiedOperands(), and only call it for non-inlineasm
instructions.
llvm-svn: 162849
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
llvm-svn: 162841
I have tested the fix, but have not been successfull in generating
a robust unit test. This can only be exposed through particular
register assignments.
llvm-svn: 162821
WHen running with -verify-machineinstrs, check that tied operands come
in matching use/def pairs, and that they are consistent with MCInstrDesc
when it applies.
llvm-svn: 162816
The isTied bit is set automatically when a tied use is added and
MCInstrDesc indicates a tied operand. The tie is broken when one of the
tied operands is removed.
llvm-svn: 162814
This patch implements ProfileDataLoader which loads profile data generated by
-insert-edge-profiling and updates branch weight metadata accordingly.
Patch by Alastair Murray.
llvm-svn: 162799
on the size of the extraction and its position in the 64 bit word.
This patch allows support of the dext transformations with mips64 direct
object output.
0 <= msb < 32 0 <= lsb < 32 0 <= pos < 32 1 <= size <= 32
DINS
The field is entirely contained in the right-most word of the doubleword
32 <= msb < 64 0 <= lsb < 32 0 <= pos < 32 2 <= size <= 64
DINSM
The field straddles the words of the doubleword
32 <= msb < 64 32 <= lsb < 64 32 <= pos < 64 1 <= size <= 32
DINSU
The field is entirely contained in the left-most word of the doubleword
llvm-svn: 162782
transformed to the final instruction variant. An
example would be dsrll which is transformed into
dsll32 if the shift value is greater than 32.
For direct object output we need to do this transformation
in the codegen. If the instruction was inside branch
delay slot, it was being missed. This patch corrects this
oversight.
llvm-svn: 162779
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement so just add
there a FIXME.
Patch by Bill Schmidt. PR13641.
llvm-svn: 162778
Fix a couple of bugs in mips' long branch pass.
This patch was supposed to be committed along with r162731, so I don't have a
new test case.
llvm-svn: 162777
While in SSA form, a MachineInstr can have pairs of tied defs and uses.
The tied operands are used to represent read-modify-write operands that
must be assigned the same physical register.
Previously, tied operand pairs were computed from fixed MCInstrDesc
fields, or by using black magic on inline assembly instructions.
The isTied flag makes it possible to add tied operands to any
instruction while getting rid of (some of) the inlineasm magic.
Tied operands on normal instructions are needed to represent predicated
individual instructions in SSA form. An extra <tied,imp-use> operand is
required to represent the output value when the instruction predicate is
false.
Adding a predicate to:
%vreg0<def> = ADD %vreg1, %vreg2
Will look like:
%vreg0<tied,def> = ADD %vreg1, %vreg2, pred:3, %vreg7<tied,imp-use>
The virtual register %vreg7 is the value given to %vreg0 when the
predicate is false. It will be assigned the same physreg as %vreg0.
This commit adds the isTied flag and sets it based on MCInstrDesc when
building an instruction. The flag is not used for anything yet.
llvm-svn: 162774
Register operands are manipulated by a lot of target-independent code,
and it is not always possible to preserve target flags. That means it is
not safe to use target flags on register operands.
None of the targets in the tree are using register operand target flags.
External targets should be using immediate operands to annotate
instructions with operand modifiers.
llvm-svn: 162770
No test case, undefined shifts get folded early, but can occur when other
transforms generate a constant. Thanks to Duncan for bringing this up.
llvm-svn: 162755
- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
X86ISD::OR node has only its flag result being used as a boolean value and
all its leaves are extracted from the same vector, it could be folded into an
X86ISD::PTEST node.
llvm-svn: 162735
These extra flags are not required to properly order the atomic
load/store instructions. SelectionDAGBuilder chains atomics as if they
were volatile, and SelectionDAG::getAtomic() sets the isVolatile bit on
the memory operands of all atomic operations.
The volatile bit is enough to order atomic loads and stores during and
after SelectionDAG.
This means we set mayLoad on atomic_load, mayStore on atomic_store, and
mayLoad+mayStore on the remaining atomic read-modify-write operations.
llvm-svn: 162733
This wasn't the right way to enforce ordering of atomics.
We are already setting the isVolatile bit on memory operands of atomic
operations which is good enough to enforce the correct ordering.
llvm-svn: 162732
Instructions emitted to compute branch offsets now use immediate operands
instead of symbolic labels. This change was needed because there were problems
when R_MIPS_HI16/LO16 relocations were used to make shared objects.
llvm-svn: 162731
Slight reorganisation of PPC instruction classes for scheduling. No
functionality change for existing subtargets.
- Clearly separate load/store-with-update instructions from regular loads and stores.
- Split IntRotateD -> IntRotateD and IntRotateDI
- Split out fsub and fadd from FPGeneral -> FPAddSub
- Update existing itineraries
Patch by Tobias von Koch.
llvm-svn: 162729
In SelectionDAGLegalize::ExpandLegalINT_TO_FP, expand INT_TO_FP nodes without
using any f64 operations if f64 is not a legal type.
Patch by Stefan Kristiansson.
llvm-svn: 162728
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.
Patch by Tobias von Koch.
llvm-svn: 162727
Adds the vendor 'fsl' (used by Freescale SDK) to Triple. This will allow
clang support for Freescale cross-compile configurations.
Patch by Tobias von Koch.
llvm-svn: 162726
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.
Patch by Tobias von Koch.
llvm-svn: 162725
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.
Patch by Tobias von Koch.
llvm-svn: 162724
It is not safe to use normal LDR instructions because they may be
reordered by the scheduler. The ATOMIC_LDR pseudos have a mayStore flag
that prevents reordering.
Atomic loads are also prevented from participating in rematerialization
and load folding.
llvm-svn: 162713
This section (introduced in DWARF-3) is used to define instruction address
ranges for functions that are not contiguous and can't be described
by low_pc/high_pc attributes (this is the usual case for inlined subroutines).
The patch is the first step to support fetching complete inlining info from DWARF.
Reviewed by Benjamin Kramer.
llvm-svn: 162657
It's not clear that they should be marked as such, but tbb formation
fails if t2LEApcrelJT is hoisted of of a loop.
This doesn't change the flags on these instructions,
UnmodeledSideEffects was already inferred from the missing pattern.
llvm-svn: 162603
The ARM BL and BLX instructions don't have predicate operands, but the
thumb variants tBL and tBLX do.
The argument registers should be added as implicit uses.
llvm-svn: 162593
the case of multiple edges from one block to another.
A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.
Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.
llvm-svn: 162572
No intended behavior change. This was introduced in r162023. With the fixed
algorithm a Release build of ARMInstPrinter.cpp goes from 16s to 10s on a
2011 MBP.
llvm-svn: 162559