Most constant BUILD_VECTOR's are matched using ComplexPatterns which cover
bitcasted as well as normal vectors. However, it doesn't seem to be possible to
match ldi.[bhwd] in a type-agnostic manner (e.g. to support the widest range of
immediates, it should be possible to use ldi.b to load v2i64) using TableGen so
ldi.[bhwd] is matched using custom code in MipsSEISelDAGToDAG.cpp
This made the majority of the constant splat BUILD_VECTOR lowering redundant.
The only transformation remaining for constant splats is when an (up-to) 32-bit
constant splat is possible but the value does not fit into a 10-bit signed
integer. In this case, the BUILD_VECTOR is transformed into a bitcasted
BUILD_VECTOR so that fill.[bhw] can be used to splat the vector from a GPR32
register (which is initialized using the usual lui/addui sequence).
There are no additional tests since this is a re-implementation of previous
functionality. The change is intended to make it easier to implement some of
the upcoming instruction selection patches since they can rely on existing
support for BUILD_VECTOR's in the DAGCombiner.
compare_float.ll changed slightly because a BITCAST is no longer
introduced during legalization.
llvm-svn: 191299
Patch by Ana Pazos.
1.Added support for v1ix and v1fx types.
2.Added Scalar Pairwise Reduce instructions.
3.Added initial implementation of Scalar Arithmetic instructions.
llvm-svn: 191263
Sometimes a copy from a vreg -> vreg sneaks into the middle of a terminator
sequence. It is safe to slice this into the stack protector success bb.
This fixes PR16979.
llvm-svn: 191260
The recursive nature of the address selection code can cause the stack to
explode if there is a long chain of GEPs. Convert the recursive bit into a
iterative method to avoid this.
<rdar://problem/12445434>
llvm-svn: 191252
Changes to MIPS SelectionDAG:
* Added nodes VEXTRACT_[SZ]EXT_ELT to represent extract and extend in a single
operation and implemented the DAG combines necessary to fold sign/zero
extends into the extract.
llvm-svn: 191199
Note: There's a later patch on my branch that re-implements this to select
build_vector without the custom SelectionDAG nodes. The future patch avoids
the constant-folding problems stemming from the custom node (i.e. it doesn't
need to re-implement all the DAG combines related to BUILD_VECTOR).
Changes to MIPS specific SelectionDAG nodes:
* Added VSPLAT
This is a special case of BUILD_VECTOR that covers the case the
BUILD_VECTOR is a splat operation.
* Added VSPLATD
This is a special case of VSPLAT that handles the cases when v2i64 is legal
llvm-svn: 191191
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
In AVX 256bit vectors are valid vectors and therefore the Type Legalizer doesn't
split the VSELECT and SETCC nodes. AVX only supports MIN/MAX on 128bit vectors
and this fix enables vector splitting for this special case in the X86 DAG
Combiner.
This fix is related to PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191131
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask has usually
te same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191130
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191049
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191045
When selecting the DAG (add (WrapperRIP ...), (FrameIndex ...)), X86 code had
spotted the FrameIndex possibility and was working out whether it could fold
the WrapperRIP into this.
The test for forming a %rip version is notionally whether we already have a
base or index register (%rip precludes both), but we were forgetting to account
for the register that would be inserted later to access the frame.
rdar://problem/15024520
llvm-svn: 190995
1) make sure that the first two instructions of the sequence cannot
separate from each other. The linker requires that they be sequential.
If they get separated, it can still work but it will not work in all
cases because the first of the instructions mostly involves the hi part
of the pc relative offset and that part changes slowly. You would have
to be at the right boundary for this to matter.
2) make sure that this sequence begins on a longword boundary.
There appears to be a bug in binutils which makes some of these calculations
get messed up if the instruction sequence does not begin on a longword
boundary. This is being investigated with the appropriate binutils folks.
llvm-svn: 190966
For some reason I never got around to adding these at the same time as
the signed versions. No idea why.
I'm not sure whether this SystemZII::BranchC* stuff is useful, or whether
it should just be replaced with an "is normal" flag. I'll leave that
for later though.
There are some boundary conditions that can be tweaked, such as preferring
unsigned comparisons for equality with [128, 256), and "<= 255" over "< 256",
but again I'll leave those for a separate patch.
llvm-svn: 190930
Summary:
We indicate that the object files are safe by emitting a @feat.00
absolute address symbol. The address is presumably interpreted as a
bitfield of features that the compiler would like to enable. Bit 0 is
documented in the PE COFF spec to opt in to "registered SEH", which is
what /safeseh enables.
LLVM's object files are safe by default because LLVM doesn't know how to
produce SEH handlers.
Reviewers: Bigcheese
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1691
llvm-svn: 190898
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing. This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this. Two
test cases are also modified to reflect this requirement.
Fast-isel was not creating correct code for loading floating-point constants
using large code model. This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model. One test case is modified to reflect this requirement.
llvm-svn: 190882
When a truncate node defines a legal vector type but uses an illegal
vector type, the legalization process was splitting the vector until
<1 x vector> type, but then it was failing to scalarize the node because
it did not know how to handle TRUNCATE.
<rdar://problem/14989896>
llvm-svn: 190830
- check that -mcpu=slm uses the call register indirect optimization
- check that -mcpu=slm runs the scheduler
- check that -mcpu=slm supports the movbe instruction
llvm-svn: 190814
The port originally had special patterns for extload, mapping them to the
same instructions as sextload. It seemed neater to have patterns that
match "an extension that is allowed to be signed" and "an extension that
is allowed to be unsigned".
This was originally meant to be a clean-up, but it does improve the handling
of promoted integers a little, as shown by args-06.ll.
llvm-svn: 190777
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190771
This is causing test-suite failures.
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190765
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190764
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).
llvm-svn: 190763
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).
This should resolve PR16641.
llvm-svn: 190636
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.
llvm-svn: 190624
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identify of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.
Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.
llvm-svn: 190587
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.
The maximum number of input SGPRs is bumped to 17.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
The main complication here is that TM and TMY (the memory forms) set
CC differently from the register forms. When the tested bits contain
some 0s and some 1s, the register forms set CC to 1 or 2 based on the
value the uppermost bit. The memory forms instead set CC to 1
regardless of the uppermost bit.
Until now, I've tried to make it so that a branch never tests for an
impossible CC value. E.g. NR only sets CC to 0 or 1, so branches on the
result will only test for 0 or 1. Originally I'd tried to do the same
thing for TM and TMY by using custom matching code in ISelDAGToDAG.
That ended up being very ugly though, and would have meant duplicating
some of the chain checks that the common isel code does.
I've therefore gone for the simpler alternative of adding an extra
operand to the TM DAG opcode to say whether a memory form would be OK.
This means that the inverse of a "TM;JE" is "TM;JNE" rather than the
more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE".
I suppose that's arguably less confusing though...
llvm-svn: 190400
Fix XCoreLowerThreadLocal trying to initialise globals
which have no initializer.
Add handling of const expressions containing thread local variables.
These need to be replaced with instructions, as the thread ID is
used to access the thread local variable.
llvm-svn: 190300
This sidesteps a bug in PrescheduleNodesWithMultipleUses() which
does not check if callResources will be affected by the transformation.
llvm-svn: 190299
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.
Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.
<rdar://problem/13623355>
llvm-svn: 190290
precision loads and stores as well as reg+imm double precision loads and stores.
Previously, expansion of loads and stores was done after register allocation,
but now it takes place during legalization. As a result, users will see double
precision stores and loads being emitted to spill and restore 64-bit FP registers.
llvm-svn: 190235
Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom),
field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram
(Context, Type, ContainingType).
llvm-svn: 190205
Occasionally DAGCombiner can spot that a SETCC operation is completely
redundant and reduce it to "all true" or "all false". If this happens to a
vector, the value produced has to take account of what a normal comparison
would have produced, which may be an all-1s bitmask.
The fix in SelectionDAG.cpp is tested, however, as far as I can see the code in
TargetLowering.cpp is possibly unreachable and almost certainly irrelevant when
triggered so there are no tests. However, I believe it's still clearly the
right change and may save someone else some hassle if it suddenly becomes
reachable. So I'm doing it anyway.
llvm-svn: 190147
The architecture has many comparison instructions, including some that
extend one of the operands. The signed comparison instructions use sign
extensions and the unsigned comparison instructions use zero extensions.
In cases where we had a free choice between signed or unsigned comparisons,
we were trying to decide at lowering time which would best fit the available
instructions, taking things like extension type into account. The code
to do that was getting increasingly hairy and was also making some bad
decisions. E.g. when comparing the result of two LLCs, it is better to use
CR rather than CLR, since CR can be fused with a branch while CLR can't.
This patch removes the lowering code and instead adds an operand to
integer comparisons to say whether signed comparison is required,
whether unsigned comparison is required, or whether either is OK.
We can then leave the choice of instruction up to the normal isel code.
llvm-svn: 190138
If the DAG already has only legal types, then the second round of DAG combines
is skipped. In this case VSELECT+SETCC patterns that match a more efficient
instruction (e.g. min/max) are never recognized.
This fix allows VSELECT+SETCC combines if the types are already legal before DAG
type legalization.
Reviewer: Nadav
llvm-svn: 190105
Solution is not sufficient to prevent 'mov pc, lr' being emitted for jump table code.
Test case doesn't trigger the added functionality.
llvm-svn: 190047
This improves code generation for jump tables by avoiding the emission of "mov pc, lr" which could fool the processor into believing this is a return from a function causing mispredicts. The code generation logic for jump tables uses ADR to materialize the address of the jump target.
Patch by Daniel Stewart!
llvm-svn: 190043
In sparc, setjmp stores only the registers %fp, %sp, %i7 and %o7. longjmp restores
the stack, and the callee-saved registers (all local/in registers: %i0-%i7, %l0-%l7)
using the stored %fp and register windows. However, this does not guarantee that the longjmp
will restore the registers, as they were when the setjmp was called. This is because these
registers may be clobbered after returning from setjmp, but before calling longjmp.
This patch prevents the registers %i0-%i5, %l0-l7 to live across the setjmp call using the register mask.
llvm-svn: 190033
Fast register pressure tracking currently only takes effect during
bottom up scheduling. Forcing this is a bit faster and simpler for
targets that don't have many scheduling constraints and don't need
top-down scheduling.
llvm-svn: 190014
'Force' values in registers using the calling convention. Now, we only depend on
the calling convention and that the allocator performs copy coalescing.
llvm-svn: 189985
This reverts commit r189648.
Fixes for the previously failing clang-side arm_neon_intrinsics test
cases will be checked in separately.
llvm-svn: 189841
For now this just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189819
What we really want is to enable Swift by default for *v7s triples (and there already seems to be some logic which attempts to do that). In that case the iOS version doesn't matter.
llvm-svn: 189763
don't exist in libc. This is really not the right way to solve this problem;
but it's not clear to me at this time exactly what is the right way.
If we create stubs here, they will cause link errors because these functions
do not exist in libc.
llvm-svn: 189727
This patch adds fast-isel support for calls (but not intrinsic calls
or varargs calls). It also removes a badly-formed assert. There are
some new tests just for calls, and also for folding loads into
arguments on calls to avoid extra extends.
llvm-svn: 189701
has hard float, when you compile the mips32 code you have to make sure
that it knows to compile any mips32 routines as hard float. I need to clean
up the way mips16 hard float is specified but I need to first think through
all the details. Mips16 always has a form of soft float, the difference being
whether the underlying hardware has floating point. So it's not really
necessary to pass the -soft-float to llvm since soft-float is always true
for mips16 by virtue of the fact that it will not register floating point
registers. By using this fact, I can simplify the way this is all handled.
llvm-svn: 189690
Yet another chunk of fast-isel code. This one handles various
conversions involving floating-point. (It also includes some
miscellaneous handling throughout the back end for LWA_32 and LWAX_32
that should have been part of the load-store patch.)
llvm-svn: 189677
Created SUPressureDiffs array to hold the per node PDiff computed during DAG building.
Added a getUpwardPressureDelta API that will soon replace the old
one. Compute PressureDelta here from the precomputed PressureDiffs.
Updating for liveness will come next.
llvm-svn: 189640
This is the next big chunk of fast-isel code. The primary purpose is
to implement selection of loads and stores, but there is a lot of
drag-along to support this. The common code to analyze addresses for
both loads and stores is substantial. It's also necessary to add the
materialization code for global values.
Related to load-store processing is the code to fold loads into
integer extends, since otherwise we generate lots of redundant
instructions. We also need to add some overrides to some FastEmit
routines to ensure we don't assign GPR 0 to a virtual register when
this would change the meaning of an instruction.
I added handling selection of a few binary arithmetic instructions, to
enable committing some test cases I wrote a while back.
Finally, ap couple of miscellaneous changes:
* I cleaned up some poor style from a previous patch in
PPCISelLowering.cpp, pointed out by David Blaikie.
* I enlarged the Addr.Offset field to avoid sign problems with 32-bit
offsets.
llvm-svn: 189636
In addition to recognizing when the multiply's second argument is
coming from an explicit VDUPLANE, also look for a plain scalar
f32 reference and reference it via the corresponding vector
lane.
rdar://14870054
llvm-svn: 189619
The usual default of "dmb ish" (inner-shareable) isn't even a valid instruction
on v6M or v7M (well, it does the same thing but software is strongly
discouraged from using it) so we should emit a full-system barrier there.
llvm-svn: 189483
The vqdmlal and vqdmlls instructions are really just a fused pair consisting of
a vqdmull.sN and a vqadd.sN. This adds patterns to LLVM so that we can switch
Clang's CodeGen over to generating these instead of the special vqdmlal
intrinsics.
llvm-svn: 189480
These intrinsics are legalized to V(ALL|ANY)_(NON)?ZERO nodes,
are matched as SN?Z_[BHWDV]_PSEUDO pseudo's, and emitted as
a branch/mov sequence to evaluate to 0 or 1.
Note: The resulting code is sub-optimal since it doesnt seem to be possible
to feed the result of an intrinsic directly into a brcond. At the moment
it uses (SETCC (VALL_ZERO $ws), 0, SETEQ) and similar which unnecessarily
evaluates the boolean twice.
llvm-svn: 189478
For now just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189469
The MSA control registers have been added as reserved registers,
and are only used via ISD::Copy(To|From)Reg. The intrinsics are lowered
into these nodes.
llvm-svn: 189468
We want to convert code like (or (srl N, 8), (shl N, 8)) into (srl (bswap N),
const), but this is only valid if the bits above 16 on the source pattern are
0, the checks we were doing on this were slightly wrong before.
llvm-svn: 189348
These instructions aren't particularly complicated and it's well worth having
patterns for some reasonably useful LLVM IR that will match them. Soon we
should be able to switch Clang over to producing this natural version.
llvm-svn: 189335
Note that all of these tests use ld.b and st.b for the loads and stores
regardless of the data size. This is because the definition of bitcast is
equivalent to a store/load sequence and DAG combiner accordingly folds bitcasts
to/from v16i8 into the load/store nodes to product load/store nodes with
type v16i8.
llvm-svn: 189333
Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs.
Lengths above that threshold use a loop to handle X*256 bytes followed
by a single MVC to handle the excess (if any). This loop will also be
needed in future when support for variable lengths is added.
Because the same tablegen classes are used to define MVC and CLC,
the patch also has the side-effect of defining a pseudo loop instruction
for CLC. That instruction isn't used yet (and wouldn't be handled correctly
if it were). I'm planning to use it soon though.
llvm-svn: 189331
DICompositeType will have an identifier field at position 14. For now, the
field is set to null in DIBuilder.
For DICompositeTypes where the template argument field (the 13th field)
was optional, modify DIBuilder to make sure the template argument field is set.
Now DICompositeType has 15 fields.
Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode.
Update verifier to check that DICompositeType has 15 fields and the last
field is null or a MDString.
Update testing cases to include an extra field for DICompositeType.
The identifier field will be used by type uniquing so a front end can
genearte a DICompositeType with a unique identifer.
llvm-svn: 189282
Get the register class right for the TST instruction. This keeps the
machine verifier happy, enabling us to turn it on for another test.
rdar://12594152
llvm-svn: 189274
Incremental improvement to fast-isel for PPC64. This allows us to
select on ret, sext, and zext. Filling in sext/zext improves some of
the existing logic in handling compare-immediates that needed extends.
A simplified return convention for fast-isel is also added to the
PPC64 calling conventions. All call/return processing for DAG
selection is handled with custom code, so there isn't an existing CC
to rely on here. The include of PPCGenCallingConv.inc causes compiler
warnings due to the 32-bit calling conventions that are not used, so
the dummy function "usePPC32CCs()" is added here to silence those.
Test cases for the return and extend logic are added.
llvm-svn: 189266
If we have a binary operation like ISD:ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().
Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());
llvm-svn: 189227
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes. The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.
This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.
v2:
- Add more helper functions to TargetLoweringBase
- Use CHECK-LABEL for tests
llvm-svn: 189221
First chunk of actual fast-isel selection code. This handles direct
and indirect branches, as well as feeding compares for direct
branches. PPCFastISel::PPCEmitIntExt() is just roughed in and will be
expanded in a future patch. This also corrects a problem with
selection for constant pool entries in JIT mode or with small code
model.
llvm-svn: 189202
I need to add the rest of these to the list or else to delay putting
out the actual stub until later in code generation when I know if
the external function ever got emitted
Resubmit this patch. The target triple needs to be added to the test so that
clang does not tell the backend the wrong target when the host is BSD. There
is a clang bug in here somewhere that I need to track down. At Mips this
has been filed internally as a bug.
llvm-svn: 189186
I need to add the rest of these to the list or else to delay putting
out the actual stub until later in code generation when I know if
the external function ever got emitted.
llvm-svn: 189161
If we had a store of an integer to memory, and the integer and store size
were suitable for a form of MV..., we used MV... no matter what. We could
then have sequences like:
lay %r2, 0(%r3,%r4)
mvi 0(%r2), 4
In these cases it seems better to force the constant into a register
and use a normal store:
lhi %r2, 4
stc %r2, 0(%r3, %r4)
since %r2 is more likely to be hoisted and is easier to rematerialize.
llvm-svn: 189098
...so that it can be used for z too. Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.
The pass is opt-in because at the moment it only handles sqrt.
llvm-svn: 189097
I'd forgotten that "Requires" blocks override rather than add to the
constraints, so my pseudo-instruction was being selected in Thumb mode leading
to nonsense instructions.
rdar://problem/14817358
llvm-svn: 189096
A single metadata will not span multiple lines. This also helps me with
my script to automatic update the testing cases.
A debug info testing case should have a llvm.dbg.cu.
Do not use hard-coded id for debug nodes.
llvm-svn: 189033
Back in the mists of time (2008), it seems TableGen couldn't handle the
patterns necessary to match ARM's CMOV node that we convert select operations
to, so we wrote a lot of fairly hairy C++ to do it for us.
TableGen can deal with it now: there were a few minor differences to CodeGen
(see tests), but nothing obviously worse that I could see, so we should
probably address anything that *does* come up in a localised manner.
llvm-svn: 188995
When truncated vector stores were being custom lowered in
VectorLegalizer::LegalizeOp(), the old (illegal) and new (legal) node pair
was not being added to LegalizedNodes list. Instead of the legalized
result being passed to VectorLegalizer::TranslateLegalizeResult(),
the result was being passed back into VectorLegalizer::LegalizeOp(),
which ended up adding a (new, new) pair to the list instead.
This was causing an assertion failure when a custom lowered truncated
vector store was the last instruction a basic block and the VectorLegalizer
was unable to find it in the LegalizedNodes list when updating the
DAG root.
llvm-svn: 188953
The small utility function that pattern matches Base + Index +
Offset patterns for loads and stores fails to recognize the base
pointer for loads/stores from/into an array at offset 0 inside a
loop. As a result DAGCombiner::MergeConsecutiveStores was not able
to merge all stores.
This commit fixes the issue by adding an additional pattern match
and also a test case.
Reviewer: Nadav
llvm-svn: 188936
def imm0_63 : Operand<i32>, ImmLeaf<i32, [{ return Imm >= 0 && Imm < 63;}]>{
As it seems Imm <63 should be Imm <= 63. ImmLeaf is used in pattern match, but there is already a function check the shift amount range, so just remove ImmLeaf. Also add a test to check 63.
llvm-svn: 188911
The initial port used MLG(R) for i64 UMUL_LOHI but left the other three
combinations as not-legal-or-custom. Although 32x32->{32,32}
multiplications exist, they're not as quick as doing a normal 64-bit
multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI
would be useful. There's also no direct instruction for i64 SMUL_LOHI,
so it needs to be implemented in terms of UMUL_LOHI.
However, not defining these patterns means that we don't convert
division by a constant into multiplication, so this patch fills
in the other cases. The new i64 SMUL_LOHI sequence is simpler
than the one that we used previously for 64x64->128 multiplication,
so int-mul-08.ll now tests the full sequence.
llvm-svn: 188898
functions be compiled as mips32, without having to add attributes. This
is useful in certain situations where you don't want to have to edit the
function attributes in the source. For now it's only an option used for
the compiler developers when debugging the mips16 port.
llvm-svn: 188826
Update testcase to be more careful about checking register
values. While regexes are general goodness for these sorts of
testcases, in this example, the registers are constrained by
the calling convention, so we can and should check their
explicit values.
rdar://14779513
llvm-svn: 188819
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry. I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.
This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA. This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.
We should also try to optimize null checks so that they test the CC result
of the SRST directly. The select between null and the SRST GPR result could
then usually be deleted as dead.
llvm-svn: 188779
Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers
and TermRegs. When it sees a definition of R it adds all aliases of R
to the corresponding set, so that when it needs to test for membership
it only needs to test a single register, rather than worrying about
aliases there too. E.g. the final candidate loop just has:
unsigned Def = Candidates[i].Def;
if (!PhysRegClobbers.test(Def) && ...) {
to test whether register Def is multiply defined.
However, there was also a shortcut in ProcessMI to make sure we didn't
add candidates if we already knew that they would fail the final test.
This shortcut was more pessimistic than the final one because it
checked whether _any alias_ of the defined register was multiply defined.
This is too conservative for targets that define register pairs.
E.g. on z, R0 and R1 are sometimes used as a pair, so there is a
128-bit register that aliases both R0 and R1. If a loop used
R0 and R1 independently, and the definition of R0 came first,
we would be able to hoist the R0 assignment (because that used
the final test quoted above) but not the R1 assignment (because
that meant we had two definitions of the paired R0/R1 register
and would fail the shortcut in ProcessMI).
This patch just uses the same check for the ProcessMI shortcut as
we use in the final candidate loop.
llvm-svn: 188774
Previously we used a const-pool load for virtually all 64-bit floating values.
Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
instructions of one stripe or another.
llvm-svn: 188773
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).
llvm-svn: 188727
- split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap
- WidenVecRes_BinaryCanTrap preserves the original behaviour for operations
that can trap
- WidenVecRes_Binary simply widens the operation and improves codegen for
3-element vectors by allowing widening and promotion on x86 (matches the
behaviour of unary and ternary operation widening)
- use WidenVecRes_Binary for operations on integers.
Reviewed by: nrotem
llvm-svn: 188699
For now this matches the equivalent of (neg (abs ...)), which did hit a few
times in projects/test-suite. We should probably also match cases where
absolute-like selects are used with reversed arguments.
llvm-svn: 188671
This first cut is pretty conservative. The final argument register (R6)
is call-saved, so we would need to make sure that the R6 argument to a
sibling call is the same as the R6 argument to the calling function,
which seems worth keeping as a separate patch.
Saying that integer truncations are free means that we no longer
use the extending instructions LGF and LLGF for spills in int-conv-09.ll
and int-conv-10.ll. Instead we treat the registers as 64 bits wide and
truncate them to 32-bits where necessary. I think it's unlikely we'd
use LGF and LLGF for spills in other situations for the same reason,
so I'm removing the tests rather than replacing them. The associated
code is generic and applies to many more instructions than just
LGF and LLGF, so there is no corresponding code removal.
llvm-svn: 188669
We had previously been asserting when faced with a FCOPYSIGN f64, ppcf128 node
because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the
sum of two doubles, and the first double must have the larger magnitude, we
can take the sign from the first double. As a result, in addition to fixing the
crash, this is also an optimization.
llvm-svn: 188655
Modern PPC cores support a floating-point copysign instruction, and we can use
this to lower the FCOPYSIGN node (which is created from calls to the libm
copysign function). A couple of extra patterns are necessary because the
operand types of FCOPYSIGN need not agree.
llvm-svn: 188653
When patching inlineasm nodes to use GPRPair for 64-bit values, we
were dropping the information that two operands were tied, which
effectively broke the live-interval of vregs affected.
llvm-svn: 188643
Properly constrain the operand register class for instructions used
in [sz]ext expansion. Update more tests to use the verifier now that
we're getting the register classes correct.
rdar://12594152
llvm-svn: 188594
Teach the generic instruction selection helper functions to constrain
the register classes of their input operands. For non-physical register
references, the generic code needs to be careful not to mess that up
when replacing references to result registers. As the comment indicates
for MachineRegisterInfo::replaceRegWith(), it's important to call
constrainRegClass() first.
rdar://12594152
llvm-svn: 188593
Lots of machine verifier errors result from using a plain GPR regclass
for incoming argument copies. A more restrictive rGPR class is more
appropriate since it more accurately represents what's happening, plus
it lines up better with isel later on so the verifier is happier.
Reduces the number of ARM fast-isel tests not running with the verifier
enabled by over half.
rdar://12594152
llvm-svn: 188592
This regards how mips16 is viewed. It's not really a target type but
there has always been a target for it in the td files. It's more properly
-mcpu=mips32 -mattr=+mips16 . This is how clang treats it but we have
always had the -mcpu=mips16 which I probably should delete now but it will
require updating all the .ll test cases for mips16. In this case it changed
how we decide if we have a count bits instruction and whether instruction
lowering should then expand ctlz. Now that we have dual mode compilation,
-mattr=+mips16 really just indicates the inital processor mode that
we are compiling for. (It is also possible to have -mcpu=64 -mattr=+mips16
but as far as I know, nobody has even built such a processor, though there
is an architecture manual for this).
llvm-svn: 188586
The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.
Undo that damage and only apply the SMRD logic to that.
Fixes some derivates related piglit regressions with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188558
This unbreaks PIC with fast isel on ELF targets (PR16717). The output matches
what GCC and SDag do for PIC but may not cover all of the many flavors of PIC
that exist.
llvm-svn: 188551
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did. I've split this out into a
subroutine so that it can be used for other upcoming patches.
I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads. I don't have a testcase for that because
we don't do any interesting scheduling on z yet.
llvm-svn: 188540
r188163 used CLC to implement memcmp. Code that compares the result
directly against zero can test the CC value produced by CLC, but code
that needs an integer result must use IPM. The sequence I'd used was:
ipm <reg>
sll <reg>, 2
sra <reg>, 30
but I'd forgotten that this inverts the order, so that CC==1 ("less")
becomes an integer greater than zero, and CC==2 ("greater") becomes
an integer less than zero. This sequence should only be used if the
CLC arguments are reversed to compensate. The problem then is that
the branch condition must also be reversed when testing the CLC
result directly.
Rather than do that, I went for a different sequence that works with
the natural CLC order:
ipm <reg>
srl <reg>, 28
rll <reg>, <reg>, 31
One advantage of this is that it doesn't clobber CC. A disadvantage
is that any sign extension to 64 bits must be done separately,
rather than being folded into the shifts.
llvm-svn: 188538
- Benjamin fixed the emission of this file in r179937, but it still lives on a
few buildbots. We should probably clean up the build dirs once in a while,
eh?
llvm-svn: 188527
The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write). This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.
llvm-svn: 188522
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
llvm-svn: 188513
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.
This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=66805
llvm-svn: 188429
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.
v2:
- Use an SGPR register class if all the operands of BUILD_VECTOR are
SGPRs.
llvm-svn: 188427
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.
llvm-svn: 188425
This is a follow-up to r187693, correcting that code to request the correct
register class. The previous version, with the wrong register class, was not
really correcting the constraints, but rather was removing them. Coincidentally,
this fixed the failing test case in r187693, but obviously created other
problems.
llvm-svn: 188407
When determining if two different loads are from the same base address,
this patch allows one load to use a t2LDRi8 address mode and another to
use a t2LDRi12 address mode. The current implementation is very
conservative and this allows the case of differing Thumb2 byte loads to
be considered. Allowing these differing modes instead of forcing the exact
same opcode is useful for situations where one opcodes loads from a base
address+1 and a second opcode loads for a base address-1.
Patch by Daniel Stewart.
llvm-svn: 188385
A common idiom is to use zero and all-ones as sentinal values and to
check for both in a single conditional ("x != 0 && x != (unsigned)-1").
That generates code, for i32, like:
testl %edi, %edi
setne %al
cmpl $-1, %edi
setne %cl
andb %al, %cl
With this transform, we generate the simpler:
incl %edi
cmpl $1, %edi
seta %al
Similar improvements for other integer sizes and on other platforms. In
general, combining the two setcc instructions into one is better.
rdar://14689217
llvm-svn: 188315
R600 doesn't need to do any scheduling on the SelectionDAG now that it
has a very good MachineScheduler. Also, using the VLIW SelectionDAG
scheduler was having a major impact on compile times. For example with
the phatk kernel here are the LLVM IR to machine code compile times:
With Sched::VLIW
Total Compile Time: 1.4890 Seconds (User + System)
SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)
With Sched::Source
Total Compile Time: 0.3330 Seconds (User + System)
SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)
The code ouput was identical with both schedulers. This may not be true
for all programs, but it gives me confidence that there won't be much
reduction, if any, in code quality by using Sched::Source.
llvm-svn: 188215
Various tests had sprung up over the years which had --check-prefix=ABC on the
RUN line, but "CHECK-ABC:" later on. This happened to work before, but was
strictly incorrect. FileCheck is getting stricter soon though.
Patch by Ron Ofir.
llvm-svn: 188173
If the tail-callee and caller give the same bits via the same signext/zeroext
attribute then a tail-call should be allowed, since the extension has already
been done by the callee.
llvm-svn: 188159
is actually an instrinsic that will not occur in libc. This list here
is not exhaustive but fixes the one places in test-suite where this occurs.
I have filed a bug against myself to research the full list and add them
to the array of such cases. In the future, actual stub generation will occur
in a later phase and we won't need this code because we will know at that time
during the compilation that in fact no helper function was even needed.
llvm-svn: 188149
I need to go through all the runtime routine list and see if there
are any more I need to add for mips16 floating point. Prototypes must
be correct or else I don't know to add a helper function call.
llvm-svn: 188106
This patch decouples the stack protector pass so that we can support stack
protector implementations that do not use the IR level generated stack protector
fail basic block.
No codesize increase is caused by this change since the MI level tail merge pass
properly merges together the fail condition blocks (see the updated test).
llvm-svn: 188105
For most libm ISD nodes, TargetLoweringBase::initActions sets the default
scalar-type action to Expand, and leaves the vector-type action default as
Legal. This is not appropriate for the new ISD::FROUND node (which no backend
but PowerPC handles explicitly).
Fixes PR16842.
llvm-svn: 188048
this records relocation entries in the mach-o object file
for PIC code generation.
tested on powerpc-darwin8, validated against darwin otool -rvV
llvm-svn: 188004
the type exists.
Fix up cases where we weren't checking for optional types and add
an assert to addType to make sure we catch this in the future.
Fix up a testcase that was using the tag for DW_TAG_array_type
when it meant DW_TAG_enumeration_type.
llvm-svn: 187963
Making use of the recently-added ISD::FROUND, which allows for custom lowering
of round(), the PPC backend will now map frin to round(). Previously, we had
been using frin to lower nearbyint() (and rint() via some custom lowering to
handle the extra fenv flags requirements), but only in fast-math mode because
frin does not tie-to-even. Several users had complained about this behavior,
and this new mapping of frin to round is certainly more appropriate (and does
not require fast-math mode).
In effect, this reverts r178362 (and part of r178337, replacing the nearbyint
mapping with the round mapping).
llvm-svn: 187960
Original commit message:
Stop emitting weak symbols into the "coal" sections.
The Mach-O linker has been able to support the weak-def bit on any symbol for
quite a while now. The compiler however continued to place these symbols into a
"coal" section, which required the linker to map them back to the base section
name.
Replace the sections like this:
__TEXT/__textcoal_nt instead use __TEXT/__text
__TEXT/__const_coal instead use __TEXT/__const
__DATA/__datacoal_nt instead use __DATA/__data
<rdar://problem/14265330>
llvm-svn: 187939
This follows the same lines as the integer code. In the end it seemed
easier to have a second 4-bit mask in TSFlags to specify the compare-like
CC values. That eats one more TSFlags bit than adding a CCHasUnordered
would have done, but it feels more concise.
llvm-svn: 187883
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:
SGPR = COPY VGPR
Will now be emitted like this:
VSrC = COPY VGPR
This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur. Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.
llvm-svn: 187831
Also remove checking of llvm.dbg.sp since it is not used in generating dwarf.
Current state of Finder:
DebugInfoFinder tries to list all debug info MDNodes used in a module. To
list debug info MDNodes used by an instruction, DebugInfoFinder provides
processDeclare, processValue and processLocation to handle DbgDeclareInst,
DbgValueInst and DbgLoc attached to instructions. processModule will go
through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes
used by the CUs.
TODO:
1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We
need to add a list of variables that are used by DbgDeclareInst and
DbgValueInst.
2> MDString fields should be null or isa<MDString> and MDNode fields should be
null or isa<MDNode>. We currently use empty string or int 0 to represent null.
3> Go though Verify functions and make sure that they check field types.
4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each
testing case has a llvm.dbg.cu.
Re-apply r187609 with fix to pass ocaml binding. vmcore.ml generates a debug
location with scope being metadata !{}, in verifier we treat this as a null
scope.
llvm-svn: 187812
The PPC backend had been missing a pattern to generate mulli for 64-bit
multiples. We had been generating it only for 32-bit multiplies. Unfortunately,
generating li + mulld unnecessarily increases register pressure.
llvm-svn: 187807
This change converts the NVPTX target to use the MC infrastructure
instead of directly emitting MachineInstr instances. This brings
the target more up-to-date with LLVM TOT, and should fix PR15175
and PR15958 (libNVPTXInstPrinter is empty) as a side-effect.
llvm-svn: 187798
This change came about primarily because of two issues in the existing code.
Niether of:
define i64 @test1(i64 %val) {
%in = trunc i64 %val to i32
tail call i32 @ret32(i32 returned %in)
ret i64 %val
}
define i64 @test2(i64 %val) {
tail call i32 @ret32(i32 returned undef)
ret i32 42
}
should be tail calls, and the function sameNoopInput is responsible. The main
problem is that it is completely symmetric in the "tail call" and "ret" value,
but in reality different things are allowed on each side.
For these cases:
1. Any truncation should lead to a larger value being generated by "tail call"
than needed by "ret".
2. Undef should only be allowed as a source for ret, not as a result of the
call.
Along the way I noticed that a mismatch between what this function treats as a
valid truncation and what the backends see can lead to invalid calls as well
(see x86-32 test case).
This patch refactors the code so that instead of being based primarily on
values which it recurses into when necessary, it starts by inspecting the type
and considers each fundamental slot that the backend will see in turn. For
example, given a pathological function that returned {{}, {{}, i32, {}}, i32}
we would consider each "real" i32 in turn, and ask if it passes through
unchanged. This is much closer to what the backend sees as a result of
ComputeValueVTs.
Aside from the bug fixes, this eliminates the recursion that's going on and, I
believe, makes the bulk of the code significantly easier to understand. The
trade-off is the nasty iterators needed to find the real types inside a
returned value.
llvm-svn: 187787
This patch just uses a peephole test for "add; compare; branch" sequences
within a single block. The IR optimizers already convert loops to
decrement-and-branch-on-nonzero form in some cases, so even this
simplistic test triggers many times during a clang bootstrap and
projects/test-suite run. It looks like there are still cases where we
need to more strongly prefer branches on nonzero though. E.g. I saw a
case where a loop that started out with a check for 0 ended up with a
check for -1. I'll try to look at that sometime.
I ended up adding the Reference class because MachineInstr::readsRegister()
doesn't check for subregisters (by design, as far as I could tell).
llvm-svn: 187723
helper functions. This can be optimized out later when the remaining
parts of the helper function work is moved into the Mips16HardFloat pass.
For now it forces us to use the 32 bit save/restore instructions instead
of the 16 bit ones.
llvm-svn: 187712
Due to the weird and wondeful usual arithmetic conversions, some
calculations involving negative values were getting performed in
uint32_t and then promoted to int64_t, which is really not a good
idea.
Patch by Katsuhiro Ueno.
llvm-svn: 187703
Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the
64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with
explicit register names, on PPC64 when an i64 MVT has been requested, we need
to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent)
registers.
At some point, we'll probably want to arrange things so that the generic code
in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order
to match these inline asm register constraints. If we do that, this change can
be reverted.
llvm-svn: 187693
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.
llvm-svn: 187618
This is actually an LLVM bug in the way it generates signatures for these
when soft float is enabled. For example, floor ends up having the signature
of int64(int64). The signature part is not the same as where the actual
parameter types are recorded, and those ARE of course int64(int64) when
soft float is enabled. (Yes, Mips16 hard float uses soft float but with
different runtime rounes but then has to interoperate with Mips32 using
normal floating point). This logic will eventually be moved to the
Mips16HardFloat pass so it's not worth sorting out these issues in LLVM
since nobody but Mips16 cares about these signatures, as far as I know,
and even I won't eventually either.
llvm-svn: 187613
Also remove checking of llvm.dbg.sp since it is not used in generating dwarf.
Current state of Finder:
DebugInfoFinder tries to list all debug info MDNodes used in a module. To
list debug info MDNodes used by an instruction, DebugInfoFinder provides
processDeclare, processValue and processLocation to handle DbgDeclareInst,
DbgValueInst and DbgLoc attached to instructions. processModule will go
through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes
used by the CUs.
TODO:
1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We
need to add a list of variables that are used by DbgDeclareInst and
DbgValueInst.
2> MDString fields should be null or isa<MDString> and MDNode fields should be
null or isa<MDNode>. We currently use empty string or int 0 to represent null.
3> Go though Verify functions and make sure that they check field types.
4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each
testing case has a llvm.dbg.cu.
llvm-svn: 187609
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions
Tom Stellard:
- Mark vec2 operations as expand. The addition of a vec2 register
class made them all legal.
Patch by: Dmitry Cherkassov
Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
llvm-svn: 187582
This also fixes a bug in the predication of LR to LOCR: I'd forgotten
that with these in-place instruction builds, the implicit operands need
to be added manually. I think this was latent until now, but is tested
by int-cmp-45.c. It also adds a CC valid mask to STOC, again tested by
int-cmp-45.c.
llvm-svn: 187573
Convert >= 1 to > 0, etc. Using comparison with zero isn't a win on its own,
but it exposes more opportunities for CC reuse (the next patch).
llvm-svn: 187571
Patch by Ana Pazos.
- Completed implementation of instruction formats:
AdvSIMD three same
AdvSIMD modified immediate
AdvSIMD scalar pairwise
- Completed implementation of instruction classes
(some of the instructions in these classes
belong to yet unfinished instruction formats):
Vector Arithmetic
Vector Immediate
Vector Pairwise Arithmetic
- Initial implementation of instruction formats:
AdvSIMD scalar two-reg misc
AdvSIMD scalar three same
- Intial implementation of instruction class:
Scalar Arithmetic
- Initial clang changes to support arm v8 intrinsics.
Note: no clang changes for scalar intrinsics function name mangling yet.
- Comprehensive test cases for added instructions
To verify auto codegen, encoding, decoding, diagnosis, intrinsics.
llvm-svn: 187567
The loop optimizers were assuming that scales > 1 were OK. I think this
is actually a bug in TargetLoweringBase::isLegalAddressingMode(),
since it seems to be trying to reject anything that isn't r+i or r+r,
but it has no default case for scales other than 0, 1 or 2. Implementing
the hook for z means that z can no longer test any change there though.
llvm-svn: 187497
Extend r187495 to conditional loads. I split this out because the
easiest way seemed to be to force a particular operand order in
SystemZISelDAGToDAG.cpp.
llvm-svn: 187496
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken. We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities. For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2. If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3. Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.
Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll. Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.
The patch also makes it easier to reuse CC results from other instructions.
llvm-svn: 187495
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare). It turns out that even
this is a bit too early. Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed. They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).
Another problem was that the AnalyzeBranch family of routines weren't
handling compares and branches, so we weren't able to reverse the fused
form in cases where we would reverse a separate branch. This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.
I've added a test for the AnalyzeBranch problem. A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.
llvm-svn: 187494
r186399 aggressively used the RISBG instruction for immediate ANDs,
both because it can handle some values that AND IMMEDIATE can't,
and because it allows the destination register to be different from
the source. I realized later while implementing the distinct-ops
support that it would be better to leave the choice up to
convertToThreeAddress() instead. The AND IMMEDIATE form is shorter
and is less likely to be cracked.
This is a problem for 32-bit ANDs because we assume that all 32-bit
operations will leave the high word untouched, whereas RISBG used in
this way will either clear the high word or copy it from the source
register. The patch uses the z196 instruction RISBLG for this instead.
This means that z10 will be restricted to NILL, NILH and NILF for
32-bit ANDs, but I think that should be OK for now. Although we're
using z10 as the base architecture, the optimization work is going
to be focused more on z196 and zEC12.
llvm-svn: 187492
All insertf*/extractf* functions replaced with insert/extract since we have insertf and inserti forms.
Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
Added a test.
llvm-svn: 187491
When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
bitwidth of the second operands to both ands match before comparing the negation
of the values.
Split the check of the value of the second operands to the ands. Move the cast
and variable declaration slightly higher to make it slightly easier to follow.
Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 187404
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN
The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
optimizations.
llvm-svn: 187396
Also always add DIType, DISubprogram and DIGlobalVariable to the list
in DebugInfoFinder without checking them, so we can verify them later
on.
llvm-svn: 187285
We used to call Verify before adding DICompileUnit to the list, and now we
remove the check and always add DICompileUnit to the list in DebugInfoFinder,
so we can verify them later on.
llvm-svn: 187237
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
Attempt to fix the buildbots by making the X86 test I just added platform independent
llvm-svn: 187202
This reverts commit 187198. It broke the bots.
The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".
llvm-svn: 187201
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
llvm-svn: 187198
The previous change to local live range allocation also suppressed
eviction of local ranges. In rare cases, this could result in more
expensive register choices. This commit actually revives a feature
that I added long ago: check if live ranges can be reassigned before
eviction. But now it only happens in rare cases of evicting a local
live range because another local live range wants a cheaper register.
The benefit is improved code size for some benchmarks on x86 and armv7.
I measured no significant compile time increase and performance
changes are noise.
llvm-svn: 187140
Also avoid locals evicting locals just because they want a cheaper register.
Problem: MI Sched knows exactly how many registers we have and assumes
they can be colored. In cases where we have large blocks, usually from
unrolled loops, greedy coloring fails. This is a source of
"regressions" from the MI Scheduler on x86. I noticed this issue on
x86 where we have long chains of two-address defs in the same live
range. It's easy to see this in matrix multiplication benchmarks like
IRSmk and even the unit test misched-matmul.ll.
A fundamental difference between the LLVM register allocator and
conventional graph coloring is that in our model a live range can't
discover its neighbors, it can only verify its neighbors. That's why
we initially went for greedy coloring and added eviction to deal with
the hard cases. However, for singly defined and two-address live
ranges, we can optimally color without visiting neighbors simply by
processing the live ranges in instruction order.
Other beneficial side effects:
It is much easier to understand and debug regalloc for large blocks
when the live ranges are allocated in order. Yes, global allocation is
still very confusing, but it's nice to be able to comprehend what
happened locally.
Heuristics could be added to bias register assignment based on
instruction locality (think late register pairing, banks...).
Intuituvely this will make some test cases that are on the threshold
of register pressure more stable.
llvm-svn: 187139
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them. This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value. This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.
No behavioral change intended, but it paves the way for a later patch.
llvm-svn: 187116
These instructions are allowed to trap even if the condition is false,
so for now they are only used for "*ptr = (cond ? x : *ptr)"-style
constructs.
llvm-svn: 187111
Make sure the context and type fields are MDNodes. We will generate
verification errors if those fields are non-empty strings.
Fix testing cases to make them pass the verifier.
llvm-svn: 187106
There's no need to specify a flag to omit frame pointer elimination on non-leaf
nodes...(Honestly, I can't parse that option out.) Use the function attribute
stuff instead.
llvm-svn: 187093
Prior to this patch, IfConverter may widen the cases where a sequence of
instructions were executed because of the way it uses nested predicates. This
result in incorrect execution.
For instance, Let A be a basic block that flows conditionally into B and B be a
predicated block.
B can be predicated with A.BrToBPredicate into A iff B.Predicate is less
"permissive" than A.BrToBPredicate, i.e., iff A.BrToBPredicate subsumes
B.Predicate.
The IfConverter was checking the opposite: B.Predicate subsumes
A.BrToBPredicate.
<rdar://problem/14379453>
llvm-svn: 187071
These are really the same address space in hardware. The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access. When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.
llvm-svn: 187006
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.
This patch changes the lowering of insert_vector_elt to expose a vector
friendly pattern in this situation.
This is a step toward fixing <rdar://problem/14170854>.
llvm-svn: 186999
This increases the number of opportunites we have for folding. With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186916
MDNodes used by DbgDeclareInst and DbgValueInst.
Another 16 testing cases failed and they are disabled with
-disable-debug-info-verifier.
A total of 34 cases are disabled with -disable-debug-info-verifier and will be
corrected.
llvm-svn: 186902
instructions. With this patch:
1. ldr.n is recognized as mnemonic for the short encoding
2. ldr.w is recognized as menmonic for the long encoding
3. ldr will map to either short or long encodings depending on the size of the offset
llvm-svn: 186831
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.
This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.
For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll).
<rdar://problem/14464830>
llvm-svn: 186735
We were incorrectly using compiler_used instead of compiler.used. Unfortunately
the passes using the broken name had tests also using the broken name.
llvm-svn: 186705
The atomic tests assume the two-operand forms, so I've restricted them to z10.
Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why
using convertToThreeAddress() is better than exposing the three-operand forms
first and then converting back to two operands where possible (which is what
I'd originally tried). Using the three-operand form first stops us from
taking advantage of NG, OG and XG for spills.
llvm-svn: 186683
1> Use DebugInfoFinder to find debug info MDNodes.
2> Add disable-debug-info-verifier to disable verifying debug info.
3> Disable verifying for testing cases that fail (will update the testing cases
later on).
4> MDNodes generated by clang can have empty filename for TAG_inheritance and
TAG_friend, so DIType::Verify is modified accordingly.
Note that DebugInfoFinder does not list all debug info MDNode.
For example, clang can generate:
metadata !{i32 786468}, which will fail to verify.
This MDNode is used by debug info but not included in DebugInfoFinder.
This MDNode is generated as a temporary node in DIBuilder::createFunction
Value *TElts[] = { GetTagConstant(VMContext, DW_TAG_base_type) };
MDNode::getTemporary(VMContext, TElts)
llvm-svn: 186634
All changes were made by the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
grep -q "^; *RUN: *llc.*debug" $NAME && continue
grep -q "^; *RUN:.*llvm-objdump" $NAME && continue
grep -q "^; *RUN: *opt.*" $NAME && continue
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\([A-Za-z0-9_-]*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC[:]* *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
done
This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621.
llvm-svn: 186624
The original code only folded SRA into ROTATE ... SELECTED BITS
if there was no outer shift. This patch splits out that check
and generalises it slightly. The extra cases aren't really that
interesting, but this is paving the way for RNSBG support.
llvm-svn: 186571
Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only just
recently implemented fully). Now we can also support dynamic allocas with
higher than the default target stack alignment (16 bytes).
In order to round-up the requested size to the maximum requested alignment, we
need an additional register to hold the rounded-up size. We're already using one
scavenged register to hold the previous stack-pointer value (which needs to be
stored with the signal-safe stdux update), and so when we have dynamic allocas
and a large alignment, we allocate two emergency spill slots for the scavenger.
llvm-svn: 186562
First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.
Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.
Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp. If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.
llvm-svn: 186545
Because the builtin longjmp implementation uses a CTR-based indirect jump, when
the control flow arrives at the builtin setjmp call, the CTR register has
necessarily been clobbered. Correspondingly, this adds CTR to the list of
implicit definitions of the builtin setjmp pseudo instruction.
We don't need to add CTR to the implicit definitions of builtin longjmp
because, even though it does clobber the CTR register, the control flow cannot
return to inside the loop unless there is also a builtin setjmp call.
llvm-svn: 186488
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:
This does not currently work, because the delta between old and new stack
pointers is added to offsets that reference incoming parameters after the
prolog is generated, and the code that does that doesn't handle a variable
delta. You don't want to do that anyway; a better approach is to reserve
another register that retains to the incoming stack pointer, and reference
parameters relative to that.
And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indicies. The basic scheme follows that for base pointers in the
X86 backend.
We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).
llvm-svn: 186478
Use PMIN/PMAX for UGE/ULE vector comparions to reduce the number of required
instructions. This trick also works for UGT/ULT, but there is no advantage in
doing so. It wouldn't reduce the number of instructions and it would actually
reduce performance.
Reviewer: Ben
radar:5972691
llvm-svn: 186432
When truncating to a format with fewer mantissa bits, APFloat::convert
will perform a right shift of the mantissa by the difference of the
precision of the two formats. Usually, this will result in just the
mantissa bits needed for the target format.
One special situation is if the input number is denormal. In this case,
the right shift may discard significant bits. This is usually not a
problem, since truncating a denormal usually results in zero (underflow)
after normalization anyway, since the result format's exponent range is
usually smaller than the target format's.
However, there is one case where the latter property does not hold:
when truncating from ppc_fp128 to double. In particular, truncating
a ppc_fp128 whose first double of the pair is denormal should result
in just that first double, not zero. The current code however
performs an excessive right shift, resulting in lost result bits.
This is then caught in the APFloat::normalize call performed by
APFloat::convert and causes an assertion failure.
This patch checks for the scenario of truncating a denormal, and
attempts to (possibly partially) replace the initial mantissa
right shift by decrementing the exponent, if doing so will still
result in a valid *target format* exponent.
Index: test/CodeGen/PowerPC/pr16573.ll
===================================================================
--- test/CodeGen/PowerPC/pr16573.ll (revision 0)
+++ test/CodeGen/PowerPC/pr16573.ll (revision 0)
@@ -0,0 +1,11 @@
+; RUN: llc < %s | FileCheck %s
+
+target triple = "powerpc64-unknown-linux-gnu"
+
+define double @test() {
+ %1 = fptrunc ppc_fp128 0xM818F2887B9295809800000000032D000 to double
+ ret double %1
+}
+
+; CHECK: .quad -9111018957755033591
+
Index: lib/Support/APFloat.cpp
===================================================================
--- lib/Support/APFloat.cpp (revision 185817)
+++ lib/Support/APFloat.cpp (working copy)
@@ -1956,6 +1956,23 @@
X86SpecialNan = true;
}
+ // If this is a truncation of a denormal number, and the target semantics
+ // has larger exponent range than the source semantics (this can happen
+ // when truncating from PowerPC double-double to double format), the
+ // right shift could lose result mantissa bits. Adjust exponent instead
+ // of performing excessive shift.
+ if (shift < 0 && isFiniteNonZero()) {
+ int exponentChange = significandMSB() + 1 - fromSemantics.precision;
+ if (exponent + exponentChange < toSemantics.minExponent)
+ exponentChange = toSemantics.minExponent - exponent;
+ if (exponentChange < shift)
+ exponentChange = shift;
+ if (exponentChange < 0) {
+ shift -= exponentChange;
+ exponent += exponentChange;
+ }
+ }
+
// If this is a truncation, perform the shift before we narrow the storage.
if (shift < 0 && (isFiniteNonZero() || category==fcNaN))
lostFraction = shiftRight(significandParts(), oldPartCount, -shift);
llvm-svn: 186409
Another patch in the series to make more use of R.SBG. This one extends
r186072 and r186073 to handle cases where the AND is inside the shift.
llvm-svn: 186399
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.
Still need to add SREM/UREM support fix for 64-bit lowering.
llvm-svn: 186390
We can have a FrameSetup in one basic block and the matching FrameDestroy
in a different basic block when we have struct byval. In that case, SPAdj
is not zero at beginning of the basic block.
Modify PEI to correctly set SPAdj at beginning of each basic block using
DFS traversal. We used to assume SPAdj is 0 at beginning of each basic block.
PEI had an assert SPAdjCount || SPAdj == 0.
If we have a Destroy <n> followed by a Setup <m>, PEI will assert failure.
We can add an extra condition to make sure the pairs are matched:
The pairs start with a FrameSetup.
But since we are doing a much better job in the verifier, this patch removes
the check in PEI.
PR16393
llvm-svn: 186364
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the
common subclass of the true and false inputs, and then selecting either the
32-bit or the 64-bit isel variant based on the result of calling
PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC)
(where RC is the common subclass). Unfortunately, this is not quite right: if
we have something like this:
%vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76;
G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6
then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and
G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8
pseudo-register). As a result, we also need to check the common subclass
against GPRC_NOR0 and G8RC_NOX0 explicitly.
This had not been a problem for clients of insertSelect that called
canInsertSelect first (because it had a compensating mistake), but insertSelect
is also used by the PPC pseudo-instruction expander, and this error was causing
a problem in that context.
This problem was found by csmith.
llvm-svn: 186343
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks
which, in part, says:
// Note that these invariants may not hold momentarily when processing a node:
// the node being processed may be put in a map before being marked Processed.
Unfortunately, this assert would be valid only if the above-mentioned invariant
held unconditionally. This was causing llc to assert when, in fact,
everything was fine.
Thanks to Richard Sandiford for investigating this issue!
Fixes PR16562.
llvm-svn: 186338
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
llvm-svn: 186280
This was done with the following sed invocation to catch label lines demarking function boundaries:
sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.
llvm-svn: 186258
ARM paired GPR COPY was being lowered to two MOVr without CC. This
patch puts the CC back.
My test is a reduction of the case where I encountered the issue,
64-bit atomics use paired GPRs.
The issue only occurs with selectionDAG, FastISel doesn't encounter it
so I didn't bother calling it.
llvm-svn: 186226
In particular:
movsbw %al, %ax --> cbtw
movswl %ax, %eax --> cwtl
movslq %eax, %rax --> cltq
According to Intel's manual those have the same performance characteristics but
come with a smaller encoding.
llvm-svn: 186174
Normal (sext (setcc ...)) sequences are optimised into
(select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND.
However, this is deliberately not done for vectors, and after
vector type legalization we have (sext_inreg (setcc ...)) instead.
I wondered about trying to extend DAGCombiner to handle this case too,
but it seemed to be a loss on some other targets I tried, even those for
which SETCC isn't "legal" and SELECT_CC is.
llvm-svn: 186149
If the source of these instructions is spilled we should load the destination.
If the destination is spilled we should store the source.
llvm-svn: 186147
Summary:
This patch adds explicit calling convention types for the Win64 and
System V/x86-64 ABIs. This allows code to override the default, and use
the Win64 convention on a target that wants to use SysV (and
vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU
attributes.
Reviewers:
CC:
llvm-svn: 186144
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for
v8i16 (which occurs in the test case) or v16i8. The same was true for
V_SETALLONES (so I added the associated patterns for those as well).
Another bug found by llvm-stress.
llvm-svn: 186108
This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI
with an out-of-range operand. Most uses of the isRunOfOnes function are guarded
by a condition that the value is not zero. This was not true in two places, and
in both places a zero input would result in an out-of-rage MB value (= 32).
To fix this, isRunOfOnes returns false on a zero input (and I've remove one
now-redundant guard).
llvm-svn: 186101
RISBG can handle some ANDs for which no AND IMMEDIATE exists.
It also acts as a three-operand AND for some cases where an
AND IMMEDIATE could be used instead.
It might be worth adding a pass to replace RISBG with AND IMMEDIATE
in cases where the register operands end up being the same and where
AND IMMEDIATE is smaller.
llvm-svn: 186072
When computing currently-live registers, the register scavenger excludes undef
uses. As a result, undef uses are ignored when computing the restore points of
registers spilled into the emergency slots. While the register scavenger
normally excludes from consideration, when scavenging, registers used by the
current instruction, we need to not exclude undef uses. Otherwise, we might end
up requiring more emergency spill slots than we have (in the case where the
undef use *is* the currently-spilled register).
Another bug found by llvm-stress.
llvm-svn: 186067
I had thought that these tests could be target-neutral, but in practice this is
not the case (on some targets, like Hexagon and Darwin), they trigger an assert
(a different assert than the one that r186044 fixes).
llvm-svn: 186051
Propagate the fix from r185712 to Thumb2 codegen as well. Original
commit message applies here as well:
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and
packs them in the bottom half of "x". An arithmetic and logic shift are
only equivalent in this context if the shift amount is 16. We would be
shifting in ones into the bottom 16bits instead of zeros if "y" is
negative.
rdar://14338767
llvm-svn: 185982
Change the informal convention of DBG_VALUE machine instructions so that
we can express a register-indirect address with an offset of 0.
The old convention was that a DBG_VALUE is a register-indirect value if
the offset (operand 1) is nonzero. The new convention is that a DBG_VALUE
is register-indirect if the first operand is a register and the second
operand is an immediate. For plain register values the combination reg,
reg is used. MachineInstrBuilder::BuildMI knows how to build the new
DBG_VALUES.
rdar://problem/13658587
llvm-svn: 185966
Because integer BUILD_VECTOR operands may have a larger type than the result's
vector element type, and all operands must have the same type, when widening a
BUILD_VECTOR node by adding UNDEFs, we cannot use the vector element type, but
rather must use the type of the existing operands.
Another bug found by llvm-stress.
llvm-svn: 185960
A more complete example of the bug in PR16556 was recently provided,
showing that the previous fix was not sufficient. The previous fix is
reverted herein.
The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as
custom lowering for FP_TO_SINT during type legalization, without
checking whether the input type is handled by that routine.
LowerFP_TO_INT requires the input to be f32 or f64, so we fail when
the input is ppcf128.
I'm leaving the test case from the initial fix (r185821) in place, and
adding the new test as another crash-only check.
llvm-svn: 185959
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:
1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.
2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.
3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.
The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
llvm-svn: 185956
ScalarEvolution::getSignedRange uses ComputeNumSignBits from ValueTracking on
ashr instructions. ComputeNumSignBits can return zero, but this case was not
handled correctly by the code in getSignedRange which was calling:
APInt::getSignedMinValue(BitWidth).ashr(NS - 1)
with NS = 0, resulting in an assertion failure in APInt::ashr.
Now, we just return the conservative result (as with NS == 1).
Another bug found by llvm-stress.
llvm-svn: 185955
When folding sub x, x (and other similar constructs), where x is a vector, the
result is a vector of zeros. After type legalization, make sure that the input
zero elements have a legal type. This type may be larger than the result's
vector element type.
This was another bug found by llvm-stress.
llvm-svn: 185949
In the commit message to r185476 I wrote:
>The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
>correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
>This causes some confusion with the asm parser, since VK_PPC_TLSGD
>is output as @tlsgd, which is then read back in as VK_TLSGD.
>
>To avoid this confusion, this patch removes the PowerPC-specific
>modifiers and uses the generic modifiers throughout. (The only
>drawback is that the generic modifiers are printed in upper case
>while the usual convention on PowerPC is to use lower-case modifiers.
>But this is just a cosmetic issue.)
This was unfortunately incorrect, there is is fact another,
serious drawback to using the default VK_TLSLD/VK_TLSGD
variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT
to return true, which in turn causes the ELFObjectWriter to emit
an undefined reference to _GLOBAL_OFFSET_TABLE_.
This is a problem on powerpc64, because it uses the TOC instead
of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_,
so the symbol remains undefined. This means shared libraries
using TLS built with the integrated assembler are currently
broken.
While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation
probably ought to be properly fixed at some point, for now I'm
simply reverting the r185476 commit. Now this in turn exposes
the breakage of handling @tlsgd/@tlsld in the asm parser that
this check-in was originally intended to fix.
To avoid this regression, I'm also adding a different fix for
this problem: while common code now parses @tlsgd as VK_TLSGD,
a special hack in the asm parser translates this code to the
platform-specific VK_PPC_TLSGD that the back-end now expects.
While this is not really pretty, it's self-contained and
shouldn't hurt anything else for now. One the underlying
problem is fixed, this hack can be reverted again.
llvm-svn: 185945
Test is not included as it is several 1000 lines long.
To test this functionnality, a test case must generate at least 2 ALU clauses,
where an ALU clause is ~110 instructions long.
NOTE: This is a candidate for the stable branch.
llvm-svn: 185943
This patch broke `make check-asan` on Mac, causing ld warnings like the following one:
ld: warning: direct access in __GLOBAL__I_a to global weak symbol
___asan_mapping_scale means the weak symbol cannot be overridden at
runtime. This was likely caused by different translation units being
compiled with different visibility settings.
The resulting test binaries crashed with incorrect ASan warnings.
llvm-svn: 185923
Look for patterns of the form (store (load ...), ...) in which the two
locations are known not to partially overlap. (Identical locations are OK.)
These sequences are better implemented by MVC unless either the load or
the store could use RELATIVE LONG instructions.
The testcase showed that we weren't using LHRL and LGHRL for extload16,
only sextloadi16. The patch fixes that too.
llvm-svn: 185919
Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet
small enough for a single MVC. As with memcpy, I'm leaving longer cases
till later.
The number of tests might seem excessive, but f33 & f34 from memset-04.ll
failed the first cut because I'd not added the "?:" on the calculation
of Size1.
llvm-svn: 185918
This fixes another bug found by llvm-stress!
If we happen to be doing an i64 load or store into a stack slot that has less
than a 4-byte alignment, then the frame-index elimination may need to use an
indexed load or store instruction (because the offset may not be a multiple of
4, a requirement of the STD/LD instructions). The extra register needed to hold
the offset comes from the register scavenger, and it is possible that the
scavenger will need to use an emergency spill slot. As a result, we need to
make sure that a spill slot is allocated when doing an i64 load/store into a
less-than-4-byte-aligned stack slot.
Because test cases for things like this tend to be fairly fragile, I've
concatenated a few small bugpoint-reduced test cases together to form the
regression test.
llvm-svn: 185907
The Mach-O linker has been able to support the weak-def bit on any symbol for
quite a while now. The compiler however continued to place these symbols into a
"coal" section, which required the linker to map them back to the base section
name.
Replace the sections like this:
__TEXT/__textcoal_nt instead use __TEXT/__text
__TEXT/__const_coal instead use __TEXT/__const
__DATA/__datacoal_nt instead use __DATA/__data
<rdar://problem/14265330>
llvm-svn: 185872
A setting in MCAsmInfo defines the "assembler dialect" to use. This is used
by common code to choose between alternatives in a multi-alternative GNU
inline asm statement like the following:
__asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));
The meaning of these dialects is platform specific, and GCC defines those
for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for
new-style (PowerPC) mnemonics, like in the example above.
To be compatible with inline asm used with GCC, LLVM ought to do the same.
Specifically, this means we should always use assembler dialect 1 since
old-style mnemonics really aren't supported on any current platform.
However, the current LLVM back-end uses:
AssemblerDialect = 1; // New-Style mnemonics.
in PPCMCAsmInfoDarwin, and
AssemblerDialect = 0; // Old-Style mnemonics.
in PPCLinuxMCAsmInfo.
The Linux setting really isn't correct, we should be using new-style
mnemonics everywhere. This is changed by this commit.
Unfortunately, the setting of this variable is overloaded in the back-end
to decide whether or not we are on a Darwin target. This is done in
PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo
AssemblerDialect setting), and also in PPCMCExpr. Setting AssemblerDialect
to 1 for both Darwin and Linux no longer allows us to make this distinction.
Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter
to distinguish Darwin targets, and ignores the SyntaxVariant parameter.
As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs
to be passed in by the caller when creating a target MCExpr. (To do so
this patch implicitly also reverts commit 184441.)
llvm-svn: 185858
Another bug found by llvm-stress! This fixes hitting
llvm_unreachable("Invalid integer vector compare condition");
at the end of getVCmpInst in PPCISelDAGToDAG.
llvm-svn: 185855
PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be
either an f32 or f64, but this is not checked. A long double
(ppcf128) operand will normally be custom-lowered to a conversion to
f64 in this context. However, this isn't the case for an UNDEF node.
This patch recognizes a ppcf128 as a legal source operand for
FP_TO_INT only if it's an undef, in which case it creates an undef of
the target type.
At some point we might want to do a wholesale custom lowering of
ISD::UNDEF when the type is ppcf128, but it's not really clear that's
a great idea, and probably more work than it's worth for a situation
that only arises in the case of a programming error. At this point I
think simple is best.
The test case comes from PR16556, and is a crash-test only.
llvm-svn: 185821
This fixes a bug (found by llvm-stress) in
DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result
type would always be larger than the original operands. This is not always
true, however, with boolean vectors. For example, promoting a node of type v8i1
(where the operands will be of type i32, the type to which i1 is promoted) will
yield a node with a result vector element type of i16 (and operands of type
i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands
to the result type.
llvm-svn: 185794
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and a very rare case. I left a TODO for
that case.
Fixes PR16551.
llvm-svn: 185755
This prevents the emission of DAG-generated vreg definitions after a
tail call be dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).
llvm-svn: 185754
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
in the bottom half of "x". An arithmetic and logic shift are only equivalent in
this context if the shift amount is 16. We would be shifting in ones into the
bottom 16bits instead of zeros if "y" is negative.
radar://14338767
llvm-svn: 185712
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring. Extend it to cope with single instructions
that copy from one frame index to another.
The testcase happens to show an example of this kicking in at the moment.
It did occur in Real Code too though.
llvm-svn: 185705
The stack coloring pass renumbered frame indexes with a loop of the form:
for each frame index FI
for each instruction I that uses FI
for each use of FI in I
rename FI to FI'
This caused problems if an instruction used two frame indexes F0 and F1
and if F0 was renamed to F1 and F1 to F2. The first time we visited the
instruction we changed F0 to F1, then we changed both F1s to F2.
In other words, the problem was that SSRefs recorded which instructions
used an FI, but not which MachineOperands and MachineMemOperands within
that instruction used it.
This is easily fixed for MachineOperands by walking the instructions
once and processing each operand in turn. There's already a loop to
do that for dead store elimination, so it seemed more efficient to
fuse the two at the block level.
MachineMemOperands are more tricky because they can be shared between
instructions. The patch handles them by making SSRefs an array of
MachineMemOperands rather than an array of MachineInstrs. We might end
up processing the same MachineMemOperand twice, but that's OK because
we always know from the SSRefs index what the original frame index was.
llvm-svn: 185703
...now that the problem that prompted the restriction has been fixed.
The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.
llvm-svn: 185701
This is another prerequisite for frame-to-frame MVC copies.
I'll commit the patch that makes use of the slot separately.
The downside of trying to test many corner cases with each of the
available addressing modes is that a fair few tests need to account
for the new frame layout. I do still think it's useful to have all
these tests though, since it's something that wouldn't get much coverage
otherwise.
llvm-svn: 185698
In the SelectionDAG immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.
In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.
llvm-svn: 185688
In the ARM back-end, build_vector nodes are lowered to a target specific
build_vector that uses floating point type.
This works well, unless the inserted bitcasts survive until instruction
selection. In that case, they incur moves between integer unit and floating
point unit that may result in inefficient code.
In other words, this conversion may introduce artificial dependencies when the
code leading to the build vector cannot be completed with a floating point type.
In particular, this happens when loads are not aligned.
Before this patch, in that case, the compiler generates general purpose loads
and creates the floating point vector from them, instead of directly using the
vector unit.
The patch uses a vector friendly sequence of code when the inserted bitcasts to
floating point survived DAGCombine.
This is done by a target specific DAGCombine that changes the target specific
build_vector into a sequence of insert_vector_elt that get rid of the bitcasts.
<rdar://problem/14170854>
llvm-svn: 185587
Just as with mfocrf, it is also preferable to use mtocrf instead of
mtcrf when only a single CR register is to be written.
Current code however always emits mtcrf. This probably does not matter
when using an external assembler, since the GNU assembler will in fact
automatically replace mtcrf with mtocrf when possible. It does create
inefficient code with the integrated assembler, however.
To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
uses those instead of MTCRF/MTCRF8 everything. Just as done in the
MFOCRF patch committed as 185556, these patterns will be converted
back to MTCRF if MTOCRF is not available on the machine.
As a side effect, this allows to modify the MTCRF pattern to accept
the full range of mask operands for the benefit of the asm parser.
llvm-svn: 185561
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>. Use it to cut down on the number of spill loads.
Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.
This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch. Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).
llvm-svn: 185529
Swift cores implement store barriers that are stronger than the ARM
specification but weaker than general barriers. They are, in fact, just about
enough to provide the ordering needed for atomic operations with release
semantics.
This patch makes use of that quirk.
llvm-svn: 185527
The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
This causes some confusion with the asm parser, since VK_PPC_TLSGD
is output as @tlsgd, which is then read back in as VK_TLSGD.
To avoid this confusion, this patch removes the PowerPC-specific
modifiers and uses the generic modifiers throughout. (The only
drawback is that the generic modifiers are printed in upper case
while the usual convention on PowerPC is to use lower-case modifiers.
But this is just a cosmetic issue.)
llvm-svn: 185476
Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
and (sdiv i64, i32).
The "32" in "SDIVREM32" just refers to the second operand. The first operand
of all *DIVREM*s is a GR128.
llvm-svn: 185435
Try to use MVC when spilling the destination of a simple load or the source
of a simple store. As explained in the comment, this doesn't yet handle
the case where the load or store location is also a frame index, since
that could lead to two simultaneous scavenger spills, something the
backend can't handle yet. spill-02.py tests that this restriction kicks in,
but unfortunately I've not yet found a case that would fail without it.
The volatile trick I used for other scavenger tests doesn't work here
because we can't use MVC for volatile accesses anyway.
I'm planning on relaxing the restriction later, hopefully with a test
that does trigger the problem...
Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
classified as SimpleBDX{Load,Store}. It wouldn't be easy to test for
that bug separately, which is why I didn't split out the fix as a
separate patch.
llvm-svn: 185434
r182680 replaced CountLeadingZeros_32 with a template function
countLeadingZeros that relies on using the correct argument type to give
the right result. The type passed in the XCore backend after this
revision was incorrect in a couple of places.
Patch by Robert Lytton.
llvm-svn: 185430
DAGCombiner was counting all uses of a load node when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.
rdar://problem/13896307
llvm-svn: 185419
There are a couple of (small) related changes here:
1. The printed name of the VRSAVE register has been changed from VRsave to
vrsave in order to match the name accepted by GNU binutils.
2. Support for parsing vrsave has been added to the asm parser (it seems that
there was no test case specifically covering this code, so I've added one).
3. The list of Altivec registers, which was common to all calling conventions,
has been separated out. This allows us to define the base CSR lists, and then
lists for each ABI with Altivec included. This allows SjLj, for example, to
work correctly on non-Altivec targets without using unnatural definitions of
the NoRegs CSR list.
4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
registers are reserved when Altivec is disabled.
With these changes, it is now possible to compile a function containing
__builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
work previously because GNU binutils assumes that all .cfi_offset offsets will
be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
offset). This is not true for the vrsave register, however, because this
register is used only on Darwin, GCC does not bother printing a .cfi_offset
entry for it (even though there is a slot in the stack frame for it as
specified by the ABI). This change allows us to do the same: we will also not
print .cfi_offset directives for vrsave.
llvm-svn: 185409
When phis get lowered, destination copies are inserted using an iterator that is
determined once for all phis in the block, which BuildMI interprets as a request
to insert an instruction directly before the iterator. In the case of a cyclic
phi, source copies may also be inserted directly before this iterator, which can
cause source copies to be inserted before destination copies. The fix is to keep
an iterator to the last phi and then advance it while lowering each phi in order
to insert destination copies directly after the phis.
llvm-svn: 185363
Although you can't generate this from C on PPC64, if you have a loop using a
64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had
been cauing the PPCCTRLoops pass to assert.
Thanks to Joerg Sonnenberger for providing a test case!
llvm-svn: 185361
According to the AArch64 ELF specification (4.6.8), it's the
assembler's responsibility to make sure the shift amount is correct in
relocated MOVZ/MOVK instructions.
This wasn't being obeyed by either the MCJIT CodeGen or RuntimeDyldELF
(which happened to work out well for JIT tests). This commit should
make us compliant in this area.
llvm-svn: 185360
Turns out I'd misread the architecture reference manual and thought
that was a load/store-store barrier, when it's not.
Thanks for pointing it out Eli!
llvm-svn: 185356
I believe the full "dmb ish" barrier is not required to guarantee release
semantics for atomic operations. The weaker "dmb ishst" prevents previous
operations being reordered with a store executed afterwards, which is enough.
A key point to note (fortunately already correct) is that this barrier alone is
*insufficient* for sequential consistency, no matter how liberally placed.
llvm-svn: 185339
Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value.
Also change some SIGN_EXTEND to ANY_EXTEND because we really dont care and may have more opportunity to fold subexpressions
llvm-svn: 185331
This fixes PR16418, which reports that a function calling
__builtin_unwind_init() asserts. The cause is that this generates a
spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is
only really used on Darwin).
The test case checks only that we don't crash. We can add correctness checks
once someone verifies what behavior the function is supposed to have.
llvm-svn: 185235
On OpenBSD, the stack-smash protection transform uses "__guard_local"
and "__stack_smash_handler" instead of "__stack_chk_guard" and
"__stack_chk_fail". However, CodeGen/PowerPC/stack-protector.ll
doesn't specify a target OS, so on OpenBSD it fails.
Add -mtriple=ppc32-unknown-linux to make the test host-OS agnostic. While
there, convert to FileCheck.
Patch by Matthew Dempsky.
llvm-svn: 185206
Under certain (evidently rare) circumstances, this code used to convert OR(a,
AND(x, y)) into OR(a, x). This was incorrect.
While there, I've added a comment to the code immediately above.
llvm-svn: 185201
should expand ATOMIC_CMP_SWAP nodes the same way that it does for ATOMIC_SWAP.
Since ATOMIC_LOADs on some targets (e.g. older ARM variants) get legalized to
ATOMIC_CMP_SWAPs, the missing case had been causing i64 atomic loads to crash
during isel.
<rdar://problem/14074644>
llvm-svn: 185186
Fix ABI handling for function
returning bool -- use st.param.b32 to return the value
and use ld.param.b32 in caller to load the return value.
llvm-svn: 185177
This patch assigns paired GPRs for inline asm with
64-bit data on ARM. It's enabled for both ARM and Thumb to support modifiers
like %H, %Q, %R.
llvm-svn: 185169
We were generating intrinsics for NEON fixed-point conversions that didn't
exist (e.g. float -> i16). There are two cases to consider:
+ iN is smaller than float. In this case we can do the conversion but need an
extend or truncate as well.
+ iN is larger than float. In this case using the NEON conversion would be
incorrect so we don't perform any combining.
llvm-svn: 185158
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update testing cases to make them conform to the format of DI classes.
llvm-svn: 185135
The purpose of this test was to check boundary conditions for the size
of an ALU clause. This test is very sensitive to changes to the
optimizer or scheduler, because it requires an exact number of ALU
instructions in order to remain valid. It's not good to have a test
this sensitive, because it is confusing to developers who implement
optimizations and then 'break' the test.
I'm not sure if there is a good way to test these limits using lit, but
if I can come up with replacement test that isn't as sensitive I'll add
it back to the tree.
llvm-svn: 185084
Add pseudo conditional store instructions, so that we use:
branch foo:
store
foo:
instead of:
load
branch foo:
move
foo:
store
z196 has real 32-bit and 64-bit conditional stores, but we don't use
any z196 instructions yet.
llvm-svn: 185065
Note: Only adding test for evergreen, not SI yet.
When I attempted to expand vselect for SI, I got the following:
llc: /home/awatry/src/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:522:
llvm::SDValue llvm::DAGTypeLegalizer::PromoteIntRes_SETCC(llvm::SDNode*):
Assertion `SVT.isVector() == N->getOperand(0).getValueType().isVector() &&
"Vector compare must return a vector result!"' failed.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184847
No test/expansion for SI has been added yet. Attempts to expand this
operation for SI resulted in a stacktrace in (IIRC) LegalizeIntegerTypes
which was complaining about vector comparisons being required to return
a vector type.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184845
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UREM produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184844
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184843
In reality, some unaligned memory accesses are legal for 32-bit types and
smaller too, but it all depends on the address space. Allowing
unaligned loads/stores for > 32-bit types is mainly to prevent the
legalizer from splitting one load into multiple loads of smaller types.
https://bugs.freedesktop.org/show_bug.cgi?id=65873
llvm-svn: 184822
This should only make a difference in programs that use a lot of the
vector ALU instructions like BFI_INT and BIT_ALIGN. There is a slight
improvement in the phatk bitcoin mining kernel with this patch on
Evergreen (vector size == 1):
Before:
1173 Instruction Groups / 9520 dwords
After:
1167 Instruction Groups / 9510 dwords
Reviewed-by: Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184819
This makes it possible to write unit tests that are less susceptible
to minor code motion, particularly copy placement. block-placement.ll
covers this case with -pre-RA-sched=source which will soon be
default. One incorrectly named block is already fixed, but without
this fix, enabling new coalescing and scheduling would cause more
failures.
llvm-svn: 184680
This is an awful implementation of the target hook. But we don't have
abstractions yet for common machine ops, and I don't see any quick way
to make it table-driven.
llvm-svn: 184664
A FastISel optimization was causing us to emit no information for such
parameters & when they go missing we end up emitting a different
function type. By avoiding that shortcut we not only get types correct
(very important) but also location information (handy) - even if it's
only live at the start of a function & may be clobbered later.
Reviewed/discussion by Evan Cheng & Dan Gohman.
llvm-svn: 184604
it at the moment.
This allows to form more paired loads even when stack coloring pass destroys the
memoryoperand's value.
<rdar://problem/13978317>
llvm-svn: 184492
Also add a v2i32 test to the existing v4i32 test.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184482
Also add SI tests to existing file and a v2i32 test for both
R600 and SI.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184481
The custom lowering causes llc to crash with a segfault.
Ideally, the custom lowering can be fixed, but this allows
programs which load/store v2i32 to work without crashing.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184480
Fix up three tests - one that was relying on abbreviation number,
another relying on a location list in this case (& testing raw asm,
changed that to use dwarfdump on the debug_info now that that's where
the location is), and another which was added in r184368 - exposing a
bug in that fix that is exposed when we emit the location inline rather
than through a location list. Fix that bug while I'm here.
llvm-svn: 184387
value is zero.
This allows optmizations to kick in more easily.
Fix some test cases so that they remain meaningful (i.e., not completely dead
coded) when optimizations apply.
<rdar://problem/14096009> superfluous multiply by high part of zero-extended
value.
llvm-svn: 184222
The main advantages here are way better heuristics, taking into account not
just loop depth but also __builtin_expect and other static heuristics and will
eventually learn how to use profile info. Most of the work in this patch is
pushing the MachineBlockFrequencyInfo analysis into the right places.
This is good for a 5% speedup on zlib's deflate (x86_64), there were some very
unfortunate spilling decisions in its hottest loop in longest_match(). Other
benchmarks I tried were mostly neutral.
This changes register allocation in subtle ways, update the tests for it.
2012-02-20-MachineCPBug.ll was deleted as it's very fragile and the instruction
it looked for was gone already (but the FileCheck pattern picked up unrelated
stuff).
llvm-svn: 184105
Rather than using the full power of target-specific addressing modes in
DBG_VALUEs with Frame Indicies, simply use Frame Index + Offset. This
reduces the complexity of debug info handling down to two
representations of values (reg+offset and frame index+offset) rather
than three or four.
Ideally we could ensure that frame indicies had been eliminated by the
time we reached an assembly or dwarf generation, but I haven't spent the
time to figure out where the FIs are leaking through into that & whether
there's a good place to convert them. Some FI+offset=>reg+offset
conversion is done (see PrologEpilogInserter, for example) which is
necessary for some SelectionDAG assumptions about registers, I believe,
but it might be possible to make this a more thorough conversion &
ensure there are no remaining FIs no matter how instruction selection
is performed.
llvm-svn: 184066
Replace the ill-defined MinLatency and ILPWindow properties with
with straightforward buffer sizes:
MCSchedMode::MicroOpBufferSize
MCProcResourceDesc::BufferSize
These can be used to more precisely model instruction execution if desired.
Disabled some misched tests temporarily. They'll be reenabled in a few commits.
llvm-svn: 184032
Also add a seperate vector lit test file, since r600 doesn't seem to handle
v2i32 load/store yet, but we can test both for SI.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184021
We were using RAT_INST_STORE_RAW, which seemed to work, but the docs
say this instruction doesn't exist for Cayman, so it's probably safer
to use a documented instruction instead.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184015
When we're rematerializing into a not-quite-right register we already add the
real definition as an imp-def, but we should also be marking the "official"
register as dead, since nothing else is going to use it as a result of this
remat.
Not doing this can affect pressure tracking.
rdar://problem/14158833
llvm-svn: 184002
in functions which call __builtin_unwind_init()
__builtin_unwind_init() is an undocumented gcc intrinsic which has this effect,
and is used in libgcc_eh.
Goes part of the way toward fixing PR8541.
llvm-svn: 183984
This is a resubmit of r182877, which was reverted because it broken
MCJIT tests on ARM. The patch leaves MCJIT on ARM as it was before: only
enabled for iOS. I've CC'ed people from the original review and revert.
FastISel was only enabled for iOS ARM and Thumb2, this patch enables it
for ARM (not Thumb2) on Linux and NaCl, but not MCJIT.
Thumb2 support needs a bit more work, mainly around register class
restrictions.
The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.
The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch, it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.
The test changes are straightforward, similar to:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.
I ran all of lnt test-suite on A15 hardware with --optimize-option=-O0
and all the tests pass. All the tests also pass on x86 make check-all. I
also re-ran the check-all tests that failed on ARM, and they all seem to
pass.
llvm-svn: 183966
This is a preliminary patch for fast instruction selection on
PowerPC. Code generation can differ between DAG isel and fast isel.
Existing tests that specify -O0 were written to expect DAG isel. Make
this explicit by adding -fast-isel=false to the tests.
In some cases specifying -fast-isel=false produces different code even
when there isn't a fast instruction selector specified. This is
because TM.Options.EnableFastISel = 1 at -O0 whether or not a FastISel
object exists. Thus disabling fast isel can actually produce less
conservative code. Because of this, some of the expected code
generation in the -O0 tests needs to be adjusted.
In particular, handling of function arguments is less conservative
with -fast-isel=false (see isOnlyUsedInEntryBlock() in
SelectionDAGBuilder.cpp). This results in fewer stack accesses and,
in some cases, reduced stack size as uselessly loaded values are no
longer stored back to spill locations in the stack.
No functional change with this patch; test case adjustments only.
llvm-svn: 183939
The pass emits a call to sqrt that has attribute "read-none". This call will be
converted to an ISD::FSQRT node during DAG construction, which will turn into
a mips native sqrt instruction.
llvm-svn: 183802
Previously LEA64_32r went through virtually the entire backend thinking it was
using 32-bit registers until its blissful illusions were cruelly snatched away
by MCInstLower and 64-bit equivalents were substituted at the last minute.
This patch makes it behave normally, and take 64-bit registers as sources all
the way through. Previous uses (for 32-bit arithmetic) are accommodated via
SUBREG_TO_REG instructions which make the types and classes agree properly.
llvm-svn: 183693
the Mips16 port. A few of the psuedos could either take signed
or unsigned arguments and I did not distinguish the case and improperly
rejected some valid cases that the assembler had previously accepted
when they were pure pseudos that expanded as assembly instructions.
llvm-svn: 183633
Since we have ARM unwind directive parser and assembler, we
can check the correctness in two stages:
1. From LLVM assembly (.ll) to ARM assembly (.s)
2. From ARM assembly (.s) to ELF object file (.o)
We already have several "*.s to *.o" test cases. This CL adds
some "*.ll to *.s" test cases and removes the redundant "*.ll to *.o"
test cases.
New test cases to check "*.ll to *.s" code generator:
- ehabi.ll: Check the correctness of the generated unwind directives.
- section-name.ll: Check the section name of functions.
Removed test cases:
- ehabi-mc-cantunwind.ll
(Covered by ehabi-cantunwind.ll, and eh-directive-cantunwind.s)
- ehabi-mc-compact-pr0.ll
(Covered by ehabi.ll, eh-compact-pr0.s, eh-directive-save.s, and
eh-directive-setfp.s)
- ehabi-mc-compact-pr1.ll
(Covered by ehabi.ll, eh-compact-pr1.s, eh-directive-save.s, and
eh-directive-setfp.s)
- ehabi-mc.ll
(Covered by ehabi.ll, and eh-directive-integrated-test.s)
- ehabi-mc-section-group.ll
(Covered by section-name.ll, and eh-directive-section-comdat.s)
- ehabi-mc-section.ll
(Covered by section-name.ll, and eh-directive-section.s)
- ehabi-mc-sh_link.ll
(Covered by eh-directive-text-section.s, and eh-directive-section.s)
llvm-svn: 183628
Changes to ARM unwind opcode assembler:
* Fix multiple .save or .vsave directives. Besides, the
order is preserved now.
* For the directives which will generate multiple opcodes,
such as ".save {r0-r11}", the order of the unwind opcode
is fixed now, i.e. the registers with less encoding value
are popped first.
* Fix the $sp offset calculation. Now, we can use the
.setfp, .pad, .save, and .vsave directives at any order.
Changes to test cases:
* Add test cases to check the order of multiple opcodes
for the .save directive.
* Fix the incorrect $sp offset in the test case. The
stack pointer offset specified in the test case was
incorrect. (Changed test cases: ehabi-mc-section.ll and
ehabi-mc.ll)
* The opcode to restore $sp are slightly reordered. The
behavior are not changed, and the new output is same
as the output of GNU as. (Changed test cases:
eh-directive-pad.s and eh-directive-setfp.s)
llvm-svn: 183627
instantiation issue with non-standard type.
Add a backend option to warn on a given stack size limit.
Option: -mllvm -warn-stack-size=<limit>
Output (if limit is exceeded):
warning: Stack size limit exceeded (<actual size>) in <functionName>.
The longer term plan is to hook that to a clang warning.
PR:4072
<rdar://problem/13987214>.
llvm-svn: 183595
On PPC32, [su]div,rem on i64 types are transformed into runtime library
function calls. As a result, they are not allowed in counter-based loops (the
counter-loops verification pass caught this error; this change fixes PR16169).
llvm-svn: 183581
We weren't computing structure size correctly and we were relying on
the original alloca instruction to compute the offset, which isn't
always reliable.
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183568
Option: -mllvm -warn-stack-size=<limit>
Output (if limit is exceeded):
warning: Stack size limit exceeded (<actual size>) in <functionName>.
The longer term plan is to hook that to a clang warning.
PR:4072
<rdar://problem/13987214>
llvm-svn: 183552
My recent ARM FastISel patch exposed this bug:
http://llvm.org/bugs/show_bug.cgi?id=16178
The root cause is that it can't select integer sext/zext pre-ARMv6 and
asserts out.
The current integer sext/zext code doesn't handle other cases gracefully
either, so this patch makes it handle all sext and zext from i1/i8/i16
to i8/i16/i32, with and without ARMv6, both in Thumb and ARM mode. This
should fix the bug as well as make FastISel faster because it bails to
SelectionDAG less often. See fastisel-ext.patch for this.
fastisel-ext-tests.patch changes current tests to always use reg-imm AND
for 8-bit zext instead of UXTB. This simplifies code since it is
supported on ARMv4t and later, and at least on A15 both should perform
exactly the same (both have exec 1 uop 1, type I).
2013-05-31-char-shift-crash.ll is a bitcode version of the above bug
16178 repro.
fast-isel-ext.ll tests all sext/zext combinations that ARM FastISel
should now handle.
Note that my ARM FastISel enabling patch was reverted due to a separate
failure when dealing with MCJIT, I'll fix this second failure and then
turn FastISel on again for non-iOS ARM targets.
I've tested "make check-all" on my x86 box, and "lnt test-suite" on A15
hardware.
llvm-svn: 183551
Fix an assertion when the compiler encounters big constants whose bit width is
not a multiple of 64-bits.
Although clang would never generate something like this, the backend should be
able to handle any legal IR.
<rdar://problem/13363576>
llvm-svn: 183544
OpenBSD's stack smashing protection differs slightly from other
platforms:
1. The smash handler function is "__stack_smash_handler(const char
*funcname)" instead of "__stack_chk_fail(void)".
2. There's a hidden "long __guard_local" object that gets linked
into each executable and DSO.
Patch by Matthew Dempsky.
llvm-svn: 183533
Add earlyclobber constaints to prevent input register being allocated as
the output register because, according to Intel spec [1], "If any pair
of the index, mask, or destination registers are the same, this
instruction results a UD fault."
---
[1] http://software.intel.com/sites/default/files/319433-014.pdf
llvm-svn: 183327
The ARM backend did not expect LDRBi12 to hold a constant pool operand.
Allow for LLVM to deal with the instruction similar to how it deals with
LDRi12.
This fixes PR16215.
llvm-svn: 183238
The MOV64ri64i32 instruction required hacky MCInst lowering because it
was allocated as setting a GR64, but the eventual instruction ("movl")
only set a GR32. This converts it into a so-called "MOV32ri64" which
still accepts a (appropriate) 64-bit immediate but defines a GR32.
This is then converted to the full GR64 by a SUBREG_TO_REG operation,
thus keeping everyone happy.
This fixes a typo in the opcode field of the original patch, which
should make the legact JIT work again (& adds test for that problem).
llvm-svn: 183068
Namely, check if the target allows to fold more that one register in the
addressing mode and if yes, adjust the cost accordingly.
Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.
<rdar://problem/13973908>
llvm-svn: 183021
Unlike most -- hopefully "all other", but I'm still checking -- memory
instructions we support, LOAD REVERSED and STORE REVERSED may access
the memory location several times. This means that they are not suitable
for volatile loads and stores.
This patch is a prerequisite for better atomic load and store support.
The same principle applies there: almost all memory instructions we
support are inherently atomic ("block concurrent"), but LOAD REVERSED
and STORE REVERSED are exceptions.
Other instructions continue to allow volatile operands. I will add
positive "allows volatile" tests at the same time as the "allows atomic
load or store" tests.
llvm-svn: 183002
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.
The test for this commit is not breaking the existing tests.
llvm-svn: 182998
The MOV64ri64i32 instruction required hacky MCInst lowering because it was
allocated as setting a GR64, but the eventual instruction ("movl") only set a
GR32. This converts it into a so-called "MOV32ri64" which still accepts a
(appropriate) 64-bit immediate but defines a GR32. This is then converted to
the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.
llvm-svn: 182991
The pattern the test originally checked for doesn't occur on other -mcpu
settings. On atom it's still there though slightly differently scheduled.
llvm-svn: 182933
This test was failing on some hosts when an unexpected register was used for a
variable. This just extends the regexp to allow the new x86-64 registers.
llvm-svn: 182929
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
smaller and partial register updates can sometimes be avoided.
Until recently, this sequence was a barrier to rematerialization though. That
should now be fixed so it's an appropriate time to make the change.
llvm-svn: 182928
The code to distinguish between unaligned and aligned addresses was
already there, so this is mostly just a switch-on-and-test process.
llvm-svn: 182920
For COFF and MachO, sections semantically have relocations that apply to them.
That is not the case on ELF.
In relocatable objects (.o), a section with relocations in ELF has offsets to
another section where the relocations should be applied.
In dynamic objects and executables, relocations don't have an offset, they have
a virtual address. The section sh_info may or may not point to another section,
but that is not actually used for resolving the relocations.
This patch exposes that in the ObjectFile API. It has the following advantages:
* Most (all?) clients can handle this more efficiently. They will normally walk
all relocations, so doing an effort to iterate in a particular order doesn't
save time.
* llvm-readobj now prints relocations in the same way the native readelf does.
* probably most important, relocations that don't point to any section are now
visible. This is the case of relocations in the rela.dyn section. See the
updated relocation-executable.test for example.
llvm-svn: 182908
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.
Patch by Xiaoyi Guo!
This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.
llvm-svn: 182885
FastISel was only enabled for iOS ARM and Thumb2, this patch enables it
for ARM (not Thumb2) on Linux and NaCl.
Thumb2 support needs a bit more work, mainly around register class
restrictions.
The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.
The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch, it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.
The test changes are straightforward, similar to:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.
I ran all of test-suite on A15 hardware with --optimize-option=-O0 and
all the tests pass.
llvm-svn: 182877
This allows rematerialization during register coalescing to handle
more cases involving operations like SUBREG_TO_REG which might need to
be rematerialized using sub-register indices.
For example, code like:
v1(GPR64):sub_32 = MOVZ something
v2(GPR64) = COPY v1(GPR64)
should be convertable to:
v2(GPR64):sub_32 = MOVZ something
but previously we just gave up in places like this
llvm-svn: 182872
This patch adds support for the CRJ and CGRJ instructions. Support for
the immediate forms will be a separate patch.
The architecture has a large number of comparison instructions. I think
it's generally better to concentrate on using the "best" comparison
instruction first and foremost, then only use something like CRJ if
CR really was the natual choice of comparison instruction. The patch
therefore opportunistically converts separate CR and BRC instructions
into a single CRJ while emitting instructions in ISelLowering.
llvm-svn: 182764
When -ffast-math is in effect (on Linux, at least), clang defines
__FINITE_MATH_ONLY__ > 0 when including <math.h>. This causes the
preprocessor to include <bits/math-finite.h>, which renames the sqrt functions.
For instance, "sqrt" is renamed as "__sqrt_finite".
This patch adds the 3 new names in such a way that they will be treated
as equivalent to their respective original names.
llvm-svn: 182739
When expanding unaligned Altivec loads, we use the decremented offset trick to
prevent page faults. Unfortunately, if we have a sequence of consecutive
unaligned loads, this leads to suboptimal code generation because the 'extra'
load from the first unaligned load can be combined with the base load from the
second (but only if the decremented offset trick is not used for the first).
Search up and down the chain, through loads and token factors, looking for
consecutive loads, and if one is found, don't use the offset reduction trick.
These duplicate loads are later combined to yield the desired sequence (in the
future, we might want a more-powerful chain search, but that will require some
changes to allow the combiner routines to access the AA object).
This should complete the initial implementation of the optimized unaligned
Altivec load expansion. There is some refactoring that should be done, but
that will happen when the unaligned store expansion is added.
llvm-svn: 182719
The lvsl permutation control instruction is a function only of the alignment of
the pointer operand (relative to the 16-byte natural alignment of Altivec
vectors). As a result, multiple lvsl intrinsics where the operands differ by a
multiple of 16 can be combined.
llvm-svn: 182708
Altivec only directly supports aligned loads, but the loads have a strange
property: If given an unaligned address, they truncate the address to the next
lower aligned address, and load from there. This property, along with an extra
load and some special-purpose permutation-control instructions that generate
the appropriate permutations from the original unaligned address, allow
efficient lowering of aligned loads. This code uses the trick explained in the
Apple Velocity Engine optimization overview document to prevent the needed
extra load from possibly causing a page fault if the original address happens
to be aligned.
As noted in the FIXMEs, there are several additional optimizations that can be
performed to reduce the cost of these loads even more. These will be
implemented in future commits.
llvm-svn: 182691
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.
Added analysis and code generation tests. Added documentation for the
new attribute.
llvm-svn: 182638
This implements the @llvm.readcyclecounter intrinsic as the specific
MRC instruction specified in the ARM manuals for CPUs with the Power
Management extensions.
Older CPUs had slightly different methods which may also have to be
implemented eventually, but this should cover all v7 cases.
rdar://problem/13939186
llvm-svn: 182603
Now that the LiveDebugVariables pass is running *after* register
coalescing, the ConnectedVNInfoEqClasses class needs to deal with
DBG_VALUE instructions.
This only comes up when rematerialization during coalescing causes the
remaining live range of a virtual register to separate into two
connected components.
llvm-svn: 182592
pic calls. These need to be there so we don't try and use helper
functions when we call those.
As part of this, make sure that we properly exclude helper functions in pic
mode when indirect calls are involved.
llvm-svn: 182343
By default, a teq instruction is inserted after integer divide. No divide-by-zero
checks are performed if option "-mnocheck-zero-division" is used.
llvm-svn: 182306
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output. This was
a useful first step, but it had two problems:
(1) The z assembler isn't traditionally supposed to perform branch shortening
or branch relaxation. We followed this rule by not relaxing branches
in assembler input, but that meant that generating assembly code and
then assembling it would not produce the same result as going directly
to object code; the former would give long branches everywhere, whereas
the latter would use short branches where possible.
(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
We would need to do something else before supporting them.
(Although COMPARE AND BRANCH does not change the condition codes,
the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
during codegen, so that we can safely lower it to a separate compare
and long branch where necessary. This is not a valid transformation
for the assembler proper to make.)
This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.
The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added. The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.
The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.
llvm-svn: 182274
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.
The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space. Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.
llvm-svn: 182254
Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:
Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ...
FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.
We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.
This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.
llvm-svn: 182237
We don't need to reject all inline asm as using the counter register (most does
not). Only those that explicitly clobber the counter register need to prevent
the transformation.
llvm-svn: 182191
The peephole tries to reorder MOV32r0 instructions such that they are
before the instruction that modifies EFLAGS.
The problem is that the peephole does not consider the case where the
instruction that modifies EFLAGS also depends on the previous state of
EFLAGS.
Instead, walk backwards until we find an instruction that has a def for
EFLAGS but does not have a use.
If we find such an instruction, insert the MOV32r0 before it.
If it cannot find such an instruction, skip the optimization.
llvm-svn: 182184
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.
The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).
The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.
I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.
llvm-svn: 182175
Dot4 now uses 8 scalar operands instead of 2 vectors one which allows register
coalescer to remove some unneeded COPY.
This patch also defines some structures/functions that can be used to handle
every vector instructions (CUBE, Cayman special instructions...) in a similar
fashion.
llvm-svn: 182126
Almost all instructions that takes a 128 bits reg as input (fetch, export...)
have the abilities to swizzle their argument and output. Instead of printing
default swizzle for each 128 bits reg, rename T*.XYZW to T* and let instructions
print potentially optimized swizzles themselves.
llvm-svn: 182124
Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.
We still prefer palignr over psrldq because it has higher throughput on
sandybridge.
llvm-svn: 182102
Previously, three instructions were needed:
trunc.w.s $f0, $f2
mfc1 $4, $f0
sw $4, 0($2)
Now we need only two:
trunc.w.s $f0, $f2
swc1 $f0, 0($2)
llvm-svn: 182053
Some IR-level instructions (such as FP <-> i64 conversions) are not chained
w.r.t. the mtctr intrinsic and yet may become function calls that clobber the
counter register. At the selection-DAG level, these might be reordered with the
mtctr intrinsic causing miscompiles. To avoid this situation, if an existing
preheader has instructions that might use the counter register, create a new
preheader for the mtctr intrinsic. This extra block will be remerged with the
old preheader at the MI level, but will prevent unwanted reordering at the
selection-DAG level.
llvm-svn: 182045
This is the second part of the change to always return "true"
offset values from getPreIndexedAddressParts, tackling the
case of "memrix" type operands.
This is about instructions like LD/STD that only have a 14-bit
field to encode immediate offsets, which are implicitly extended
by two zero bits by the machine, so that in effect we can access
16-bit offsets as long as they are a multiple of 4.
The PowerPC back end currently handles such instructions by
carrying the 14-bit value (as it will get encoded into the
actual machine instructions) in the machine operand fields
for such instructions. This means that those values are
in fact not the true offset, but rather the offset divided
by 4 (and then truncated to an unsigned 14-bit value).
Like in the case fixed in r182012, this makes common code
operations on such offset values not work as expected.
Furthermore, there doesn't really appear to be any strong
reason why we should encode machine operands this way.
This patch therefore changes the encoding of "memrix" type
machine operands to simply contain the "true" offset value
as a signed immediate value, while enforcing the rules that
it must fit in a 16-bit signed value and must also be a
multiple of 4.
This change must be made simultaneously in all places that
access machine operands of this type. However, just about
all those changes make the code simpler; in many cases we
can now just share the same code for memri and memrix
operands.
llvm-svn: 182032