This patch fixes the rldcl/rldicl/rldicr instruction emission. The issue is
the MDForm_1 instruction defines the PowerISA MB field from 'rldicl'
with the name MBE, but RLDCL/RLDICL/RLDICR definition uses as 'MB'.
It end up by generatint the 'rldicl' enconding at
'lib/Target/PowerPC/PPCGenMCCodeEmitter.inc' to use the fourth argument as the
third. The patch changes it by adjusting to use the fourth argument as
intended.
Fixes PR14180.
llvm-svn: 166770
instructions in a block. GetUnderlyingObject is more expensive than it looks as
it can, for instance, call SimplifyInstruction.
This might have some behavioural changes in odd corner cases, but only because
of some strange artefacts of the original implementation. If you were relying
on those, we can fix that by replacing this with a smarter algorithm. Change
passes the existing tests.
llvm-svn: 166754
As discussed on IRC, add VectorTargetTransform::getNumberOfParts
to provide a stable interface to the vector legalization splitting factor.
llvm-svn: 166751
This is needed so that perl's SHA can be compiled (otherwise
BBVectorize takes far too long to find its fixed point).
I'll try to come up with a reduced test case.
llvm-svn: 166738
This is the first of several steps to incorporate information from the new
TargetTransformInfo infrastructure into BBVectorize. Two things are done here:
1. Target information is used to determine if it is profitable to fuse two
instructions. This means that the cost of the vector operation must not
be more expensive than the cost of the two original operations. Pairs that
are not profitable are no longer considered (because current cost information
is incomplete, for intrinsics for example, equal-cost pairs are still
considered).
2. The 'cost savings' computed for the profitability check are also used to
rank the DAGs that represent the potential vectorization plans. Specifically,
for nodes of non-trivial depth, the cost savings is used as the node
weight.
The next step will be to incorporate the shuffle costs into the DAG weighting;
this will give the edges of the DAG weights as well. Once that is done, when
target information is available, we should be able to dispense with the
depth heuristic.
llvm-svn: 166716
Most places can use PrintFatalError as the unwinding mechanism was not
used for anything other than printing the error. The single exception
was CodeGenDAGPatterns.cpp, where intermediate errors during type
resolution were ignored to simplify incremental platform development.
This use is replaced by an error flag in TreePattern and bailout earlier
in various places if it is set.
llvm-svn: 166712
The isValueEqualityComparison() guard at the top of SimplifySwitch()
only applies to some of the possible transformations.
The newer transformations work just fine on large switches, and the
check on predecessor count is nonsensical.
llvm-svn: 166710
and also fixes the R_PPC64_TOC16 and R_PPC64_TOC16_DS relocation offset.
The 'nop' is needed so a restore TOC instruction (ld r2,40(r1)) can be placed
by the linker to correct restore the TOC of previous function.
Current code has two issues: it defines in PPCInstr64Bit.td file a LDinto_toc
and LDtoc_restore as a DSForm_1 with DS_RA=0 where it should be
DS=2 (the 8 bytes displacement of the TOC saving). It also wrongly emits a
MC intruction using an uint32_t value while the PPC::BL8_NOP_ELF
and PPC::BLA8_NOP_ELF are both uint64_t (because of the following 'nop').
This patch corrects the remaining ExecutionEngine using MCJIT:
ExecutionEngine/2002-12-16-ArgTest.ll
ExecutionEngine/2003-05-07-ArgumentTest.ll
ExecutionEngine/2005-12-02-TailCallBug.ll
ExecutionEngine/hello.ll
ExecutionEngine/hello2.ll
ExecutionEngine/test-call.ll
llvm-svn: 166682
structs having size 3, 5, 6, or 7. Such a struct must be passed and received
as right-justified within its register or memory slot. The problem is only
present for structs that are passed in registers.
Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register. This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot. Essentially I had accidentally
accounted for the right-adjustment twice.
In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.
The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this. I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.
I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions. I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.
llvm-svn: 166680
This patch adds initial PPC64 TOC MC object creation using the small mcmodel
(a single 64K TOC) adding the some TOC relocations (R_PPC64_TOC,
R_PPC64_TOC16, and R_PPC64_TOC16DS).
The addition of 'undefinedExplicitRelSym' hook on 'MCELFObjectTargetWriter'
is meant to avoid the creation of an unreferenced ".TOC." symbol (used in
the .odp creation) as well to set the R_PPC64_TOC relocation target as the
temporary ".TOC." symbol. On PPC64 ABI, the R_PPC64_TOC relocation should
not point to any symbol.
llvm-svn: 166677
smaller integer loads and stores.
The high-level motivation is that the frontend sometimes generates
a single whole-alloca integer load or store during ABI lowering of
splittable allocas. We need to be able to break this apart in order to
see the underlying elements and properly promote them to SSA values. The
hope is that this fixes some performance regressions on x86-32 with the
new SROA pass.
Unfortunately, this causes quite a bit of churn in the test cases, and
bloats some IR that comes out. When we see an alloca that consists soley
of bits and bytes being extracted and re-inserted, we now do some
splitting first, before building widened integer "bucket of bits"
representations. These are always well folded by instcombine however, so
this shouldn't actually result in missed opportunities.
If this splitting of all-integer allocas does cause problems (perhaps
due to smaller SSA values going into the RA), we could potentially go to
some extreme measures to only do this integer splitting trick when there
are non-integer component accesses of an alloca, but discovering this is
quite expensive: it adds yet another complete walk of the recursive use
tree of the alloca.
Either way, I will be watching build bots and LNT bots to see what
fallout there is here. If anyone gets x86-32 numbers before & after this
change, I would be very interested.
llvm-svn: 166662
[register].field
The operator returns the value at the location pointed to by register plus the
offset of field within its structure or union. This patch only handles
immediate fields (i.e., [eax].4). The original displacement has to be a
MCConstantExpr as well.
Part of rdar://12470415 and rdar://12470514
llvm-svn: 166632
into a sbc with a positive number, the immediate should be complemented, not
negated. Also added a missing pattern for ARM codegen.
rdar://12559385
llvm-svn: 166613
When the trip count is -1, getSmallConstantTripMultiple could return zero,
and this would cause runtime loop unrolling to assert. Instead of returning
zero, one is now returned (consistent with the existing overflow cases).
Fixes PR14167.
llvm-svn: 166612
see the offsetof operator. Previously, we were matching something like MOVrm
in the front-end and later matching MOVrr in the back-end. This change makes
things more consistent. It also fixes cases where we can't match against a
memory operand as the source (test cases coming).
Part of rdar://12470317
llvm-svn: 166592
- If more than 1 elemennts are defined and target supports the vectorized
conversion, use the vectorized one instead to reduce the strength on
conversion operation.
llvm-svn: 166546
- As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
v2f32 is added to improve the efficiency of the code generated.
llvm-svn: 166545
the difference from "int x" (which should go in registers and
"struct y {int x;}" (which should not).
Clang will be updated in the next patches.
llvm-svn: 166536
called. Provide an (asserting) definition of Operator's private destructor.
Remove destructors from all classes derived from Operator. We don't need them
for safety, because their implicit definitions would be ill-formed (they'd call
Operator's private destructor), and we don't need them to avoid emitting
vtables, because we don't do anything with Operator subclasses which would
trigger vtable instantiation.
The Operator hierarchy is still a complete disaster with regard to undefined
behavior, but this at least allows LLVM to link when using Clang's
-fcatch-undefined-behavior with a new vptr-based type checking mechanism.
llvm-svn: 166530
non-zero value as we don't know the actual value at this point. This is
necessary to get the matching correct in some cases. However, the actual value
set as the base register doesn't matter, since we're just matching not emitting.
llvm-svn: 166523
loads. It's not really profitable and may result in GVN going into an infinite
loop when it hits constructs like this:
%x = gep %some.type %x, ...
Found via an LTO build of LLVM.
llvm-svn: 166490
- Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce
DAG combine overhead.
- Extend the support to v4i16/v8i16 as well.
llvm-svn: 166487
for the PowerPC target, and factoring the results. This will ease future
maintenance of both subtargets.
PPCTargetLowering::LowerCall_Darwin_Or_64SVR4() has grown a lot of special-case
code for the different ABIs, making maintenance difficult. This is getting
worse as we repair errors in the 64-bit ELF ABI implementation, while avoiding
changes to the Darwin ABI logic. This patch splits the routine into
LowerCall_Darwin() and LowerCall_64SVR4(), allowing both versions to be
significantly simplified. I've factored out chunks of similar code where it
made sense to do so. I also performed similar factoring on
LowerFormalArguments_Darwin() and LowerFormalArguments_64SVR4().
There are no functional changes in this patch, and therefore no new test
cases have been developed.
Built and tested on powerpc64-unknown-linux-gnu with no new regressions.
llvm-svn: 166480
%V = mul i64 %N, 4
%t = getelementptr i8* bitcast (i32* %arr to i8*), i32 %V
into
%t1 = getelementptr i32* %arr, i32 %N
%t = bitcast i32* %t1 to i8*
incorporating the multiplication into the getelementptr.
This happens all the time in dragonegg, for example for
int foo(int *A, int N) {
return A[N];
}
because gcc turns this into byte pointer arithmetic before it hits the plugin:
D.1590_2 = (long unsigned int) N_1(D);
D.1591_3 = D.1590_2 * 4;
D.1592_5 = A_4(D) + D.1591_3;
D.1589_6 = *D.1592_5;
return D.1589_6;
The D.1592_5 line is a POINTER_PLUS_EXPR, which is turned into a getelementptr
on a bitcast of A_4 to i8*, so this becomes exactly the kind of IR that the
transform fires on.
An analogous transform (with no testcases!) already existed for bitcasts of
arrays, so I rewrote it to share code with this one.
llvm-svn: 166474
The CFG of the machine function needs to know that the targets of the indirect
branch are successors to the indirect branch.
<rdar://problem/12529625>
llvm-svn: 166448
Per the October 12, 2012 Proposal for annotated disassembly output sent out by
Jim Grosbach this set of changes implements this for X86 and arm. The llvm-mc
tool now has a -mdis option to produced the marked up disassembly and a couple
of small example test cases have been added.
rdar://11764962
llvm-svn: 166445
deterministic, replace it with a DenseMap<std::pair<unsigned, unsigned>,
PHINode*> (we already have a map from BasicBlock to unsigned).
<rdar://problem/12541389>
llvm-svn: 166435
Unreachable blocks can have invalid instructions. For example,
jump threading can produce self-referential instructions in
unreachable blocks. Also, we should not be spending time
optimizing unreachable code. Fixes PR14133.
llvm-svn: 166423
Patch by Quentin Colombet <qcolombet@apple.com>
Original description:
"""
The attached patch is the first step to have a better control on Oz related optimizations.
The Oz optimization level focuses on code size, thus I propose to add an attribute called ForceSizeOpt.
"""
llvm-svn: 166422
very small but very important bugfix:
bool shouldExplore(Use *U) {
Value *V = U->get();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
[...]
should have read:
bool shouldExplore(Use *U) {
Value *V = U->getUser();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
Fixes PR14143!
llvm-svn: 166407