Commit Graph

56768 Commits

Author SHA1 Message Date
Adhemerval Zanella ef206f19a4 PowerPC: add EmitTCEntry class for TOC creation
This patch replaces the EmitRawText by a EmitTCEntry class (specialized for
each Streamer) in PowerPC64 TOC entry creation.

llvm-svn: 165940
2012-10-15 15:43:14 +00:00
Kostya Serebryany b0e2506d97 [asan] make AddressSanitizer to be a FunctionPass instead of ModulePass. This will simplify chaining other FunctionPasses with asan. Also some minor cleanup
llvm-svn: 165936
2012-10-15 14:20:06 +00:00
Chandler Carruth 49c8eea3c0 Update the memcpy rewriting to fully support widened int rewriting. This
includes extracting ints for copying elsewhere and inserting ints when
copying into the alloca. This should fix the CanSROA assertion coming
out of Clang's regression test suite.

llvm-svn: 165931
2012-10-15 10:24:43 +00:00
Chandler Carruth 9d966a2002 Follow-up fix to r165928: handle memset rewriting for widened integers,
and generally clean up the memset handling. It had rotted a bit as the
other rewriting logic got polished more.

llvm-svn: 165930
2012-10-15 10:24:40 +00:00
Silviu Baranga b14097000b Fixed PR13938: the ARM backend was crashing because it couldn't select a VDUPLANE node with the vector input size different from the output size. This was bacause the BUILD_VECTOR lowering code didn't check that the size of the input vector was correct for using VDUPLANE.
llvm-svn: 165929
2012-10-15 09:41:32 +00:00
Chandler Carruth 435c4e0792 First major step toward addressing PR14059. This teaches SROA to handle
cases where we have partial integer loads and stores to an otherwise
promotable alloca to widen[1] those loads and stores to cover the entire
alloca and bitcast them into the appropriate type such that promotion
can proceed.

These partial loads and stores stem from an annoying confluence of ARM's
calling convention and ABI lowering and the FCA pre-splitting which
takes place in SROA. Clang lowers a { double, double } in-register
function argument as a [4 x i32] function argument to ensure it is
placed into integer 32-bit registers (a really unnerving implicit
contract between Clang and the ARM backend I would add). This results in
a FCA load of [4 x i32]* from the { double, double } alloca, and SROA
decomposes this into a sequence of i32 loads and stores. Inlining
proceeds, code gets folded, but at the end of the day, we still have i32
stores to the low and high halves of a double alloca. Widening these to
be i64 operations, and bitcasting them to double prior to loading or
storing allows promotion to proceed for these allocas.

I looked quite a bit changing the IR which Clang produces for this case
to be more friendly, but small changes seem unlikely to help. I think
the best representation we could use currently would be to pass 4 i32
arguments thereby avoiding any FCAs, but that would still require this
fix. It seems like it might eventually be nice to somehow encode the ABI
register selection choices outside of the parameter type system so that
the parameter can be a { double, double }, but the CC register
annotations indicate that this should be passed via 4 integer registers.

This patch does not address the second problem in PR14059, which is the
reverse: when a struct alloca is loaded as a *larger* single integer.

This patch also does not address some of the code quality issues with
the FCA-splitting. Those don't actually impede any optimizations really,
but they're on my list to clean up.

[1]: Pedantic footnote: for those concerned about memory model issues
here, this is safe. For the alloca to be promotable, it cannot escape or
have any use of its address that could allow these loads or stores to be
racing. Thus, widening is always safe.

llvm-svn: 165928
2012-10-15 08:40:30 +00:00
Chandler Carruth aa6afbb831 Hoist the canConvertValue predicate and the convertValue transform out
into static helper functions. They're really quite generic and are going
to be needed elsewhere shortly.

llvm-svn: 165927
2012-10-15 08:40:22 +00:00
Bill Wendling fbd38fe2e3 Add an enum for the return and function indexes into the AttrListPtr object. This gets rid of some magic numbers.
llvm-svn: 165924
2012-10-15 07:29:08 +00:00
Bill Wendling 79d45dbbf9 Use a ::get method to create the attribute from Attributes::AttrVals instead of a constructor.
llvm-svn: 165923
2012-10-15 06:53:28 +00:00
Bill Wendling 8c3e65db52 Move the AttributesImpl header file into the VMCore directory so that it can be opaque.
llvm-svn: 165920
2012-10-15 05:40:12 +00:00
Bill Wendling d079a446d7 Attributes Rewrite
Convert the internal representation of the Attributes class into a pointer to an
opaque object that's uniqued by and stored in the LLVMContext object. The
Attributes class then becomes a thin wrapper around this opaque
object. Eventually, the internal representation will be expanded to include
attributes that represent code generation options, etc.

llvm-svn: 165917
2012-10-15 04:46:55 +00:00
Meador Inge 40b6fac36c instcombine: Migrate strcmp and strncmp optimizations
This patch migrates the strcmp and strncmp optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.

llvm-svn: 165915
2012-10-15 03:47:37 +00:00
Benjamin Kramer c5b0678cf8 Simplify code. No functionality change.
llvm-svn: 165904
2012-10-14 11:15:42 +00:00
Benjamin Kramer 650b1dbd56 Unquadratize SetVector removal loops in DSE.
Erasing from the beginning or middle of the vector is expensive, remove_if can
do it in linear time even though it's a bit ugly without lambdas.

No functionality change.

llvm-svn: 165903
2012-10-14 10:21:31 +00:00
Bill Wendling 5ae76a379c Remove dead methods.
llvm-svn: 165902
2012-10-14 09:21:44 +00:00
Bill Wendling 76d2cd2f60 Remove operator cast method in favor of querying with the correct method.
llvm-svn: 165899
2012-10-14 08:54:26 +00:00
Benjamin Kramer 6bbdf70818 Fix use after free when deleting attributes in a chained folding set.
Can't follow the intrusive linked list when the element is gone.

llvm-svn: 165898
2012-10-14 08:48:40 +00:00
Bill Wendling a0f3e8d1cb Don't use the new syntax just yet.
llvm-svn: 165897
2012-10-14 08:25:35 +00:00
Bill Wendling 2a3c1cca7d Remove the bitwise AND operators from the Attributes class. Replace it with the equivalent from the builder class.
llvm-svn: 165896
2012-10-14 07:52:48 +00:00
Bill Wendling 722b26c0f2 Remove the bitwise assignment OR operator from the Attributes class. Replace it with the equivalent from the builder class.
llvm-svn: 165895
2012-10-14 07:35:59 +00:00
Bill Wendling 5c407ed3ab Remove the bitwise OR operator from the Attributes class. Replace it with the equivalent from the builder class.
llvm-svn: 165894
2012-10-14 07:17:34 +00:00
Bill Wendling a05b043c4a Remove the bitwise XOR operator from the Attributes class. Replace it with the equivalent from the builder class.
llvm-svn: 165893
2012-10-14 06:56:13 +00:00
Bill Wendling 85a64c217f Remove the bitwise NOT operator from the Attributes class. Replace it with the equivalent from the builder class.
llvm-svn: 165892
2012-10-14 06:39:53 +00:00
Bill Wendling 1fcc82225a Decode the LLVM attributes from bitcode using the attributes builder.
llvm-svn: 165891
2012-10-14 04:10:01 +00:00
Bill Wendling abd5ba2523 Use builder to create alignment attributes. Remove dead function.
llvm-svn: 165890
2012-10-14 03:58:29 +00:00
Bill Wendling 9e1eb4d1b9 Don't pass in an Attributes object to something that expects an integral value.
llvm-svn: 165887
2012-10-14 03:27:15 +00:00
Benjamin Kramer 44e58f9eb1 Remove unused private field.
llvm-svn: 165881
2012-10-13 18:03:34 +00:00
Benjamin Kramer 35480284e7 X86: Disable long nops for all cpus prior to pentiumpro/i686.
llvm-svn: 165878
2012-10-13 17:28:35 +00:00
Jakob Stoklund Olesen ea82bd7f0d Drop <def,dead> flags when merging into an unused lane.
The new coalescer can merge a dead def into an unused lane of an
otherwise live vector register.

Clear the <dead> flag when that happens since the flag refers to the
full virtual register which is still live after the partial dead def.

This fixes PR14079.

llvm-svn: 165877
2012-10-13 17:26:47 +00:00
Meador Inge 174185084c instcombine: Migrate strchr and strrchr optimizations
This patch migrates the strchr and strrchr optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.

llvm-svn: 165875
2012-10-13 16:45:37 +00:00
Meador Inge 7fb2f7378b instcombine: Migrate strcat and strncat optimizations
This patch migrates the strcat and strncat optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.

llvm-svn: 165874
2012-10-13 16:45:32 +00:00
Meador Inge df796f893f Implement new LibCallSimplifier class
This patch implements the new LibCallSimplifier class as outlined in [1].
In addition to providing the new base library simplification infrastructure,
all the fortified library call simplifications were moved over to the new
infrastructure.  The rest of the library simplification optimizations will
be moved over with follow up patches.

NOTE: The original fortified library call simplifier located in the
SimplifyFortifiedLibCalls class was not removed because it is still
used by CodeGenPrepare.  This class will eventually go away too.

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052283.html

llvm-svn: 165873
2012-10-13 16:45:24 +00:00
Jakob Stoklund Olesen 2f6dfc7d0b Allow for loops in LiveIntervals::pruneValue().
It is possible that the live range of the value being pruned loops back
into the kill MBB where the search started. When that happens, make sure
that the beginning of KillMBB is also pruned.

Instead of starting a DFS at KillMBB and skipping the root of the
search, start a DFS at each KillMBB successor, and allow the search to
loop back to KillMBB.

This fixes PR14078.

llvm-svn: 165872
2012-10-13 16:15:31 +00:00
Benjamin Kramer ecd15d7f6c X86: Fix accidentally swapped operands.
llvm-svn: 165871
2012-10-13 12:50:19 +00:00
Chandler Carruth ba9319925e Teach SROA to cope with wrapper aggregates. These show up a lot in ABI
type coercion code, especially when targetting ARM. Things like [1
x i32] instead of i32 are very common there.

The goal of this logic is to ensure that when we are picking an alloca
type, we look through such wrapper aggregates and across any zero-length
aggregate elements to find the simplest type possible to form a type
partition.

This logic should (generally speaking) rarely fire. It only ends up
kicking in when an alloca is accessed using two different types (for
instance, i32 and float), and the underlying alloca type has wrapper
aggregates around it. I noticed a significant amount of this occurring
looking at stepanov_abstraction generated code for arm, and suspect it
happens elsewhere as well.

Note that this doesn't yet address truly heinous IR productions such as
PR14059 is concerning. Those result in mismatched *sizes* of types in
addition to mismatched access and alloca types.

llvm-svn: 165870
2012-10-13 10:49:33 +00:00
Chandler Carruth 482c61787c Speculatively harden the conversion logic. I have no idea if this will
help the dragonegg builders, and no test case at this point, but this
was one dimly plausible case I spotted by inspection. Hopefully will get
a testcase from those bots soon-ish, and will tidy this up with proper
testing.

llvm-svn: 165869
2012-10-13 10:49:30 +00:00
Benjamin Kramer d6b9362fc2 X86: Promote i8 cmov when both operands are coming from truncates of the same width.
X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this
level is often not a good idea because it's too late for many optimizations to
kick in. This solution doesn't add any extensions (truncs are free) and tries
to avoid introducing partial register stalls by filtering direct copyfromregs.

I'm seeing a ~10% speedup on reading a random .png file with libpng15 via
graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.

llvm-svn: 165868
2012-10-13 10:39:49 +00:00
Chandler Carruth 0fb8a7787e Silence a warning in -assert builds.
llvm-svn: 165867
2012-10-13 05:09:27 +00:00
Chandler Carruth 891fec0b56 Clean up how we rewrite loads and stores to the whole alloca. When these
are single value types, the load and store should be directly based upon
the alloca and then bitcasting can fix the type as needed afterward.
This might in theory improve some of the IR coming out of SROA, but
I don't expect big changes yet and don't have any test cases on hand.
This is really just a cleanup/refactoring patch. The next patch will
cause this code path to be hit a lot more, actually get SROA to promote
more allocas and include several more test cases.

llvm-svn: 165864
2012-10-13 02:41:05 +00:00
Chad Rosier 4996355592 [ms-inline asm] Remove the MatchInstruction() function. Previously, this was
the interface between the front-end and the MC layer when parsing inline
assembly.  Unfortunately, this is too deep into the parsing stack. Specifically,
we're unable to handle target-independent assembly (i.e., assembly directives,
labels, etc.).  Note the MatchAndEmitInstruction() isn't the correct
abstraction either.  I'll be exposing target-independent hooks shortly, so this
is really just a cleanup.

llvm-svn: 165858
2012-10-13 00:26:04 +00:00
Andrew Kaylor 4732872bd2 Check section type rather than assuming it's code when emitting sections while processing relocations.
llvm-svn: 165854
2012-10-12 23:53:16 +00:00
Manman Ren 7e48b252e7 ARM: tail-call inside a function where part of a byval argument is on caller's
local frame causes problem.

For example:
void f(StructToPass s) {
  g(&s, sizeof(s));
}
will cause problem with tail-call since part of s is passed via registers and
saved in f's local frame. When g tries to access s, part of s may be corrupted
since f's local frame is popped out before the tail-call.

The current fix is to disable tail-call if getVarArgsRegSaveSize is not 0 for
the caller. This is a conservative approach, if we can prove the address of
s or part of s is not taken and passed to g, it should be okay to perform
tail-call.

rdar://12442472

llvm-svn: 165853
2012-10-12 23:39:43 +00:00
Chad Rosier 4453e8453e [ms-inline asm] Capitalize per coding standard.
llvm-svn: 165847
2012-10-12 23:09:25 +00:00
Jim Grosbach 30af442a84 ARM: Mark VSELECT as 'expand'.
The backend already pattern matches to form VBSL when it can. We may want to
teach it to use the vbsl intrinsics at some point to prevent machine licm from
mucking with this, but using the Expand is completely correct.

http://llvm.org/bugs/show_bug.cgi?id=13831
http://llvm.org/bugs/show_bug.cgi?id=13961

Patch by Peter Couperus <peter.couperus@st.com>.

llvm-svn: 165845
2012-10-12 22:59:21 +00:00
Chad Rosier 2f480a8a50 [ms-inline asm] Use the new API introduced in r165830 in lieu of the
MapAndConstraints vector.  Also remove the unused Kind argument.

llvm-svn: 165833
2012-10-12 22:53:36 +00:00
Jakob Stoklund Olesen 1a87a29d08 Use a transposed algorithm for handleMove().
Completely update one interval at a time instead of collecting live
range fragments to be updated. This avoids building data structures,
except for a single SmallPtrSet of updated intervals.

Also share code between handleMove() and handleMoveIntoBundle().

Add support for moving dead defs across other live values in the
interval. The MI scheduler can do that.

llvm-svn: 165824
2012-10-12 21:31:57 +00:00
Jakob Stoklund Olesen 1a3eb878f6 Fix coalescing with IMPLICIT_DEF values.
PHIElimination inserts IMPLICIT_DEF instructions to guarantee that all
PHI predecessors have a live-out value. These IMPLICIT_DEF values are
not considered to be real interference when coalescing virtual
registers:

  %vreg1 = IMPLICIT_DEF
  %vreg2 = MOV32r0

When joining %vreg1 and %vreg2, the IMPLICIT_DEF instruction and its
value number should simply be erased since the %vreg2 value number now
provides a live-out value for the PHI predecesor block.

llvm-svn: 165813
2012-10-12 18:03:04 +00:00
Ulrich Weigand 9aa51d1a2c Fix big-endian codegen bug in DAGTypeLegalizer::ExpandRes_BITCAST
On PowerPC, a bitcast of <16 x i8> to i128 may run through a code
path in ExpandRes_BITCAST that attempts to do an intermediate
bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts
of the resulting i128 by pairing up two of those i32 vector elements
each.  The code already recognizes that on a big-endian system, the
first two vector elements form the Hi part, and the final two vector
elements form the Lo part (vice-versa from the little-endian situation).

However, we also need to take endianness into account when forming each
of those separate pairs:  on a big-endian system, vector element 0 is
the *high* part of the pair making up the Hi part of the result, and
vector element 1 is the low part of the pair.  The code currently always
uses vector element 0 as the low part and vector element 1 as the high
part, as is appropriate for little-endian platforms only.

This patch fixes this by swapping the vector elements as they are
paired up as appropriate.

llvm-svn: 165802
2012-10-12 15:42:58 +00:00
Duncan Sands d5772de0eb Add powerpc-ibm-aix to Triple. Patch by Kai.
llvm-svn: 165792
2012-10-12 11:08:57 +00:00
Eric Christopher ca2ff70eb8 Indenting.
llvm-svn: 165785
2012-10-12 02:04:47 +00:00