Commit Graph

10863 Commits

Author SHA1 Message Date
Arnold Schwaighofer 77af0f6e82 ARM cost model: Account for zero cost scalar SROA instructions
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.

radar://15336950

llvm-svn: 193573
2013-10-29 01:33:53 +00:00
Arnold Schwaighofer 86252451c4 SLPVectorizer: Use vector type for vectorized memory operations
No test case, because with the current cost model we don't see a difference.
An upcoming ARM memory cost model change will expose and test this bug.

radar://15332579

llvm-svn: 193572
2013-10-29 01:33:50 +00:00
Shuxin Yang 2e1890e18b Revert r193251 : Use address-taken to disambiguate global variable and indirect memops.
llvm-svn: 193489
2013-10-27 03:08:44 +00:00
Wan Xiaofei be640b28c0 Quick look-up for block in loop.
This patch implements quick look-up for block in loop by maintaining a hash set for blocks.
It improves the efficiency of loop analysis a lot, the biggest improvement could be 5-6%(458.sjeng).
Below are the compilation time for our benchmark in llc before & after the patch.

Benchmark	llc - trunk		llc - patched	
401.bzip2	0.339081	100.00%	0.329657	102.86%
403.gcc		19.853966	100.00%	19.605466	101.27%
429.mcf		0.049823	100.00%	0.048451	102.83%
433.milc	0.514898	100.00%	0.510217	100.92%
444.namd	1.109328	100.00%	1.103481	100.53%
445.gobmk	4.988028	100.00%	4.929114	101.20%
456.hmmer	0.843871	100.00%	0.825865	102.18%
458.sjeng	0.754238	100.00%	0.714095	105.62%
464.h264ref	2.9668		100.00%	2.90612		102.09%
471.omnetpp	4.556533	100.00%	4.511886	100.99%
bitmnp01	0.038168	100.00%	0.0357		106.91%
idctrn01	0.037745	100.00%	0.037332	101.11%
libquake2	3.78689		100.00%	3.76209		100.66%
libquake_	2.251525	100.00%	2.234104	100.78%
linpack		0.033159	100.00%	0.032788	101.13%
matrix01	0.045319	100.00%	0.043497	104.19%
nbench		0.333161	100.00%	0.329799	101.02%
tblook01	0.017863	100.00%	0.017666	101.12%
ttsprk01	0.054337	100.00%	0.053057	102.41%

Reviewer	: Andrew Trick <atrick@apple.com>, Hal Finkel <hfinkel@anl.gov>
Approver	: Andrew Trick <atrick@apple.com>
Test		: Pass make check-all & llvm test-suite

llvm-svn: 193460
2013-10-26 03:08:02 +00:00
Andrew Trick 57243da70f Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop.
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)

When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would hve to
guarantee that the chained AddRec's are in a perfectly reduced form.

llvm-svn: 193438
2013-10-25 21:35:56 +00:00
Rafael Espindola 7749d7ccc7 Handle calls and invokes in GlobalStatus.
This patch teaches GlobalStatus to analyze a call that uses the global value as
a callee, not as an argument.

With this change internalize call handle the common use of linkonce_odr
functions. This reduces the number of linkonce_odr functions in a LTO build of
clang (checked with the emit-llvm gold plugin option) from 1730 to 60.

llvm-svn: 193436
2013-10-25 21:29:52 +00:00
Hal Finkel 02f562df43 LoopVectorizer: Don't attempt to vectorize extractelement instructions
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value. As a result, vectorization would proceed,
producing illegal instructions like this:

  %58 = extractelement <2 x i32> %15, i32 0
  %59 = extractelement i32 %58, i32 0

where the second extractelement is illegal because its first operand is not a vector.

llvm-svn: 193434
2013-10-25 20:40:15 +00:00
Tom Stellard bc7d87f07c Inliner: Handle readonly attribute per argument when adding memcpy
Patch by: Vincent Lejeune

llvm-svn: 193356
2013-10-24 16:38:33 +00:00
Renato Golin 1ba143e140 Mark vector loops as already vectorized
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.

llvm-svn: 193349
2013-10-24 14:50:51 +00:00
Nuno Lopes 340b0463e6 fix PR17635: false positive with packed structures
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object

llvm-svn: 193317
2013-10-24 09:17:24 +00:00
Juergen Ributzka d04d096ecf Fix a bug in LinearFunctionTestReplace that created invalid loop exit checks.
Reviewed by Andy

llvm-svn: 193303
2013-10-24 05:29:56 +00:00
Andrew Trick ada2356ac9 Clarify comments in genLoopLimit.
llvm-svn: 193292
2013-10-24 00:43:38 +00:00
Yuchen Wu 3197b25b27 Fixed comment typo in GCOVProfiling.cpp
llvm-svn: 193268
2013-10-23 20:35:00 +00:00
Shuxin Yang e4fb375995 Use address-taken to disambiguate global variable and indirect memops.
Major steps include:
 1). introduces a not-addr-taken bit-field in GlobalVariable
 2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable 
    dosen't have its address taken.
 3). AA use this info for disambiguation. 

llvm-svn: 193251
2013-10-23 17:28:19 +00:00
Eric Christopher 874fa0f6c7 Fix spelling, grammar, and match naming convention for test files.
llvm-svn: 193130
2013-10-21 23:14:06 +00:00
Tom Stellard e1631ddf93 SimplifyCFG: Don't duplicate calls to functions marked noduplicate v2
v2:
  - Use CI->cannotDuplicate()

llvm-svn: 193115
2013-10-21 20:07:30 +00:00
Matt Arsenault 404c60a7c3 Use more type helper functions
llvm-svn: 193109
2013-10-21 19:43:56 +00:00
Matt Arsenault fa64659bd8 Teach SimplifyCFG about address spaces
llvm-svn: 193104
2013-10-21 18:55:08 +00:00
Rafael Espindola 3d7fc25c7c Optimize more linkonce_odr values during LTO.
When a linkonce_odr value that is on the dso list is not unnamed_addr
we can still look to see if anything is actually using its address. If
not, it is safe to hide it.

This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.

llvm-svn: 193090
2013-10-21 17:14:55 +00:00
Michael Gottesman 63c63ac21e Fix the predecessor removal logic in r193045.
Additionally some small comment/stylistic fixes are included as well.

llvm-svn: 193068
2013-10-21 05:20:11 +00:00
Bill Wendling 90dd90afcb Don't eliminate a partially redundant load if it's in a landing pad.
A landing pad can be jumped to only by the unwind edge of an invoke
instruction. If we eliminate a partially redundant load in a landing pad, it
will create a basic block that violates this constraint. It then leads to other
problems down the line if it tries to merge that basic block with the landing
pad. Avoid this by not eliminating the load in a landing pad.

PR17621

llvm-svn: 193064
2013-10-21 04:09:17 +00:00
Michael Gottesman c024f3258a Teach simplify-cfg how to correctly create covered lookup tables for switches on iN with N >= 3.
One optimization simplify-cfg performs is the converting of switches to
lookup tables if the switch has > 4 cases. This is done by:

1. Finding the max/min case value and calculating the switch case range.
2. Create a lookup table basic block.
3. Perform a check in the switch's BB to see if the input value is in
the switch's case range. If the input value satisfies said predicate
branch to the lookup table BB, otherwise branch to the switch's default
destination BB using the default value as the result.

The conditional check consists of subtracting the min case value of the
table from any input iN value and then ensuring that said value is
unsigned less than the size of the lookup table represented as an iN
value.

If the lookup table is a covered lookup table, the size of the table will be N
which is 0 as an iN value. Thus the comparison will be an `icmp ult` of an iN
value against 0 which is always false yielding the incorrect result.

This patch fixes this problem by recognizing if we have a covered lookup table
and if we do, unconditionally jumps to the lookup table BB since the covering
property of the lookup table implies no input values could not be handled by
said BB.

rdar://15268442

llvm-svn: 193045
2013-10-20 07:04:37 +00:00
Bill Wendling 4fea22c63b Perform an intelligent splice of the predecessor with the single successor.
If the predecessor's being spliced into a landing pad, then we need the PHIs to
come first and the rest of the predecessor's code to come *after* the landing
pad instruction.

llvm-svn: 193035
2013-10-19 11:27:12 +00:00
Nadav Rotem 7f27e0b0ce Mark some command line flags as hidden
llvm-svn: 193013
2013-10-18 23:38:13 +00:00
Rafael Espindola 045a78fa7e Rename fields of GlobalStatus to match the coding style.
llvm-svn: 192910
2013-10-17 18:18:52 +00:00
Rafael Espindola 27797baee7 rename SafeToDestroyConstant to isSafeToDestroyConstant and clang-format.
llvm-svn: 192907
2013-10-17 18:06:32 +00:00
Rafael Espindola 026c9cbefe Simplify the interface of AnalyzeGlobal a bit and rename to analyzeGlobal.
No functionality change.

llvm-svn: 192906
2013-10-17 18:00:25 +00:00
Evgeniy Stepanov 21a9c93a4d [msan] Use zero-extension in shadow cast by default.
Switch to sign-extension in r192575 caused 7% perf loss on 482.sphinx3.

llvm-svn: 192882
2013-10-17 10:53:50 +00:00
Dmitry Vyukov b1ad5720a2 tsan: implement no_sanitize_thread attribute
If a function has no_sanitize_thread attribute,
do not instrument memory accesses in it.

llvm-svn: 192871
2013-10-17 07:20:06 +00:00
Arnold Schwaighofer a66582470b SLPVectorizer: Don't vectorize volatile memory operations
radar://15231682

Reapply r192799,
  http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang/builds/8226
showed that the bot is still broken even with this out.

llvm-svn: 192820
2013-10-16 17:52:40 +00:00
Arnold Schwaighofer 06a0324f6a Revert "SLPVectorizer: Don't vectorize volatile memory operations"
This speculatively reverts commit 192799. It might have broken a linux buildbot.

llvm-svn: 192816
2013-10-16 17:19:40 +00:00
Arnold Schwaighofer 5078ea2bd9 SLPVectorizer: Don't vectorize volatile memory operations
radar://15231682

llvm-svn: 192799
2013-10-16 16:09:00 +00:00
Kostya Serebryany d3d23bec66 [asan] Optimize accesses to global arrays with constant index
Summary:
Given a global array G[N], which is declared in this CU and has static initializer
avoid instrumenting accesses like G[i], where 'i' is a constant and 0<=i<N.
Also add a bit of stats.

This eliminates ~1% of instrumentations on SPEC2006
and also partially helps when asan is being run together with coverage.

Reviewers: samsonov

Reviewed By: samsonov

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1947

llvm-svn: 192794
2013-10-16 14:06:14 +00:00
Benjamin Kramer c97850be76 LoopVectorize: Properly reflect PODness in comments.
llvm-svn: 192717
2013-10-15 16:19:54 +00:00
Craig Topper ef9e993eaa Remove x86_sse42_crc32_64_8 intrinsic. It has no functional difference from x86_sse42_crc32_32_8 and was not mapped to a clang builtin. I'm not even sure why this form of the instruction is even called out explicitly in the docs. Also add AutoUpgrade support to convert it into the other intrinsic with appropriate trunc and zext.
llvm-svn: 192672
2013-10-15 05:20:47 +00:00
Rafael Espindola 8c1d78ad51 Remove lib/Transforms/Instrumentation/ProfilingUtils.*
They were leftover from the old profiling support.

Patch by Alastair Murray.

llvm-svn: 192605
2013-10-14 16:46:46 +00:00
Chris Lattner 94fc4bed1f Basic blocks typically have few predecessors. Use a SmallDenseMap to
avoid a heap allocation when this is the case.

llvm-svn: 192602
2013-10-14 16:05:55 +00:00
Evgeniy Stepanov be83d8f693 [msan] Instrument x86.*_cvt* intrinsics.
Currently MSan checks that arguments of *cvt* intrinsics are fully initialized.
That's too much to ask: some of them only operate on lower half, or even
quarter, of the input register.

llvm-svn: 192599
2013-10-14 15:16:25 +00:00
Evgeniy Stepanov 9b5517b127 [msan] Fix handling of scalar select of vectors.
llvm-svn: 192575
2013-10-14 09:52:09 +00:00
Arnold Schwaighofer 58864d2d5f SLPVectorizer: Sort PHINodes based on their opcode
Before this patch we relied on the order of phi nodes when we looked for phi
nodes of the same type. This could prevent vectorization of cases where there
was a phi node of a second type in between phi nodes of some type.

This is important for vectorization of an internal graphics kernel. On the test
suite + external on x86_64 (and on a run on armv7s) it showed no impact on
either performance or compile time.

radar://15024459

llvm-svn: 192537
2013-10-12 18:56:27 +00:00
Tobias Grosser 5cff1e2d78 LoopVectorize: Add missing INITIALIZE_PASS_DEPENDENCY macros
Contributed-by:  Peter Zotov  <whitequark@whitequark.org>
llvm-svn: 192536
2013-10-12 18:29:15 +00:00
Renato Golin dd943a8919 Better info when debugging vectorizer
llvm-svn: 192460
2013-10-11 16:14:39 +00:00
Shuxin Yang 1cab418ce2 Fix a bug in Dead Argument Elimination.
If a function seen at compile time is not necessarily the one linked to
the binary being built, it is illegal to change the actual arguments
passing to it. 

  e.g. 
   --------------------------
   void foo(int lol) {
     // foo() has linkage satisifying isWeakForLinker()
     // "lol" is not used at all.
   }

   void bar(int lo2) {
      // xform to foo(undef) is illegal, as compiler dose not know which
      // instance of foo() will be linked to the the binary being built.
      foo(lol2); 
   }
  -----------------------------

  Such functions can be captured by isWeakForLinker(). NOTE that
mayBeOverridden() is insufficient for this purpose as it dosen't include
linkage types like AvailableExternallyLinkage and LinkOnceODRLinkage.
Take link_odr* as an example, it indicates a set of *EQUIVALENT* globals
that can be merged at link-time. However, the semantic of 
*EQUIVALENT*-functions includes parameters. Changing parameters breaks
the assumption.

  Thank John McCall for help, especially for the explanation of subtle
difference between linkage types.

  rdar://11546243

llvm-svn: 192302
2013-10-09 17:21:44 +00:00
Arnold Schwaighofer 0caddfc731 LoopVectorize: External uses must use the last value in a reduction cycle
Otherwise, we don't perform operations that would have been performed on
the scalar version.

Fixes PR17498.

llvm-svn: 192133
2013-10-07 21:05:43 +00:00
Alexey Samsonov a1944e6d26 Revert r191834 until we measure the effect of this benchmarks and maybe find a better way to fix it
llvm-svn: 192121
2013-10-07 19:03:24 +00:00
Hal Finkel f5a3eaea55 UpdatePHINodes in BasicBlockUtils should not crash on duplicate predecessors
UpdatePHINodes has an optimization to reuse an existing PHI node, where it
first deletes all of its entries and then replaces them. Unfortunately, in the
case where we had duplicate predecessors (which are allowed so long as the
associated PHI entries have the same value), the loop removing the existing PHI
entries from the to-be-reused PHI would assert (if that PHI was not the one
which had the duplicates).

llvm-svn: 192001
2013-10-04 23:41:05 +00:00
Arnold Schwaighofer 698d4ac8a8 SLPVectorizer: Sort inputs to commutative binary operations
Sort the operands of the other entries in the current vectorization root
according to the first entry's operands opcodes.

%conv0 = uitofp ...
%load0 = load float ...

= fmul %conv0, %load0
= fmul %load0, %conv1
= fmul %load0, %conv2

Make sure that we recursively vectorize <%conv0, %conv1, %conv2> and <%load0,
%load0, %load0>.

This makes it more likely to obtain vectorizable trees. We have to be careful
when we sort that we don't destroy 'good' existing ordering implied by source
order.

radar://15080067

llvm-svn: 191977
2013-10-04 20:39:16 +00:00
Owen Anderson 5797bfd4a3 Pull fptrunc's upwards through selects when one of the select's selectands was a constant. This has a number of benefits, including producing small immediates (easier to materialize, smaller constant pools) as well as being more likely to allow the fptrunc to fuse with a preceding instruction (truncating selects are unusual).
llvm-svn: 191929
2013-10-03 21:08:05 +00:00
Rafael Espindola cda2911caa Optimize linkonce_odr unnamed_addr functions during LTO.
Generalize the API so we can distinguish symbols that are needed just for a DSO
symbol table from those that are used from some native .o.

The symbols that are only wanted for the dso symbol table can be dropped if
llvm can prove every other dso has a copy (linkonce_odr) and the address is not
important (unnamed_addr).

llvm-svn: 191922
2013-10-03 18:29:09 +00:00
Matt Arsenault bfa37e546d Make gep i8* X, -(ptrtoint Y) transform work with address spaces
llvm-svn: 191920
2013-10-03 18:15:57 +00:00