Fixes rdar:14036816, PR16130.
There is an opportunity to compute precise trip counts for 'or'
expressions and multi-exit loops.
rdar:14038809: Optimize trip count computation for multi-exit loops.
To do this we need to record the fact that ExitLimit assumes NSW. When
it does not we can safely assume that the loop trip count is the
minimum ExitLimt across all subexpressions and loop exits.
llvm-svn: 183060
Account for the cost of scaling factor in Loop Strength Reduce when rating the
formulae. This uses a target hook.
The default implementation of the hook is: if the addressing mode is legal, the
scaling factor is free.
<rdar://problem/13806271>
llvm-svn: 183045
Fixes PR16130 - clang produces incorrect code with loop/expression at -O2.
This is a 2+ year old bug that's now holding up the release. It's a
case where we knowingly made aggressive assumptions about undefined
behavior. These assumptions are wrong when SCEV is computing a
subexpression that does not directly control the branch. With this
fix, we avoid making assumptions in those cases but still optimize the
common case. SCEV's trip count computation for exits controlled by
'or' expressions is now analagous to the trip count computation for
loops with multiple exits. I had already fixed the multiple exit case
to be conservative.
llvm-svn: 182989
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic
by making the root of additional loop metadata.
- Loop::isAnnotatedParallel now looks for llvm.loop and associated
llvm.mem.parallel_loop_access
- document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
- document llvm.vectorizer.* metadata
- add utility class LoopVectorizerHints for getting/setting loop metadata
- use llvm.vectorizer.width=1 to indicate already vectorized instead of
already_vectorized
- update existing tests that used llvm.loop.parallel and
llvm.vectorizer.already_vectorized
Reviewed by: Nadav Rotem
llvm-svn: 182802
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.
Added analysis and code generation tests. Added documentation for the
new attribute.
llvm-svn: 182638
BitVector/SmallBitVector::reference::operator bool remain implicit since
they model more exactly a bool, rather than something else that can be
boolean tested.
The most common (non-buggy) case are where such objects are used as
return expressions in bool-returning functions or as boolean function
arguments. In those cases I've used (& added if necessary) a named
function to provide the equivalent (or sometimes negative, depending on
convenient wording) test.
One behavior change (YAMLParser) was made, though no test case is
included as I'm not sure how to reach that code path. Essentially any
comparison of llvm::yaml::document_iterators would be invalid if neither
iterator was at the end.
This helped uncover a couple of bugs in Clang - test cases provided for
those in a separate commit along with similar changes to `operator bool`
instances in Clang.
llvm-svn: 181868
the things, and renames it to CBindingWrapping.h. I also moved
CBindingWrapping.h into Support/.
This new file just contains the macros for defining different wrap/unwrap
methods.
The calls to those macros, as well as any custom wrap/unwrap definitions
(like for array of Values for example), are put into corresponding C++
headers.
Doing this required some #include surgery, since some .cpp files relied
on the fact that including Wrap.h implicitly caused the inclusion of a
bunch of other things.
This also now means that the C++ headers will include their corresponding
C API headers; for example Value.h must include llvm-c/Core.h. I think
this is harmless, since the C API headers contain just external function
declarations and some C types, so I don't believe there should be any
nasty dependency issues here.
llvm-svn: 180881
We switch the order of offset and field type to make TBAAStructType node
(name, parent node, offset) similar to scalar TBAA node (name, parent node).
TypeIsImmutable is added to TBAAStructTag node.
llvm-svn: 180654
The tag is of type TBAANode when flag EnableStructPathTBAA is off.
Move implementation of MDNode::getMostGenericTBAA to TypeBasedAliasAnalysis.cpp
since it depends on how to interprete the MDNodes for scalar TBAA and
struct-path aware TBAA.
llvm-svn: 180068
PR15000 has a testcase where the time to compile was bordering on 30s. When I
dropped the limit value to 100, it became a much more managable 6s. The compile
time seems to increase in a roughly linear fashion based on increasing the limit
value. (See the runtimes below.)
So, let's lower the limit to 100 so that they can get a more reasonable compile
time.
Limit Value Time
----------- ----
10 0.9744s
20 1.8035s
30 2.3618s
40 2.9814s
50 3.6988s
60 4.5486s
70 4.9314s
80 5.8012s
90 6.4246s
100 7.0852s
110 7.6634s
120 8.3553s
130 9.0552s
140 9.6820s
150 9.8804s
160 10.8901s
170 10.9855s
180 12.0114s
190 12.6816s
200 13.2754s
210 13.9942s
220 13.8097s
230 14.3272s
240 15.7753s
250 15.6673s
260 16.0541s
270 16.7625s
280 17.3823s
290 18.8213s
300 18.6120s
310 20.0333s
320 19.5165s
330 20.2505s
340 20.7068s
350 21.1833s
360 22.9216s
370 22.2152s
380 23.9390s
390 23.4609s
400 24.0426s
410 24.6410s
420 26.5208s
430 27.7155s
440 26.4142s
450 28.5646s
460 27.3494s
470 29.7255s
480 29.4646s
490 30.5001s
llvm-svn: 179713
This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExprs multiple times.
For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, it may consist of thousands of
nodes.
The testcase exercises this by creating an insanely large ConstantExprs out of
a loop. It's questionable if the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.
llvm-svn: 179458
On certain architectures we can support efficient vectorized version of
instructions if the operand value is uniform (splat) or a constant scalar.
An example of this is a vector shift on x86.
We can efficiently support
for (i = 0 ; i < ; i += 4)
w[0:3] = v[0:3] << <2, 2, 2, 2>
but not
for (i = 0; i < ; i += 4)
w[0:3] = v[0:3] << x[0:3]
This patch adds a parameter to getArithmeticInstrCost to further qualify operand
values as uniform or uniform constant.
Targets can then choose to return a different cost for instructions with such
operand values.
A follow-up commit will test this feature on x86.
radar://13576547
llvm-svn: 178807
This is a compile time optimization. Before the patch we would do two traversals
on each call to aliasGEP - one with a set size parameter one with UnknownSize.
We can do better by first checking the result of the alias query with
UnknownSize.
Only if this one returns MayAlias do we query a second time using size and type.
This recovers an about 7% compile time regression on spec/ammp.
radar://12349960
llvm-svn: 178045
Fixes PR15570: SEGV: SCEV back-edge info invalid after dead code removal.
Indvars creates a SCEV expression for the loop's back edge taken
count, then determines that the comparison is always true and
removes it.
When loop-unroll asks for the expression, it contains a NULL
SCEVUnknkown (as a CallbackVH).
forgetMemoizedResults should invalidate the loop back edges expression.
llvm-svn: 177986
Add "evaluate-tbaa" to print alias queries of loads/stores. Alias queries
between pointers do not include TBAA tags.
Add testing case for "placement new". TBAA currently says NoAlias.
llvm-svn: 177772
This handles the case where we have an inbounds GEP with alloca as the pointer.
This fixes the regression in PR12750 and rdar://13286434.
Note that we can also fix this by handling some GEP cases in isKnownNonNull.
llvm-svn: 177321
This pass hasn't been touched in two years & would fail with assertions against
the current debug info metadata format (the only test case for it still uses a
many-versions old debug info metadata format)
llvm-svn: 176707
The "invariant.load" metadata indicates the memory unit being accessed is immutable.
A load annotated with this metadata can be moved across any store.
As I am not sure if it is legal to move such loads across barrier/fence, this
change dose not allow such transformation.
rdar://11311484
Thank Arnold for code review.
llvm-svn: 176562
This adds minimalistic support for PHI nodes to llvm.objectsize() evaluation
fingers crossed so that it does break clang boostrap again..
llvm-svn: 176408
We make the cost for calling libm functions extremely high as emitting the
calls is expensive and causes spills (on x86) so performance suffers. We still
vectorize important calls like ceilf and friends on SSE4.1. and fabs.
Differential Revision: http://llvm-reviews.chandlerc.com/D466
llvm-svn: 176287
This problem is exposed by r171325 which is already reverted. It is rather
hard to fabricate a testing case without it.
r171325 should *NOT* be resurrected as it has a potential problem although
this problem dosen't directly contribute to PR14988.
The bug is tracked by:
- rdar://13063553, and
- http://llvm.org/bugs/show_bug.cgi?id=14988
Thank Arnold for coming up a better solution to this problem. After
comparing this solution and my original proposal, I decided to ditch mine.
llvm-svn: 176225
These are two related changes (one in llvm, one in clang).
LLVM:
- rename address_safety => sanitize_address (the enum value is the same, so we preserve binary compatibility with old bitcode)
- rename thread_safety => sanitize_thread
- rename no_uninitialized_checks -> sanitize_memory
CLANG:
- add __attribute__((no_sanitize_address)) as a synonym for __attribute__((no_address_safety_analysis))
- add __attribute__((no_sanitize_thread))
- add __attribute__((no_sanitize_memory))
for S in address thread memory
If -fsanitize=S is present and __attribute__((no_sanitize_S)) is not
set llvm attribute sanitize_S
llvm-svn: 176075
Check for reverse shuffles in the CostModel analysis pass and query
TargetTransform info accordingly. This allows us we can write test cases for
reverse shuffles.
radar://13171406
llvm-svn: 174932
This reverts r171041. This was a nice idea that didn't work out well.
Clang warnings need to be associated with warning groups so that they can
be selectively disabled, promoted to errors, etc. This simplistic patch didn't
allow for that. Enhancing it to provide some way for the backend to specify
a front-end warning type seems like overkill for the few uses of this, at
least for now.
llvm-svn: 174748
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.
radar://13097204
llvm-svn: 174713
Prepare it for vectors of pointers and handle simple cases. We don't handle
complicated cases because accumulateConstantOffset bails on pointer vectors.
Fixes selfhost on i386.
llvm-svn: 174179
remaining use of AliasAnalysis concepts such as isIdentifiedObject to
prove pointer inequality.
@external_compare in test/Transforms/InstSimplify/compare.ll shows a simple
case where a noalias argument can be equal to a global variable address, and
while AliasAnalysis can get away with saying that these pointers don't alias,
instsimplify cannot say that they are not equal.
llvm-svn: 174122
We use constant folding to see if an intrinsic evaluates to the same value as a
constant that we know. If we don't take the undefinedness into account we get a
value that doesn't match the actual implementation, and miscompiled code.
This was uncovered by Chandler's simplifycfg changes.
llvm-svn: 173356
generic function calls and intrinsics. This is somewhat overlapping with
an existing intrinsic cost method, but that one seems targetted at
vector intrinsics. I'll merge them or separate their names and use cases
in a separate commit.
This sinks the test of 'callIsSmall' down into TTI where targets can
control it. The whole thing feels very hack-ish to me though. I've left
a FIXME comment about the fundamental design problem this presents. It
isn't yet clear to me what the users of this function *really* care
about. I'll have to do more analysis to figure that out. Putting this
here at least provides it access to proper analysis pass tools and other
such. It also allows us to more cleanly implement the baseline cost
interfaces in TTI.
With this commit, it is now theoretically possible to simplify much of
the inline cost analysis's handling of calls by calling through to this
interface. That conversion will have to happen in subsequent commits as
it requires more extensive restructuring of the inline cost analysis.
The CodeMetrics class is now really only in the business of running over
a block of code and aggregating the metrics on that block of code, with
the actual cost evaluation done entirely in terms of TTI.
llvm-svn: 173148
Previously we tried to infer it from the bit width size, with an added
IsIEEE argument for the PPC/IEEE 128-bit case, which had a default
value. This default value allowed bugs to creep in, where it was
inappropriate.
llvm-svn: 173138
is free. The whole CodeMetrics API should probably be reworked more, but
this is enough to allow deleting the duplicate code there for computing
whether an instruction is free.
All of the passes using this have been updated to pull in TTI and hand
it to the CodeMetrics stuff. Further, a dead CodeMetrics API
(analyzeFunction) is nuked for lack of users.
llvm-svn: 173036
analysis. How cute that it wasn't previously. ;]
Part of this confusion stems from the flattened header file tree. Thanks
to Benjamin for pointing out the goof on IRC, and we're considering
un-flattening the headers, so speak now if that would bug you.
llvm-svn: 173033
old CodeMetrics system. TTI has the specific advantage of being
extensible and customizable by targets to reflect target-specific cost
metrics.
llvm-svn: 173032
depend on and use other analyses (as long as they're either immutable
passes or CGSCC passes of course -- nothing in the pass manager has been
fixed here). Leverage this to thread TargetTransformInfo down through
the inline cost analysis.
No functionality changed here, this just threads things through.
llvm-svn: 173031
a dynamic analysis done on each call to the routine. However, now it can
use the standard pass infrastructure to reference other analyses,
instead of a silly setter method. This will become more interesting as
I teach it about more analysis passes.
This updates the two inliner passes to use the inline cost analysis.
Doing so highlights how utterly redundant these two passes are. Either
we should find a cheaper way to do always inlining, or we should merge
the two and just fiddle with the thresholds to get the desired behavior.
I'm leaning increasingly toward the latter as it would also remove the
Inliner sub-class split.
llvm-svn: 173030
lowered cost.
Currently, this is a direct port of the logic implementing
isInstructionFree in CodeMetrics. The hope is that the interface can be
improved (f.ex. supporting un-formed instruction queries) and the
implementation abstracted so that as we have test cases and target
knowledge we can expose increasingly accurate heuristics to clients.
I'll start switching existing consumers over and kill off the routine in
CodeMetrics in subsequent commits.
llvm-svn: 172998
Okay, here's how to reproduce the problem:
1) Build a Release (or Release+Asserts) version of clang in the normal way.
2) Using the clang & clang++ binaries from (1), build a Release (or
Release+Asserts) version of the same sources, but this time enable LTO ---
specify the `-flto' flag on the command line.
3) Run the ARC migrator tests:
$ arcmt-test --args -triple x86_64-apple-darwin10 -fsyntax-only -x objective-c++ ./src/tools/clang/test/ARCMT/cxx-rewrite.mm
You'll see that the output isn't correct (the whitespace is off).
The mis-compile is in the function `RewriteBuffer::RemoveText' in the
clang/lib/Rewrite/Core/Rewriter.cpp file. When that function and RewriteRope.cpp
are compiled with LTO and the `arcmt-test' executable is regenerated, you'll see
the error. When those files are not LTO'ed, then the output of the `arcmt-test'
is fine.
It is *really* hard to get a testcase out of this. I'll file a PR with what I
have currently.
--- Reverse-merging r172363 into '.':
U include/llvm/Analysis/MemoryBuiltins.h
U lib/Analysis/MemoryBuiltins.cpp
--- Reverse-merging r171325 into '.':
U test/Transforms/InstCombine/objsize.ll
G include/llvm/Analysis/MemoryBuiltins.h
G lib/Analysis/MemoryBuiltins.cpp
llvm-svn: 172756
Moving the X86CostTable to a common place, so that other back-ends
can share the code. Also simplifying it a bit and commoning up
tables with one and two types on operations.
llvm-svn: 172658
TargetTransformInfo rather than TargetLowering, removing one of the
primary instances of the layering violation of Transforms depending
directly on Target.
This is a really big deal because LSR used to be a "special" pass that
could only be tested fully using llc and by looking at the full output
of it. It also couldn't run with any other loop passes because it had to
be created by the backend. No longer is this true. LSR is now just
a normal pass and we should probably lift the creation of LSR out of
lib/CodeGen/Passes.cpp and into the PassManagerBuilder. =] I've not done
this, or updated all of the tests to use opt and a triple, because
I suspect someone more familiar with LSR would do a better job. This
change should be essentially without functional impact for normal
compilations, and only change behvaior of targetless compilations.
The conversion required changing all of the LSR code to refer to the TTI
interfaces, which fortunately are very similar to TargetLowering's
interfaces. However, it also allowed us to *always* expect to have some
implementation around. I've pushed that simplification through the pass,
and leveraged it to simplify code somewhat. It required some test
updates for one of two things: either we used to skip some checks
altogether but now we get the default "no" answer for them, or we used
to have no information about the target and now we do have some.
I've also started the process of removing AddrMode, as the TTI interface
doesn't use it any longer. In some cases this simplifies code, and in
others it adds some complexity, but I think it's not a bad tradeoff even
there. Subsequent patches will try to clean this up even further and use
other (more appropriate) abstractions.
Yet again, almost all of the formatting changes brought to you by
clang-format. =]
llvm-svn: 171735