There are currently two issues, of which I currently know, that prevent TBAA
from being correctly usable in CodeGen:
1. Stack coloring does not update TBAA when merging allocas. This is easy
enough to fix, but is not the largest problem.
2. CGP inserts ptrtoint/inttoptr pairs when sinking address computations.
Because BasicAA does not handle inttoptr, we'll often miss basic type punning
idioms that we need to catch so we don't miscompile real-world code (like LLVM).
I don't yet have a small test case for this, but this fixes self hosting a
non-asserts build of LLVM on PPC64 when using -enable-aa-sched-mi and -misched=shuffle.
llvm-svn: 200093
This option (which is !NDEBUG only) allows restricting the use of alias
analysis in DAGCombiner to a specific function. This has proved extremely
valuable to isolating bugs related to this feature, and mirrors the
misched-only-func option provided by the new instruction scheduler.
llvm-svn: 200088
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
llvm-svn: 200058
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
llvm-svn: 200034
DAGCombiner::GatherAllAliases, which is only used when AA used is enabled
during DAGCombine, had a fundamentally incorrect assumption for which this
change compensates. GatherAllAliases, which is used to find aliasing
predecessor chain nodes (so that a better chain can be selected for a load or
store to enable subsequent optimizations) assumed that walking up the chain
would always catch all possibly-aliasing loads and stores. This is not true: To
really find all aliases, we also need to search for aliases through the value
operand of a store, etc. Consider the following situation:
Token1 = ...
L1 = load Token1, %52
S1 = store Token1, L1, %51
L2 = load Token1, %52+8
S2 = store Token1, L2, %51+8
Token2 = Token(S1, S2)
L3 = load Token2, %53
S3 = store Token2, L3, %52
L4 = load Token2, %53+8
S4 = store Token2, L4, %52+8
If we search for aliases of S3 (which loads address %52), and we look only
through the chain, then we'll miss the trivial dependence on L1 (which loads
from %52). We then might change all loads and stores to use Token1 as their
chain operand, which could result in copying %53 into %52 before copying
%52 into %51 (which should happen first).
The problem is, however, that searching for such data dependencies can become
expensive, and the cost is not directly related to the chain depth. Instead,
we'll rule out such configurations by insisting that we've visited all chain
users (except for users of the original chain, which is not necessary). When
doing this, we need to look through nodes we don't care about (otherwise,
things like register copies will interfere with trivial use cases).
Unfortunately, I don't have a small test case for this problem. Creating the
underlying situation is not hard (a pair of memcpys will do it), but arranging
for the default instruction schedule to be incorrect is very fragile.
This unbreaks self hosting on PPC64 when using
-mllvm -combiner-global-alias-analysis -mllvm -combiner-alias-analysis.
llvm-svn: 200033
These transformations obviously won't work for indexed (pre/post-inc) loads and
stores. In practice, I'm not sure there is any benefit to enabling them for
indexed nodes because other transformations that these might enable likely also
won't handle indexed nodes.
I don't have an in-tree test case that hits this problem, but an upcoming bug
fix will make it much more likely.
llvm-svn: 200023
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.
First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.
If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.
When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.
This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)
Reviewed by Eric
llvm-svn: 200022
There is no inline asm in a .s file. Therefore, there should be no logic to
handle it in the streamer. Inline asm only exists in bitcode files, so the
logic can live in the (long misnamed) AsmPrinter class.
llvm-svn: 200011
compile unit. Make these relocations on the platforms that need
relocations and add a routine to ensure that we don't put the
addresses in an offset table for split dwarf.
llvm-svn: 199990
This patch adds the target analysis passes (usually TargetTransformInfo) to the
codgen pipeline. We also expose now the AddAnalysisPasses method through the C
API, because the optimizer passes would also benefit from better target-specific
cost models.
Reviewed by Andrew Kaylor
llvm-svn: 199926
e.g. linkonce, to TargetMachine and set it when we've done so
for ELF targets currently. This involved making TargetMachine
non-const in a TLOF use and propagating that change around - I'm
open to other ideas.
This will be used in a future commit to handle emitting debug
information with ranges.
llvm-svn: 199871
This is a horrible bit of code. We're calling a simplification routine *in the middle* of type legalization. We tell the
simplification routine that it's running after legalization, but some of the types it will encounter will be illegal! The
fix is only to invoke the simplification if the types in question were legal, so that none of its invariants will be violated.
llvm-svn: 199847
This patch restores the ARM mode if the user's inline assembly
does not. In the object streamer, it ensures that instructions
following the inline assembly are encoded correctly and that
correct mapping symbols are emitted. For the asm streamer, it
emits a .arm or .thumb directive.
This patch does not ensure that the inline assembly contains
the ADR instruction to switch modes at runtime.
The problem we need to solve is code like this:
int foo(int a, int b) {
int r = a + b;
asm volatile(
".align 2 \n"
".arm \n"
"add r0,r0,r0 \n"
: : "r"(r));
return r+1;
}
If we compile this function in thumb mode then the inline assembly
will switch to arm mode. We need to make sure that we switch back to
thumb mode after emitting the inline assembly or we will incorrectly
encode the instructions that follow (i.e. the assembly instructions
for return r+1).
Based on patch by David Peixotto
Change-Id: Ib57f6d2d78a22afad5de8693fba6230ff56ba48b
llvm-svn: 199818
This actually totally breaks and causes the machine verifier to cry in several cases, one of which being:
%RAX<def> = COPY %RCX<kill>
%ECX<def> = COPY %EAX<kill>, %RAX<imp-use,kill>
These subregister copies are together identified as noops, so are both removed. However, the second one as it has an imp-use gets converted into a kill:
%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>
As the original COPY has been removed, the verifier goes into tears at the use of undefined EAX and RAX.
There are several hacky solutions to this hacky problem (which is all to do with imp-use/def weirdnesses), but the least hacky I've come up with is to *always* remove COPYs by converting to KILLs. KILLs are no-ops to the code generator so the generated code doesn't change (which is why they were partially used in the first place), but using them also keeps the def/use and imp-def/imp-use chains alive:
%RAX<def> = KILL %RCX<kill>
%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>
The patch passes all test cases including the ones that check the removal of MOVs in this circumstance, along with an extra test I added to check subregister behaviour (which made the machine verifier fall over before my patch).
The patch also adds some DEBUG() statements because the file hadn't got any.
llvm-svn: 199797
Fix a crash in SjLjEHPrepare::lowerIncomingArguments caused by treating
VectorType like an aggregate. It's first-class!
<rdar://problem/15854596>
llvm-svn: 199768
StackProtector keeps a ValueMap of alloca instructions to layout kind tags for
use by PEI and other later passes. When stack coloring replaces one alloca with
a bitcast to another one, the key replacement in this map does not work.
Instead, provide an interface to manage this updating directly. This seems like
an improvement over the old behavior, where the layout map would not get
updated at all when the stack slots were merged. In practice, however, there is
likely no observable difference because PEI only did anything special with
'large array' kinds, and if one large array is merged with another, than the
replacement should already have been a large array.
This is an attempt to unbreak the clang-x86_64-darwin11-RA builder.
llvm-svn: 199684
The way that stack coloring updated MMOs when merging stack slots, while
correct, is suboptimal, and is incompatible with the use of AA during
instruction scheduling. The solution, which involves the use of const_cast (and
more importantly, updating the IR from within an MI-level pass), obviously
requires some explanation:
When the stack coloring pass was originally committed, the code in
ScheduleDAGInstrs::buildSchedGraph tracked possible alias sets by using
GetUnderlyingObject, and all load/store and store/store memory control
dependencies where added between SUs at the object level (where only one
object, that returned by GetUnderlyingObject, was used to identify the object
associated with each MMO). When stack coloring merged stack slots, it would
replace MMOs derived from the remapped alloca with the alloca with which the
remapped alloca was being replaced. Because ScheduleDAGInstrs only used single
objects, and tracked alias sets at the object level, this was a fine solution.
In r169744, (Andy and) I updated the code in ScheduleDAGInstrs to use
GetUnderlyingObjects, and track alias sets using, potentially, multiple
underlying objects for each MMO. This was done, primarily, to provide the
ability to look through PHIs, and provide better scheduling for
induction-variable-dependent loads and stores inside loops. At this point, the
MMO-updating code in stack coloring became suboptimal, because it would clear
the MMOs for (i.e. completely pessimize) all instructions for which r169744
might help in scheduling. Updating the IR directly is the simplest fix for this
(and the one with, by far, the least compile-time impact), but others are
possible (we could give each MMO a small vector of potential values, or make
use of a remapping table, constructed from MFI, inside ScheduleDAGInstrs).
Unfortunately, replacing all MMO values derived from the remapped alloca with
the base replacement alloca fundamentally breaks our ability to use AA during
instruction scheduling (which is critical to performance on some targets). The
reason is that the original MMO might have had an offset (either constant or
dynamic) from the base remapped alloca, and that offset is not present in the
updated MMO. One possible way around this would be to use
GetPointerBaseWithConstantOffset, and update not only the MMO's value, but also
its offset based on the original offset. Unfortunately, this solution would
only handle constant offsets, and for safety (because AA is not completely
restricted to deducing relationships with constant offsets), we would need to
clear all MMOs without constant offsets over the entire function. This would be
an even worse pessimization than the current single-object restriction. Any
other solution would involve passing around a vector of remapped allocas, and
teaching AA to use it, introducing additional complexity and overhead into AA.
Instead, when remapping an alloca, we replace all IR uses of that alloca as
well (optionally inserting a bitcast as necessary). This is even more efficient
that the old MMO-updating code in the stack coloring pass (because it removes
the need to call GetUnderlyingObject on all MMO values), removes the
single-object pessimization in the default configuration, and enables the
correct use of AA during instruction scheduling (all without any additional
overhead).
LLVM now no longer miscompiles itself on x86_64 when using -enable-misched
-enable-aa-sched-mi -misched-bottomup=0 -misched-topdown=0 -misched=shuffle!
Fixed PR18497.
Because the alloca replacement is now done at the IR level, unless the MMO
directly refers to the remapped alloca, the change cannot be seen at the MI
level. As a result, there is no good way to fix test/CodeGen/X86/pr14090.ll.
llvm-svn: 199658
When using AA to break false chain dependencies, we need to track multiple
stores per object in ScheduleDAGInstrs. Historically, we tracked potential alias
chains at the object level, and so all loads of an object would retain
dependencies on any store to that object. With AA, however, this is not
sufficient: non-overlapping stores and loads to the same object all need to be
tested for dependencies separately, we cannot only test all loads to an object
against only the last store (see PR18497 for an explicit example).
To mitigate any unwelcome compile-time impact when not using AA, only one store
is kept in the list per object when not using AA.
This, along with a stack coloring change to come shortly, will provide a test
case, fix PR18497 (and allow LLVM to compile itself using -enable-aa-sched-mi
on x86-64).
llvm-svn: 199657
type units were enabled. The crux of the issue is that the
addDwarfTypeUnitType routine can end up being indirectly recursive. In
this case, the reference into the dense map (TU) became invalid by the
time we popped all the way back and used it to add the DIE type
signature.
Instead, use early return in the case where we can bypass the recursive
step and creating a type unit. Then use the pointer to the new type unit
to set up the DIE type signature in the case where we have to.
I tried really hard to reduce a testcase for this, but it's really
annoying. You have to get this to be mid-recursion when the densemap
grows. Even if we got a test case for this today, it'd be very unlikely
to continue exercising this pattern.
llvm-svn: 199630
There are two attempted optimisations in reMaterializeTrivialDef, trying to
avoid promoting the size of a register too much when rematerializing.
Unfortunately, both appear to be flawed. First, we see if the original register
would have worked, but this is inadequate. Consider:
v1 = SOMETHING (v1 is QQ)
v2:Q0 = COPY v1:Q1 (v1, v2 are QQ)
...
uses of v2
In this case even though v2 *could* be used directly as the output of
SOMETHING, this would set the wrong bits of the QQ register involved. The
correct rematerialization must be:
v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ)
...
uses of v2:Q1_Q2
For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then
we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try
to hunt for a class between v1 and v2 that works. Unfortunately, this is also
wrong:
v1 = SOMETHING (v1 is QQ)
v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ)
...
uses of v2 as a QQQ
The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However current
logic would decide that v2 could be a QQ (no interest is taken in later uses).
This patch, therefore, always accepts the widened register class without trying
to be clever. Generally there is no penalty to this (e.g. in the common GR32 <
GR64 case, expanding the width doesn't matter because it's not like you were
going to do anything else with the high bits of a GR32 register). It can
increase register pressure in cases like the ARM VFP regs though (multiple
non-overlapping but equivalent subregisters). This situation can be
spotted by the fact that both source and destination in the
not-quite-coalesced pair have a sub-register index and
rematerialisation is skipped in that situation.
Unfortunately, no in-tree targets actually expose this as far as I can tell
(there are so few isAsCheapAsAMove instructions for it to trigger on) so I've
been unable to produce a test. It was exposed in our ARM64 SPEC tests though,
and I will be adding a test there that we should be able to contribute
soon(TM).
rdar://problem/15775279
llvm-svn: 199376
This fixes a regression intruced by r199135.
Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().
However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.
After revision 199135, method SimplifyVBinop tried to
simplify also vector binary operations with only one constant operand.
This fixes the problem restoring the old behavior of SimplifyVBinOp.
llvm-svn: 199328
MSVC on x64 requires that we create image relative symbol
references to refer to RTTI data. Seeing as how there is no way to
explicitly make reference to a given relocation type in LLVM IR, pattern
match expressions of the form &foo - &__ImageBase.
Differential Revision: http://llvm-reviews.chandlerc.com/D2523
llvm-svn: 199312
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199218
Sorry, I don't understand why the warning is generated (a gcc
bug?). Anyhow, the change should improve readablity. No functionality
change intended.
llvm-svn: 199214
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199204
When creating a virtual register for a def, the value type should be
used to pick the register class. If we only use the register class
constraint on the instruction, we might pick a too large register class.
Some registers can store values of different sizes. For example, the x86
xmm registers can hold f32, f64, and 128-bit vectors. The three
different value sizes are represented by register classes with identical
register sets: FR32, FR64, and VR128. These register classes have
different spill slot sizes, so it is important to use the right one.
The register class constraint on an instruction doesn't necessarily care
about the size of the value its defining. The value type determines
that.
This fixes a problem where InstrEmitter was picking 32-bit register
classes for 64-bit values on SPARC.
llvm-svn: 199187