that control, individually, all of the disparate things it was
controlling.
At the same time move a FIXME in the Hexagon port to a new
subtarget function that will enable a user of the machine
scheduler to avoid using the source scheduler for pre-RA-scheduling.
The FIXME would have this removed, but involves either testcase
changes or adding -pre-RA-sched=source to a few testcases.
llvm-svn: 231980
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.
llvm-svn: 231571
Level 1 should abort for all instructions but call/terminators/args.
Instead it was aborting only if the level was > 2
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230861
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.
This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.
Reviewers: resistor, echristo
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D7941
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
This change reverts the interesting parts of 226311 (and 227046). This change introduced two problems, and I've been convinced that an alternate approach is preferrable anyways.
The bugs were:
- Registery appears to require all users be within the same linkage unit. After this change, asking for "statepoint-example" in Transform/ would sometimes get you nullptr, whereas asking the same question in CodeGen would return the right GCStrategy. The correct long term fix is to get rid of the utter hack which is Registry, but I don't have time for that right now. 227046 appears to have been an attempt to fix this, but I don't believe it does so completely.
- GCMetadataPrinter::finishAssembly was being called more than once per GCStrategy. Each Strategy was being added to the GCModuleInfo multiple times.
Once I get time again, I'm going to split GCModuleInfo into the gc.root specific part and a GCStrategy owning Analysis pass. I'm probably also going to kill off the Registry. Once that's done, I'll move the new GCStrategyAnalysis and all built in GCStrategies into Analysis. (As original suggested by Chandler.) This will accomplish my original goal of being able to access GCStrategy from Transform/ without adding all of the builtin GCs to IR/.
llvm-svn: 227109
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.
Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D6987
llvm-svn: 226920
APIs and replace it and numerous booleans with an option struct.
The critical edge splitting API has a really large surface of flags and
so it seems worth burning a small option struct / builder. This struct
can be constructed with the various preserved analyses and then flags
can be flipped in a builder style.
The various users are now responsible for directly passing along their
analysis information. This should be enough for the critical edge
splitting to work cleanly with the new pass manager as well.
This API is still pretty crufty and could be cleaned up a lot, but I've
focused on this change just threading an option struct rather than
a pass through the API.
llvm-svn: 226456
Note: This change ended up being slightly more controversial than expected. Chandler has tentatively okayed this for the moment, but I may be revisiting this in the near future after we settle some high level questions.
Rather than have the GCStrategy object owned by the GCModuleInfo - which is an immutable analysis pass used mainly by gc.root - have it be owned by the LLVMContext. This simplifies the ownership logic (i.e. can you have two instances of the same strategy at once?), but more importantly, allows us to access the GCStrategy in the middle end optimizer. To this end, I add an accessor through Function which becomes the canonical way to get at a GCStrategy instance.
In the near future, this will allows me to move some of the checks from http://reviews.llvm.org/D6808 into the Verifier itself, and to introduce optimization legality predicates for some of the recent additions to InstCombine. (These will follow as separate changes.)
Differential Revision: http://reviews.llvm.org/D6811
llvm-svn: 226311
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.
Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.
llvm-svn: 226157
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
llvm-svn: 226078
This fixes lots of generic CodeGen tests that use __gcc_personality_v0.
This suggests that using ExceptionHandling::MSVC was a mistake, and we
should instead classify each function by personality function. This
would, for example, allow us to LTO a binary containing uses of SEH and
Itanium EH.
llvm-svn: 226019
This option takes the name of the basic block you want to visualize
with -view-*-dags
Differential Revision: http://reviews.llvm.org/D6948
llvm-svn: 225953
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.
The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.
Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH
In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D6300
llvm-svn: 225904
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
thing we do inside selection dag. This code needs to be
migrated to queries on the function rather than global
data, but this organizes things before we start grabbing
the subtarget.
llvm-svn: 219271
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
Fixes rdar://problem/18206171
llvm-svn: 219009
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
llvm-svn: 218778
Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended.
Test Plan: All tests (make check-all) still pass.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4481
llvm-svn: 213474
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.
small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.
This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.
The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.
llvm-svn: 211749
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.
No functionality change.
llvm-svn: 208508
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
llvm-svn: 208104
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837
for use with C++11 range-based for-loops.
The gist of phase 1 is to remove the skipInstruction() and skipBundle()
methods from these iterators, instead splitting each iterator into a version
that walks operands, a version that walks instructions, and a version that
walks bundles. This has the result of making some "clever" loops in lib/CodeGen
more verbose, but also makes their iterator invalidation characteristics much
more obvious to the casual reader. (Making them concise again in the future is a
good motivating case for a pre-incrementing range adapter!)
Phase 2 of this undertaking with consist of removing the getOperand() method,
and changing operator*() of the operand-walker to return a MachineOperand&. At
that point, it should be possible to add range views for them that work as one
might expect.
llvm-svn: 203757
selection dag (PR19012)
In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).
The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.
This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.
Differential Revision: http://llvm-reviews.chandlerc.com/D2954
llvm-svn: 202930
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.
Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.
llvm-svn: 200271
This is a very confusing option for a feature that will go away.
-enable-misched is exposed instead to help triage issues with the new
scheduler.
llvm-svn: 199133
This re-lands commit r196876, which was reverted in r196879.
The tests have been fixed to pass on platforms with a stack alignment
larger than 4.
Update to clang side tests will land shortly.
llvm-svn: 196939
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments
We would use esi unconditionally without verifying that it did not
conflict with inline assembly.
This change doesn't do the verification, it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.
Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp. Instead, we analyze the inline
instructions for implicit definitions and check if esp is there. If so,
we require the use of a base pointer and consider it in the condition
above.
Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.
Reviewers: sunfish
Differential Revision: http://llvm-reviews.chandlerc.com/D1317
llvm-svn: 196876
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.
Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.
Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.
Reviewers: resistor
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2251
llvm-svn: 195635
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
SelectAllBasicBlocks; the flag is checked in various places, and
FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
something more subtle that doesn't work everywhere.
Based on work by Andrea Di Biagio.
llvm-svn: 195491
It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown".
FYI, it didn't appear to add either "-O0" or "-fast-isel".
llvm-svn: 195339
Summary:
When getConstant() is called for an expanded vector type, it is split into
multiple scalar constants which are then combined using appropriate build_vector
and bitcast operations.
In addition to the usual big/little endian differences, the case where the
element-order of the vector does not have the same endianness as the elements
themselves is also accounted for. For example, for v4i32 on big-endian MIPS,
the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is
<0123,4567,89AB,CDEF>.
Handling this case turns out to be a nop since getConstant() returns a splatted
vector (so reversing the element order doesn't change the value)
This fixes a number of cases in MIPS MSA where calling getConstant() during
operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF
into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger
differences between illegal and legal types such as legalizing v2i64 into v8i16.
lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling
getConstant() so this function has been updated in the same patch.
For the sake of transparency, the steps I've taken since the review are:
* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed
that the MIPS tests were falsely passing because a polymorphic function was
not actually polymorphic in the reviewed patch.
* Fixed the tests that were now failing. This involved deleting the code to
handle the MIPS MSA element-order (which was previously doing an byte-order
swap instead of an element-order swap). This left
isVectorEltOrderLittleEndian() unused and it was deleted.
* Fixed build failures caused by rebasing beyond r194467-r194472. These build
failures involved the bset, bneg, and bclr instructions added in these commits
using lowerMSASplatImm() in a way that was no longer valid after this patch.
Some of these were fixed by calling SelectionDAG::getConstant() instead,
others were fixed by a new function getBuildVectorSplat() that provided the
removed functionality of lowerMSASplatImm() in a more sensible way.
Reviewers: bkramer
Reviewed By: bkramer
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1973
llvm-svn: 194811
Sometimes a copy from a vreg -> vreg sneaks into the middle of a terminator
sequence. It is safe to slice this into the stack protector success bb.
This fixes PR16979.
llvm-svn: 191260
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
Previously, generation of stack protectors was done exclusively in the
pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated
splitting basic blocks at the IR level to create the success/failure basic
blocks in the tail of the basic block in question. As a result of this,
calls that would have qualified for the sibling call optimization were no
longer eligible for optimization since said calls were no longer right in
the "tail position" (i.e. the immediate predecessor of a ReturnInst
instruction).
Then it was noticed that since the sibling call optimization causes the
callee to reuse the caller's stack, if we could delay the generation of
the stack protector check until later in CodeGen after the sibling call
decision was made, we get both the tail call optimization and the stack
protector check!
A few goals in solving this problem were:
1. Preserve the architecture independence of stack protector generation.
2. Preserve the normal IR level stack protector check for platforms like
OpenBSD for which we support platform specific stack protector
generation.
The main problem that guided the present solution is that one can not
solve this problem in an architecture independent manner at the IR level
only. This is because:
1. The decision on whether or not to perform a sibling call on certain
platforms (for instance i386) requires lower level information
related to available registers that can not be known at the IR level.
2. Even if the previous point were not true, the decision on whether to
perform a tail call is done in LowerCallTo in SelectionDAG which
occurs after the Stack Protector Pass. As a result, one would need to
put the relevant callinst into the stack protector check success
basic block (where the return inst is placed) and then move it back
later at SelectionDAG/MI time before the stack protector check if the
tail call optimization failed. The MI level option was nixed
immediately since it would require platform specific pattern
matching. The SelectionDAG level option was nixed because
SelectionDAG only processes one IR level basic block at a time
implying one could not create a DAG Combine to move the callinst.
To get around this problem a few things were realized:
1. While one can not handle multiple IR level basic blocks at the
SelectionDAG Level, one can generate multiple machine basic blocks
for one IR level basic block. This is how we handle bit tests and
switches.
2. At the MI level, tail calls are represented via a special return
MIInst called "tcreturn". Thus if we know the basic block in which we
wish to insert the stack protector check, we get the correct behavior
by always inserting the stack protector check right before the return
statement. This is a "magical transformation" since no matter where
the stack protector check intrinsic is, we always insert the stack
protector check code at the end of the BB.
Given the aforementioned constraints, the following solution was devised:
1. On platforms that do not support SelectionDAG stack protector check
generation, allow for the normal IR level stack protector check
generation to continue.
2. On platforms that do support SelectionDAG stack protector check
generation:
a. Use the IR level stack protector pass to decide if a stack
protector is required/which BB we insert the stack protector check
in by reusing the logic already therein. If we wish to generate a
stack protector check in a basic block, we place a special IR
intrinsic called llvm.stackprotectorcheck right before the BB's
returninst or if there is a callinst that could potentially be
sibling call optimized, before the call inst.
b. Then when a BB with said intrinsic is processed, we codegen the BB
normally via SelectBasicBlock. In said process, when we visit the
stack protector check, we do not actually emit anything into the
BB. Instead, we just initialize the stack protector descriptor
class (which involves stashing information/creating the success
mbbb and the failure mbb if we have not created one for this
function yet) and export the guard variable that we are going to
compare.
c. After we finish selecting the basic block, in FinishBasicBlock if
the StackProtectorDescriptor attached to the SelectionDAGBuilder is
initialized, we first find a splice point in the parent basic block
before the terminator and then splice the terminator of said basic
block into the success basic block. Then we code-gen a new tail for
the parent basic block consisting of the two loads, the comparison,
and finally two branches to the success/failure basic blocks. We
conclude by code-gening the failure basic block if we have not
code-gened it already (all stack protector checks we generate in
the same function, use the same failure basic block).
llvm-svn: 188755
Teach the generic instruction selection helper functions to constrain
the register classes of their input operands. For non-physical register
references, the generic code needs to be careful not to mess that up
when replacing references to result registers. As the comment indicates
for MachineRegisterInfo::replaceRegWith(), it's important to call
constrainRegClass() first.
rdar://12594152
llvm-svn: 188593
Adds unit tests for it too.
Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.
llvm-svn: 187283
Change the informal convention of DBG_VALUE machine instructions so that
we can express a register-indirect address with an offset of 0.
The old convention was that a DBG_VALUE is a register-indirect value if
the offset (operand 1) is nonzero. The new convention is that a DBG_VALUE
is register-indirect if the first operand is a register and the second
operand is an immediate. For plain register values the combination reg,
reg is used. MachineInstrBuilder::BuildMI knows how to build the new
DBG_VALUES.
rdar://problem/13658587
llvm-svn: 185966
Stop using the ISD::EXCEPTIONADDR and ISD::EHSELECTION when lowering
landing pad arguments. These nodes were previously legalized into
CopyFromReg nodes, but that never worked properly because the
CopyFromReg node weren't guaranteed to be scheduled at the top of the
basic block.
This meant the exception pointer and selector registers could be
clobbered before being copied to a virtual register.
This patch copies the two physical registers to virtual registers at
the beginning of the basic block, and lowers the landingpad instruction
directly to two CopyFromReg nodes reading the *virtual* registers. This
is safe because virtual registers don't get clobbered.
A future patch will remove the ISD::EXCEPTIONADDR and ISD::EHSELECTION
nodes.
llvm-svn: 185617
Stop using the ISD::EXCEPTIONADDR and ISD::EHSELECTION when lowering
landing pad arguments. These nodes were previously legalized into
CopyFromReg nodes, but that never worked properly because the
CopyFromReg node weren't guaranteed to be scheduled at the top of the
basic block.
This meant the exception pointer and selector registers could be
clobbered before being copied to a virtual register.
This patch copies the two physical registers to virtual registers at
the beginning of the basic block, and lowers the landingpad instruction
directly to two CopyFromReg nodes reading the *virtual* registers. This
is safe because virtual registers don't get clobbered.
A future patch will remove the ISD::EXCEPTIONADDR and ISD::EHSELECTION
nodes.
llvm-svn: 185595
Rather than using the full power of target-specific addressing modes in
DBG_VALUEs with Frame Indicies, simply use Frame Index + Offset. This
reduces the complexity of debug info handling down to two
representations of values (reg+offset and frame index+offset) rather
than three or four.
Ideally we could ensure that frame indicies had been eliminated by the
time we reached an assembly or dwarf generation, but I haven't spent the
time to figure out where the FIs are leaking through into that & whether
there's a good place to convert them. Some FI+offset=>reg+offset
conversion is done (see PrologEpilogInserter, for example) which is
necessary for some SelectionDAG assumptions about registers, I believe,
but it might be possible to make this a more thorough conversion &
ensure there are no remaining FIs no matter how instruction selection
is performed.
llvm-svn: 184066
The TargetLoweringInfo object is owned by the TargetMachine. In the future, the
TargetMachine object may change, which may also change the TargetLoweringInfo
object.
llvm-svn: 183356
register-indirect address with an offset of 0.
It used to be that a DBG_VALUE is a register-indirect value if the offset
(operand 1) is nonzero. The new convention is that a DBG_VALUE is
register-indirect if the first operand is a register and the second
operand is an immediate. For plain registers use the combination reg, reg.
rdar://problem/13658587
llvm-svn: 180816
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.
llvm-svn: 177465
This doesn't reset all of the target options within the TargetOptions
object. This is because some of those are ABI-specific and must be determined if
it's okay to change those on the fly.
llvm-svn: 176986
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.
Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.
Many tests depend on grepping "-stats" output. Move those into
a orig_dir/Stats/. so that they can be marked as unsupported
when building without statistics.
Differential Revision: http://llvm-reviews.chandlerc.com/D486
llvm-svn: 176733
SelectionDAGIsel::LowerArguments needs a function, not a basic block. So it
makes sense to pass it the function instead of extracting a basic-block from
the function and then tossing it. This is also more self-documenting (functions
have arguments, BBs don't).
In addition, added comments to a couple of Select* methods.
llvm-svn: 176305
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.
Also, add the new llc -fast-isel-abort-args option, which is similar to
-fast-isel-abort option, but for formal argument lowering.
llvm-svn: 176052
If the frame pointer is omitted, and any stack changes occur in the inline
assembly, e.g.: "pusha", then any C local variable or C argument references
will be incorrect.
I pass no judgement on anyone who would do such a thing. ;)
rdar://13218191
llvm-svn: 175334
If two functions require different features (e.g., `-mno-sse' vs. `-msse') then
we want to honor that, especially during LTO. We can do that by resetting the
subtarget's features depending upon the 'target-feature' attribute.
llvm-svn: 175314
function is successfully handled by fast-isel. That's because function
arguments are *always* handled by SDISel. Introduce FastLowerArguments to
allow each target to provide hook to handle formal argument lowering.
As a proof-of-concept, add ARMFastIsel::FastLowerArguments to handle
functions with 4 or fewer scalar integer (i8, i16, or i32) arguments. It
completely eliminates the need for SDISel for trivial functions.
rdar://13163905
llvm-svn: 174855
pass into the SelectionDAG itself rather than snooping on the
implementation of that pass as exposed by the TargetMachine. This
removes the last direct client of the ScalarTargetTransformInfo class
outside of the TTI pass implementation.
llvm-svn: 171625
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
llvm-svn: 169796
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
This allows me to begin enabling (or backing out) misched by default
for one subtarget at a time. To run misched we typically want to:
- Disable SelectionDAG scheduling (use the source order scheduler)
- Enable more aggressive coalescing (until we decide to always run the coalescer this way)
- Enable MachineScheduler pass itself.
Disabling PostRA sched may follow for some subtargets.
llvm-svn: 167826
Also provide an MRI::getReservedRegs() function to access the frozen
register set, and isReserved() and isAllocatable() methods to test
individual registers.
The various implementations of TRI::getReservedRegs() are quite
complicated, and many passes need to look at the reserved register set.
This patch makes it possible for these passes to use the cached copy in
MRI, avoiding a lot of malloc traffic and repeated calculations.
llvm-svn: 165982
the case of multiple edges from one block to another.
A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.
Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.
llvm-svn: 162572
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
llvm-svn: 161232
When dumping the DAG for a fatal 'Cannot select' back-end error, also
provide the name of the function the construct is in. Useful when dealing
with large testcases, as the next step is to llvm-extract the function
in question to get a small(er) testcase.
llvm-svn: 160152
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.
The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.
llvm-svn: 159312
The X86 target is editing the selection DAG while isel is selecting
nodes following a topological ordering. When the DAG hacking triggers
CSE, nodes can be deleted and bad things happen.
llvm-svn: 155257
Now that multiple DAGUpdateListeners can be active at the same time,
ISelPosition can become a local variable in DoInstructionSelection.
We simply register an ISelUpdater with CurDAG while ISelPosition exists.
llvm-svn: 155249
Instead of passing listener pointers to RAUW, let SelectionDAG itself
keep a linked list of interested listeners.
This makes it possible to have multiple listeners active at once, like
RAUWUpdateListener was already doing. It also makes it possible to
register listeners up the call stack without controlling all RAUW calls
below.
DAGUpdateListener uses an RAII pattern to add itself to the SelectionDAG
list of active listeners.
llvm-svn: 155248
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
ScheduleDAG is responsible for the DAG: SUnits and SDeps. It provides target hooks for latency computation.
ScheduleDAGInstrs extends ScheduleDAG and defines the current scheduling region in terms of MachineInstr iterators. It has access to the target's scheduling itinerary data. ScheduleDAGInstrs provides the logic for building the ScheduleDAG for the sequence of MachineInstrs in the current region. Target's can implement highly custom schedulers by extending this class.
ScheduleDAGPostRATDList provides the driver and diagnostics for current postRA scheduling. It maintains a current Sequence of scheduled machine instructions and logic for splicing them into the block. During scheduling, it uses the ScheduleHazardRecognizer provided by the target.
Specific changes:
- Removed driver code from ScheduleDAG. clearDAG is the only interface needed.
- Added enterRegion/exitRegion hooks to ScheduleDAGInstrs to delimit the scope of each scheduling region and associated DAG. They should be used to setup and cleanup any region-specific state in addition to the DAG itself. This is necessary because we reuse the same ScheduleDAG object for the entire function. The target may extend these hooks to do things at regions boundaries, like bundle terminators. The hooks are called even if we decide not to schedule the region. So all instructions in a block are "covered" by these calls.
- Added ScheduleDAGInstrs::begin()/end() public API.
- Moved Sequence into the driver layer, which is specific to the scheduling algorithm.
llvm-svn: 152208
This new scheduler plugs into the existing selection DAG scheduling framework. It is a top-down critical path scheduler that tracks register pressure and uses a DFA for pipeline modeling.
Patch by Sergei Larin!
llvm-svn: 149547
This SelectionDAG node will be attached to call nodes by LowerCall(),
and eventually becomes a MO_RegisterMask MachineOperand on the
MachineInstr representing the call instruction.
LowerCall() will attach a register mask that depends on the calling
convention.
llvm-svn: 148436
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
for a single miss and not all predecessor instructions that get selected by
the selection DAG instruction selector. This is still not exact (e.g., over
states misses when folded/dead instructions are present), but it is a step in
the right direction.
llvm-svn: 144832
Some code want to check that *any* call within a function has the 'returns
twice' attribute, not just that the current function has one.
llvm-svn: 142221
This is still a hack until we can teach tblgen to generate the
optional CPSR operand rather than an implicit CPSR def. But the
strangeness is now limited to the selection DAG. ADD/SUB MI's no
longer have implicit CPSR defs, nor do we allow flag setting variants
of these opcodes in machine code. There are several corner cases to
consider, and getting one wrong would previously lead to nasty
miscompilation. It's not the first time I've debugged one, so this
time I added enough verification to ensure it won't happen again.
llvm-svn: 140228
No functionality change. The hook makes it explicit which patterns
require "special" handling. i.e. it self-documents tblgen
deficiencies. I plan to add verification in ExpandISelPseudos and
Thumb2SizeReduce to catch any missing hasPostISelHooks. Otherwise it's
too fragile.
llvm-svn: 140160
Modified ARMISelLowering::AdjustInstrPostInstrSelection to handle the
full gamut of CPSR defs/uses including instructins whose "optional"
cc_out operand is not really optional. This allowed removal of the
hasPostISelHook to simplify the .td files and make the implementation
more robust.
Fixes rdar://10137436: sqlite3 miscompile
llvm-svn: 140134
Add a instruction flag: hasPostISelHook which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have implicit def of CPSR (required since it now uses CPSR physical
register dependency rather than "glue"). If the carry flag is used, then the
target hook will *fill in* the optional operand with CPSR. Otherwise, the hook
will remove the CPSR implicit def from the MachineInstr.
llvm-svn: 138810
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
BranchProbabilityInfo (expect setEdgeWeight which is not available here).
Branch Weights are kept in MachineBasicBlocks. To turn off this analysis
set -use-mbpi=false.
llvm-svn: 133184
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
would cause an assert.
2. The load may not be used by the current chain of instructions,
and we could move it past a side-effecting instruction. Change
how we process uses to define the problem away.
llvm-svn: 130018
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
llvm-svn: 129662
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
llvm-svn: 129656
I'm tired of doing this manually for each checkout.
If anyone knows a better way debug isel for non-trivial tests feel
free to revert and let me know how to do it.
llvm-svn: 128132