Specifically:
- Calculate the loop pre-header once at the stat of HoistOutOfLoop, so:
- We don't-DFS walk the MachineDomTree if we aren't going to do anything
- Don't call getCurPreheader for each Scope
- Don't needlessly use a do-while loop
- Use early exit for Scopes.size() == 0
No functional changes intended.
llvm-svn: 228350
Add a command-line option to enable hoisting even cheap instructions (in
low-register-pressure situations). This is turned off by default, but has
proved useful for testing purposes.
llvm-svn: 225470
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&. At this point they almost behave like normal iterators!
Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.
llvm-svn: 203865
Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers
and TermRegs. When it sees a definition of R it adds all aliases of R
to the corresponding set, so that when it needs to test for membership
it only needs to test a single register, rather than worrying about
aliases there too. E.g. the final candidate loop just has:
unsigned Def = Candidates[i].Def;
if (!PhysRegClobbers.test(Def) && ...) {
to test whether register Def is multiply defined.
However, there was also a shortcut in ProcessMI to make sure we didn't
add candidates if we already knew that they would fail the final test.
This shortcut was more pessimistic than the final one because it
checked whether _any alias_ of the defined register was multiply defined.
This is too conservative for targets that define register pairs.
E.g. on z, R0 and R1 are sometimes used as a pair, so there is a
128-bit register that aliases both R0 and R1. If a loop used
R0 and R1 independently, and the definition of R0 came first,
we would be able to hoist the R0 assignment (because that used
the final test quoted above) but not the R1 assignment (because
that meant we had two definitions of the paired R0/R1 register
and would fail the shortcut in ProcessMI).
This patch just uses the same check for the ProcessMI shortcut as
we use in the final candidate loop.
llvm-svn: 188774
This fixes some of the cycles between libCodeGen and libSelectionDAG. It's still
a complete mess but as long as the edges consist of virtual call it doesn't
cause breakage. BasicTTI did static calls and thus broke some build
configurations.
llvm-svn: 172246
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
No functional change intended.
Sorry for the churn. The iterator classes are supposed to help avoid
giant commits like this one in the future. The TableGen-produced
register lists are getting quite large, and it may be necessary to
change the table representation.
This makes it possible to do so without changing all clients (again).
llvm-svn: 157854
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).
So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.
Patch by Yiannis Tsiouris!
llvm-svn: 156328
Allow cheap instructions to be hoisted if they are register pressure
neutral or better. This happens if the instruction is the last loop use
of another virtual register.
Only expensive instructions are allowed to increase loop register
pressure.
llvm-svn: 154455
Hoisting a value that is used by a PHI in the loop will introduce a
copy because the live range is extended to cross the PHI.
The same applies to PHIs in exit blocks.
Also use this opportunity to make HasLoopPHIUse() non-recursive.
llvm-svn: 154454
This caused miscompilations on out-of-tree targets, and possibly i386 as
well.
I'll find some other way of hoisting %rip-relative loads from loops
containing calls.
llvm-svn: 150816
When using register masks, registers like %rip are clobbered by the
register mask. LICM should still be able to hoist instructions reading
%rip from a loop containing calls.
llvm-svn: 150288
Moving toward a uniform style of pass definition to allow easier target configuration.
Globally declare Pass ID.
Globally declare pass initializer.
Use INITIALIZE_PASS consistently.
Add a call to the initializer from CodeGen.cpp.
Remove redundant "createPass" functions and "getPassName" methods.
While cleaning up declarations, cleaned up comments (sorry for large diff).
llvm-svn: 150100
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975
containing loop's header to see if that's a landing pad. If it is, then we don't
want to hoist instructions out of the loop and above the header.
llvm-svn: 141767
1. The speculation check may not have been performed if the BB hasn't had a load
LICM candidate.
2. If the candidate would be CSE'ed, then go ahead and speculatively LICM the
instruction even if it's in high register pressure situation.
llvm-svn: 141747
The blocks with invokes have branches to the dispatch block, because that more
correctly models the behavior of the CFG. The dispatch of course has edges to
the landing pads. Those landing pads could contain invokes, which then have
branches back to the dispatch. This creates a loop. The machine LICM pass looks
at this loop and thinks it can hoist elements out of it. But because the
dispatch is an alternate entry point into the program, the hoisted instructions
won't be executed.
I wasn't able to get a testcase which was small and could reproduce all of the
time. The function_try_block.cpp in llvm-test was where this showed up.
llvm-svn: 141726
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.
Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.
ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
to re-materialize the instruction, allow machine LICM to hoist the set of
instructions out of the loop and make it possible to CSE them. It's a bit
hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.
With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.
llvm-svn: 123905
- Initial register pressure in the loop should be all the live defs into the
loop. Not just those from loop preheader which is often empty.
- When an instruction is hoisted, update register pressure from loop preheader
to the original BB.
- Treat only use of a virtual register as kill since the code is still SSA.
llvm-svn: 116956
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
llvm-svn: 116820
"long latency" enough to hoist even if it may increase spilling. Reloading
a value from spill slot is often cheaper than performing an expensive
computation in the loop. For X86, that means machine LICM will hoist
SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
instructions.
- Enable register pressure aware machine LICM by default.
llvm-svn: 116781
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334
loop, for the reasons in the comments. This is a
major win on 253.perlbmk on ARM Darwin. I expect it
to be a good heuristic in general, but it's possible
some things will regress; I'll be watching.
7940152.
llvm-svn: 108792
register is not killed in the loop.
This fixes 188.ammp on ARM where the post-ra scheduler would grab a register
that looked available but wasn't.
A testcase would be huge and fragile, sorry.
llvm-svn: 101930
the live-in sets of BBs in the loop. Otherwise later pass may end up using the
registers and override the invariant. rdar://7852937
No reasonablly sized test case possible.
llvm-svn: 101626