Original commit message:
Allow up to 64 functional units per processor itinerary.
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
llvm-svn: 159027
With regunit liveness permanently enabled, this function would always
return true.
Also remove now obsolete code for checking physreg interference.
llvm-svn: 159006
This fixes PR5997.
These transforms were disabled because codegen couldn't deal with other
uses of trunc(x). This is now handled by the peephole pass.
This causes no regressions on x86-64.
llvm-svn: 159003
Original message:
Performance optimizations:
- SwitchInst: case values stored separately from Operands List. It allows to make faster access to individual case value numbers or ranges.
- Optimized IntItem, added APInt value caching.
- Optimized IntegersSubsetGeneric: added optimizations for cases when subset is single number or when subset consists from single numbers only.
llvm-svn: 158997
fail. Original commit message:
Performance optimizations:
- SwitchInst: case values stored separately from Operands List. It allows to make faster access to individual case value numbers or ranges.
- Optimized IntItem, added APInt value caching.
- Optimized IntegersSubsetGeneric: added optimizations for cases when subset is single number or when subset consists from single numbers only.
On my machine these optimizations gave about 4-6% of compile-time improvement.
llvm-svn: 158986
- SwitchInst: case values stored separately from Operands List. It allows to make faster access to individual case value numbers or ranges.
- Optimized IntItem, added APInt value caching.
- Optimized IntegersSubsetGeneric: added optimizations for cases when subset is single number or when subset consists from single numbers only.
On my machine these optimizations gave about 4-6% of compile-time improvement.
llvm-svn: 158979
This makes it explicit when ScoreboardHazardRecognizer will be used.
"GenericItineraries" would only make sense if it contained real
itinerary values and still required ScoreboardHazardRecognizer.
llvm-svn: 158963
The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame
pointer exists, but the frame pointer was forced by the presence of
llvm.eh.unwind.init which isn't guaranteed.
If llvm.eh.unwind.init is actually required in functions calling
eh.return (is it?), we should diagnose that instead of emitting bad
machine code.
This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot.
llvm-svn: 158961
This is a minor drive-by fix with no robust way to unit test.
As an example see neon-div.ll:
SU(16): %Q8<def> = VMOVLsv4i32 %D17, pred:14, pred:%noreg, %Q8<imp-use,kill>
val SU(1): Latency=2 Reg=%Q8
...should be latency=1
llvm-svn: 158960
Minor drive by fix to cleanup latency computation. Calling
getOperandLatency with a deliberately incorrect operand index does not
give you the latency you want.
llvm-svn: 158959
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).
This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.
Fast mode - allows formation of fused FP ops whenever they're profitable.
Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.
Strict mode - allow fusion only if/when it can be proven that the excess
precision won't effect the result.
Note: This option only controls formation of fused ops by the optimizers. Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.
Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.
llvm-svn: 158956
to be generic across architectures. It has the
following description in the gnu sources:
Negate the immediate constant
Several Architectures such as x86 have local implementations
of operand modifier 'n' which go beyond the above description
slightly. This won't affect them.
Affected files:
lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
Added 'n' to the switch cases.
test/CodeGen/Generic/asm-large-immediate.ll
Generic compiled test (x86 for me)
test/CodeGen/Mips/asm-large-immediate.ll
Mips compiled version of the generic one
Contributer: Jack Carter
llvm-svn: 158939
to be generic across architectures. It has the
following description in the gnu sources:
Substitute immediate value without immediate syntax
Several Architectures such as x86 have local implementations
of operand modifier 'c' which go beyond the above description
slightly. To make use of the generic modifiers without overriding
local implementation one can make a call to the base class method
for AsmPrinter::PrintAsmOperand() in the locally derived method's
"default" case in the switch statement. That way if it is already
defined locally the generic version will never get called.
This change is needed when test/CodeGen/generic/asm-large-immediate.ll
failed on a native Mips board. The test was assuming a generic
implementation was in place.
Affected files:
lib/Target/Mips/MipsAsmPrinter.cpp:
Changed the default case to call the base method.
lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
Added 'c' to the switch cases.
test/CodeGen/Mips/asm-large-immediate.ll
Mips compiled version of the generic one
Contributer: Jack Carter
llvm-svn: 158925
- provide more extensive set of functions to detect library allocation functions (e.g., malloc, calloc, strdup, etc)
- provide an API to compute the size and offset of an object pointed by
Move a few clients (GVN, AA, instcombine, ...) to the new API.
This implementation is a lot more aggressive than each of the custom implementations being replaced.
Patch reviewed by Nick Lewycky and Chandler Carruth, thanks.
llvm-svn: 158919
Makefiles, the CMake files in every other part of the LLVM tree, and
sanity.
This should also restore the output tree structure of all the unit
tests, sorry for breaking that, and thanks for letting me know.
The fundamental change is to put a CMakeLists.txt file in the unittest
directory, with a single test binary produced from it. This has several
advantages:
- No more weird directory stripping in the unittest macro, allowing it
to be used more readily in other projects.
- No more directory prefixes on all the source files.
- Allows correct and precise use of LLVM's per-directory dependency
system.
- Allows use of the checking logic for source files that have not been
added to the CMake build. This uncovered a file being skipped with
CMake in LLVM and one in Clang's unit tests.
- Makes Specifying conditional compilation or other custom logic for JIT
tests easier.
It did require adding the concept of an explicit 'optional' source file
to the CMake build so that the missing-file check can skip cases where
the file is *supposed* to be missing. =]
This is another chunk of refactoring the CMake build in order to make it
usable for other clients like CompilerRT / ASan / TSan.
Note that this is interdependent with a Clang CMake change.
llvm-svn: 158909
_umodsi3 libcalls if they have the same arguments. This optimization
was apparently broken if one of the node was replaced in place.
rdar://11714607
llvm-svn: 158900
facilities.
This was only used in one place in LLVM, and was used pervasively (but
with different code!) in Clang. It has no advantages over the standard
CMake facilities and in some cases disadvantages.
llvm-svn: 158889
Live intervals for regunits and virtual registers are stored separately,
and physreg live intervals are going away.
To visit the live ranges of all virtual registers, use this pattern
instead:
for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
unsigned Reg = TargetRegisterInfo::index2VirtReg(i);
if (MRI->reg_nodbg_empty(Reg))
continue;
llvm-svn: 158879
Stop depending on the LiveIntervalUnions in RegAllocBase, they are about
to be removed.
The changes are mostly replacing register alias iterators with regunit
iterators, and querying LiveRegMatrix instrad of RegAllocBase.
InterferenceCache is converted to work with per-regunit
LiveIntervalUnions, and it checks fixed regunit interference separately,
using the fixed live intervals provided by LiveIntervalAnalysis.
The local splitting helper calcGapWeights() is also considering fixed
regunit interference which is kept on the side now.
llvm-svn: 158867
That is a DenseMap iterator keyed by pointers, so the iteration order is
nondeterministic.
I would like to replace the DenseMap with an IndexedMap which doesn't
allow iteration.
llvm-svn: 158856
that are generated by TableGen and are already available in
MipsGenRegisterInfo.inc. Suggested by Jakob Stoklund Olesen.
Also, fix bug in function DecodeAFGR64RegisterClass.
Patch by Vladimir Medic.
llvm-svn: 158846
This is supported by gcc and clang, but guarded by a macro for MSVC 2008.
The extern template declaration is not necessary but generally good
form. It can avoid extra instantiations of the template methods
defined inline.
The EXTERN_TEMPLATE_INSTANTIATION macro could probably be generalized to
handle multiple template parameters if someone thinks it's worthwhile.
llvm-svn: 158840
Regunit live ranges are computed on demand, so when mi-sched calls
handleMove, some regunits may not have live ranges yet.
That makes updating them easier: Just skip the non-existing ranges. They
will be computed correctly from the rescheduled machine code when they
are needed.
llvm-svn: 158831
There is a pretty staggering amount of this in LLVM's header files, this
is not all of the instances I'm afraid. These include all of the
functions that (in my build) are used by a non-static inline (or
external) function. Specifically, these issues were caught by the new
'-Winternal-linkage-in-inline' warning.
I'll try to just clean up the remainder of the clearly redundant "static
inline" cases on functions (not methods!) defined within headers if
I can do so in a reliable way.
There were even several cases of a missing 'inline' altogether, or my
personal favorite "static bool inline". Go figure. ;]
llvm-svn: 158800
I'll admit I'm not entirely satisfied with this change, but it seemed
the cleanest option. Other suggestions quite welcome
The issue is that the traits specializations have static methods which
return the typedef'ed PHI_iterator type. In both the IR and MI layers
this is typedef'ed to a custom iterator class defined in an anonymous
namespace giving the types and the functions returning them internal
linkage. However, because the traits specialization is defined in the
'llvm' namespace (where it has to be, specialized template lives there),
and is in turn used in the templated implementation of the SSAUpdater.
This led to the linkage conflict that Clang now warns about.
The simplest solution to me was just to define the PHI_iterator as
a nested class inside the trait specialization. That way it still
doesn't get scoped widely, it can't be accidentally reused somewhere,
etc. This is a little gross just because nested class definitions are
a little gross, but the alternatives seem more ad-hoc.
llvm-svn: 158799
The TEST_F macros actually declare *subclasses* of the test fixtures.
Even if they didn't we don't want them to declare external functions.
The entire unit test, including both the fixture class and the fixture
test cases should be wrapped in the anonymous namespace.
This issue was caught by the new '-Winternal-linkage-in-inline' warning.
llvm-svn: 158798
-stable-loops enables a new algorithm for generating the Loop
forest. It differs from the original algorithm in a few respects:
- Not determined by use-list order.
- Initially guarantees RPO order of block and subloops.
- Linear in the number of CFG edges.
- Nonrecursive.
I didn't want to change the LoopInfo API yet, so the block lists are
still inclusive. This seems strange to me, and it means that building
LoopInfo is not strictly linear, but it may not be a problem in
practice. At least the block lists start out in RPO order now. In the
future we may add an attribute or wrapper analysis that allows other
passes to assume RPO order.
The primary motivation of this work was not to optimize LoopInfo, but
to allow reproducing performance issues by decomposing the compilation
stages. I'm often unable to do this with the current LoopInfo, because
the loop tree order determines Loop pass order. Serializing the IR
tends to invert the order, which reverses the optimization order. This
makes it nearly impossible to debug interdependent loop optimizations
such as LSR.
I also believe this will provide more stable performance results across time.
llvm-svn: 158790
The implementation only needs inclusion from LoopInfo.cpp and
MachineLoopInfo.cpp. Clients of the interface should only include the
interface. This makes the interface readable and speeds up rebuilds
after modifying the implementation.
llvm-svn: 158787
llvm::RawMemoryObject handles empty ranges just fine, and the assert can
be triggered in the wild by e.g. invoking clang with a file that
included an empty pre-compiled header file when clang has been built
with assertions enabled. Without assertions enabled, clang will properly
report that the empty file is not a valid PCH.
llvm-svn: 158769
When LiveIntervals is tracking fixed interference in regunits, make sure
to update those intervals as well. Currently guarded by -live-regunits.
llvm-svn: 158766
ensureAlignment() in MachineFunction). Also, drop setMaxAlignment() in
favor of this new function. This creates a main entry point to setting
MaxAlignment, which will be helpful for future work. No functionality
change intended.
llvm-svn: 158758
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
OR
UnsafeFPMath option (-enable-unsafe-fp-math)
are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).
If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
llvm-svn: 158757
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
This is related to PR5997.
llvm-svn: 158743
StringMap suffered from the same bug as DenseMap: when you explicitly
construct it with a small number of buckets, you can arrange for the
tombstone-based growth path to be followed when the number of buckets
was less than '8'. In that case, even with a full map, it would compare
'0' as not less than '0', and refuse to grow the table, leading to
inf-loops trying to find an empty bucket on the next insertion. The fix
is very simple: use '<=' as the comparison. The same fix was applied to
DenseMap as well during its recent refactoring.
Thanks to Alex Bolz for the great report and test case. =]
llvm-svn: 158725
For processors with the G5-like instruction-grouping scheme, this helps avoid
early group termination due to a write-after-write dependency within the group.
It should also help on pipelined embedded cores.
On POWER7, over the test suite, this gives an average 0.5% speedup. The largest
speedups are:
SingleSource/Benchmarks/Stanford/Quicksort - 33%
MultiSource/Applications/d/make_dparser - 21%
MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12%
MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12%
Largest slowdowns:
SingleSource/Benchmarks/Stanford/Bubblesort - 23%
MultiSource/Benchmarks/Prolangs-C++/city/city - 21%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16%
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13%
llvm-svn: 158719