This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:
ucomisd (%rdi), %xmm0
cmovbel %edx, %esi
cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.
This patch uses a dumb conservative heuristic, it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-Order architectures are
unlikely to benefit as well, those are included in the
"predictableSelectIsExpensive" flag.
It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.
Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.
Thanks to Chandler for the initial test case to him and Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)
llvm-svn: 156234
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.
Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.
I'm not entirely happy with the name of this flag, suggestions welcome ;)
llvm-svn: 156233
which is leading to swapping in some cases when it didn't before. We need to see if we can make this change
without leading to a massive compile-time bloat.
llvm-svn: 156229
to get a const char* if necessary.
This avoids unnecessary conversions when we want to use the result of getName as
a StringRef.
Part of rdar://10796159
llvm-svn: 156227
This is still a topological ordering such that every register class gets
a smaller enum value than its sub-classes.
Placing the smaller spill sizes first makes a difference for the
super-register class bit masks. When looking for a super-register class,
we usually want the smallest possible kind of super-register. That is
now available as the first bit set in the bit mask.
llvm-svn: 156222
No one was using it and Locker(pthread_mutex_t *) immediately asserts for
pthread_mutex_t's that don't come from a Mutex anyway. Rather than try to make
that work, we should maintain the Mutex abstraction and not pass around the
platform implementation...
Make Mutex::Locker::Lock take a Mutex & or a Mutex *, and remove the constructor
taking a pthread_mutex_t *. You no longer need to call Mutex::GetMutex to pass
your mutex to a Locker (you can't in fact, since I made it private.)
llvm-svn: 156221
We want the representative register class to contain the largest
super-registers available. This makes the function less sensitive to the
register class numbering.
llvm-svn: 156220
Sema::ConvertToIntegralOrEnumerationType() from PartialDiagnostics to
abstract "diagnoser" classes. Not much of a win here, but we're
-several PartialDiagnostics.
llvm-svn: 156217
In file included from ../lib/Target/NVPTX/VectorElementize.cpp:53:
../lib/Target/NVPTX/NVPTX.h:44:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
default: assert(0 && "Unknown condition code");
^
1 warning generated.
The prevailing pattern in LLVM is to not use a default label, and instead to
use llvm_unreachable to denote that the switch in fact covers all return paths
from the function.
llvm-svn: 156209
add a new Region::block_iterator which actually iterates over the basic
blocks of the region.
The old iterator, now call 'block_node_iterator' iterates over
RegionNodes which contain a single basic block. This works well with the
GraphTraits-based iterator design, however most users actually want an
iterator over the BasicBlocks inside these RegionNodes. Now the
'block_iterator' is a wrapper which exposes exactly this interface.
Internally it uses the block_node_iterator to walk all nodes which are
single basic blocks, but transparently unwraps the basic block to make
user code simpler.
While this patch is a bit of a wash, most of the updates are to internal
users, not external users of the RegionInfo. I have an accompanying
patch to Polly that is a strict simplification of every user of this
interface, and I'm working on a pass that also wants the same simplified
interface.
This patch alone should have no functional impact.
llvm-svn: 156202
% PYTHONPATH=./build/Debug/LLDB.framework/Resources/Python ; ./build/Debug//LLDB.framework/Resources/Python/lldb/macosx/crashlog.py -i ~/Downloads/crashes2/*.crash )
then you get an interactive prompt where you can search for data within all crash logs. For example you can do:
% list
which will list all crash logs
And you can search for all images given an image basename, or full path:
% image LLDB
% image /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB
% image LLDB.framework
Which would all produce an output listing like:
40CD4430-7D27-3248-BE4C-71B1F36FC5D0 (1.132 - 132) /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB, __TEXT=[0x000000011f8bc000 - 0x0000000120d3efbf)
B727A528-FF1F-3B20-9E4F-BBE96C7D922D (1.136 - 136) /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB, __TEXT=[0x000000011e7f7000 - 0x000000011fc7ff87)
4D6F8DC2-5757-39C7-96B0-1A5B5171DC6B (1.137 - 137) /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB, __TEXT=[0x000000012bd7f000 - 0x000000012d1fcfef)
FBF8786F-92B9-31E3-8BCD-A82148338966 (1.137 - 137) /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB, __TEXT=[0x0000000122d78000 - 0x00000001241f5fd7)
7AE082E3-3BB7-3F64-A308-063E559DFC45 (1.143 - 143) /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB, __TEXT=[0x0000000119b8d000 - 0x000000011b02ef5f)
7AE082E3-3BB7-3F64-A308-063E559DFC45 (1.143 - 143) /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB, __TEXT=[0x0000000111497000 - 0x0000000112938f5f)
7AE082E3-3BB7-3F64-A308-063E559DFC45 (1.143 - 143) /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A/LLDB, __TEXT=[0x0000000116680000 - 0x0000000117b21f5f)
llvm-svn: 156201
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.
NV_CONTRIB
llvm-svn: 156196
Rework the Host.cpp::ThreadNameAccessor to use ThreadSafeSTLMap - we've got it so we might as well use it. Also works around a problem with the
Mutex::Locker class raising fallacious asserts in debug mode when used with pthread_mutex_t's that weren't backed by Mutex objects.
llvm-svn: 156193
that bridging between the two is free. Saves ~4k of code size,
although I don't see any measurable performance difference
(unfortunately).
llvm-svn: 156187
getTypeSourceInfo() without checking for NULL.
FieldDecls may have NULL TypeSourceInfo, and in
fact some FieldDecls generated by Clang -- and
all FieldDecls generated by LLDB -- have no
TypeSourceInfo.
This patch makes IsTailPaddedMemberArray check
for NULL.
llvm-svn: 156186