This showed up the first time rend() was called on a bundled instruction
in the Mips backend.
Also avoid dereferencing end() in bundle_iterator::operator++().
We still don't have a place to put unit tests for this stuff.
llvm-svn: 158310
only using the linkage.
Use and test both, documenting that considering the visibility and linkage
of template parameters is a difference from gcc.
llvm-svn: 158309
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>
llvm-svn: 158302
such as "protocol" and "expression" being implicitly turned into links to
mistakenly-generated Doxygen pages:
- Escaping @ symbols when Doxygen would otherwise incorrectly interpret them;
- Escaping # symbols when they're not intended as explicit Doxygen link
requests, such as when discussing preprocessor directives;
- In one odd case, unescaping @ in @__experimental_modules_import, because
Doxygen wrote '\@' to the output in that case, causing the example in the
description of ImportDecl to be wrong; and
- Fixing a typo: @breif -> @brief.
llvm-svn: 158299
This saves a cast, and zext is more expensive on platforms with subreg support
than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750.
On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the
same performance now when not inlining either function.
stupid_memchr: 323.0us
bsd_memchr: 321.0us
memchr: 479.0us
where memchr is the llvm-gcc compiled bsd_memchr from osx lion's libc. When
inlining is enabled bsd_memchr still regresses down to llvm-gcc memchr time,
I haven't fully understood the issue yet, something is grossly mangling the
loop after inlining.
llvm-svn: 158297
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).
Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)
Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%
llvm-svn: 158296
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the crs.
This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%
Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%
llvm-svn: 158294
We need an efficient mechanism to determine whether a defaulted default
constructor is constexpr, in order to determine whether a class is a literal
type, so keep the incrementally-built form on CXXRecordDecl. Remove the
on-demand computation of same, so that we only have one method for determining
whether a default constructor is constexpr. This doesn't affect correctness,
since default constructor lookup is much simpler than selecting a constructor
for copying or moving.
We don't need a corresponding mechanism for defaulted copy or move constructors,
since they can't affect whether a type is a literal type. Conversely, checking
whether such functions are constexpr can require non-trivial effort, so we defer
such checks until the copy or move constructor is required.
Thus we now only compute whether a copy or move constructor is constexpr on
demand, and only compute whether a default constructor is constexpr in advance.
This is unfortunate, but seems like the best solution.
llvm-svn: 158290
an explicitly-defaulted default constructor would be constexpr. This is
necessary in weird (but well-formed) cases where a class has more than one copy
or move constructor.
Cleanup of now-unused parts of CXXRecordDecl to follow.
llvm-svn: 158289
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.
Thanks to Jakob and Owen for their suggestions in this matter.
llvm-svn: 158283
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).
Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%
Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%
This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.
llvm-svn: 158259
in the same line do not override getting a cursor for the previous declaration.
e.g:
int x, y;
@synthesize prop1, prop2;
pointing at 'x'/'prop1' would give 'y'/'prop2' because their source ranges overlap.
rdar://11361113
llvm-svn: 158258
The LiveRegMatrix represents the live range of assigned virtual
registers in a Live interval union per register unit. This is not
fundamentally different from the interference tracking in RegAllocBase
that both RABasic and RAGreedy use.
The important differences are:
- LiveRegMatrix tracks interference per register unit instead of per
physical register. This makes interference checks cheaper and
assignments slightly more expensive. For example, the ARM D7 reigster
has 24 aliases, so we would check 24 physregs before assigning to one.
With unit-based interference, we check 2 units before assigning to 2
units.
- LiveRegMatrix caches regmask interference checks. That is currently
duplicated functionality in RABasic and RAGreedy.
- LiveRegMatrix is a pass which makes it possible to insert
target-dependent passes between register allocation and rewriting.
Such passes could tweak the register assignments with interference
checking support from LiveRegMatrix.
Eventually, RABasic and RAGreedy will be switched to LiveRegMatrix.
llvm-svn: 158255