GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).
llvm-svn: 154285
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.
llvm-svn: 154284
An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.
This saves a *lot* of space during LTO with -O0 -g flags.
llvm-svn: 154280
shuffle node because it could introduce new shuffle nodes that were not
supported efficiently by the target.
2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
second shuffle reverses the transformation of the first shuffle.
llvm-svn: 154266
reciprocal if converting to the reciprocal is exact. Do it even if inexact
if -ffast-math. This substantially speeds up ac.f90 from the polyhedron
benchmarks.
llvm-svn: 154265
optimizers could do this for us, but expecting partial SROA of classes
with template methods through cloning is probably expecting too much
heroics. With this change, the begin/end pointer pairs which indicate
the status of each loop iteration are actually passed directly into each
layer of the combine_data calls, and the inliner has a chance to see
when most of the combine_data function could be deleted by inlining.
Similarly for 'length'.
We have to be careful to limit the places where in/out reference
parameters are used as those will also defeat the inliner / optimizers
from properly propagating constants.
With this change, LLVM is able to fully inline and unroll the hash
computation of small sets of values, such as two or three pointers.
These now decompose into essentially straight-line code with no loops or
function calls.
There is still one code quality problem to be solved with the hashing --
LLVM is failing to nuke the alloca. It removes all loads from the
alloca, leaving only lifetime intrinsics and dead(!!) stores to the
alloca. =/ Very unfortunate.
llvm-svn: 154264
speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.
This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly innacurate of debug info was enabled for profiling. =/
Just a heads up.
llvm-svn: 154263
The tLDRr instruction with the last register operand set to the zero register
prints in assembly as if no register was specified, and the assembler encodes
it as a tLDRi instruction with a zero immediate. With the integrated assembler,
that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
is broken. Emit the instruction as tLDRi with a zero immediate. I don't
know if there's a good way to write a testcase for this. Suggestions welcome.
Opportunities for follow-up work:
1) The asm printer should complain if a non-optional register operand is set
to the zero register, instead of silently dropping it.
2) The integrated assembler should complain in the same situation, instead of
silently emitting the operand as "r0".
llvm-svn: 154261
Grouped unrolling means that we unroll a loop such that the different instances
of a certain statement are scheduled right after each other, but we do
not generate any vector code. The idea here is that we can schedule the
bb vectorizer right afterwards and use it heuristics to decide when
vectorization should be performed.
llvm-svn: 154251
nanoseconds in 32-bit expression would cause pthread_cond_timedwait
to time out immediately. Add explicit casts to the TimeValue::TimeValue
ctor that takes a struct timeval and change the NanoSecsPerSec etc
constants defined in TimeValue to be uint64_t so any other calculations
involving these should be promoted to 64-bit even when lldb is built
for 32-bit.
<rdar://problem/11204073>, <rdar://problem/11179821>, <rdar://problem/11194705>.
llvm-svn: 154250
- The [class.protected] restriction is non-trivial for any instance
member, even if the access lacks an object (for example, if it's
a pointer-to-member constant). In this case, it is equivalent to
requiring the naming class to equal the context class.
- The [class.protected] restriction applies to accesses to constructors
and destructors. A protected constructor or destructor can only be
used to create or destroy a base subobject, as a direct result.
- Several places were dropping or misapplying object information.
The standard could really be much clearer about what the object type is
supposed to be in some of these accesses. Usually it's easy enough to
find a reasonable answer, but still, the standard makes a very confident
statement about accesses to instance members only being possible in
either pointer-to-member literals or member access expressions, which
just completely ignores concepts like constructor and destructor
calls, using declarations, unevaluated field references, etc.
llvm-svn: 154248
Cygwin-1.7 supports dw2. Some recent mingw distros support one, too.
I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin.
llvm-svn: 154247
a hello world executable from atoms. There is still much to be flushed out.
Added one test case, test/darwin/hello-world.objtxt, which exercises the
darwin platform.
Added -platform option to lld-core tool to dynamically select platform.
llvm-svn: 154242
by default.
This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching to default seems
reasonable.
I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.
llvm-svn: 154235
However, the '-x' option has special handling and wasn't following this
paradigm. Fix it to do so by claiming the arg as we parse the '-x' option.
rdar://11203340
llvm-svn: 154231
spin up a temporary "private state thread" that will respond to events from the lower level process plugins. This check-in should work to do
that, but it is still buggy. However, if you don't call functions on the private state thread, these changes make no difference.
This patch also moves the code in the AppleObjCRuntime step-through-trampoline handler that might call functions (in the case where the debug
server doesn't support the memory allocate/deallocate packet) out to a safe place to do that call.
llvm-svn: 154230
In a few cases clang emitted a rather content-free diagnostic: 'parse error'.
This change replaces two actual cases (template parameter parsing and K&R
parameter declaration parsing) with more specific diagnostics and removes a
third dead case of this in the BalancedDelimiterTracker (the ctor already
checked the invariant necessary to ensure that the diag::parse_error was never
actually used).
llvm-svn: 154224