We don't handle array destructors correctly yet, but we now apply to
multidimensional arrays the same hack (explicitly destroy the first element,
implicitly invalidate the rest) that we already use for linear arrays.
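For reference, a minimal C++ sketch (mine, not taken from the tests) of the
kind of construct this covers:

  // A multidimensional array whose elements have a non-trivial destructor;
  // under the hack, only the first element's destruction is modeled
  // explicitly and the remaining elements are treated as invalidated.
  struct S { ~S() {} };
  void f() {
    S grid[2][3];
  } // destruction of 'grid' happens here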
<rdar://problem/12858542>
llvm-svn: 170000
call sites as tail calls unconditionally. While it's theoretically true that
this is just an optimization, it's an optimization that we very much want to
happen even at -O0, or else ARC applications become substantially harder to
debug. See r169796 for the llvm/fast-isel side of things.
rdar://12553082
llvm-svn: 169996
is switched off by about 0.8% (tested with int i<N>).
Additionally, this puts the computation of the diagnostic class more squarely
on the hot path during parsing, in preparation for upcoming optimizations
in this area.
llvm-svn: 169976
Add -fslp-vectorize (with -ftree-slp-vectorize as an alias for gcc compatibility)
to provide a way to enable the basic-block vectorization pass. This uses the same
acronym as gcc, superword-level parallelism (SLP), also common in the literature,
to refer to basic-block vectorization.
Nadav suggested this as a follow-up to the addition of -fvectorize.
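As a rough illustration (the code is mine, not from the patch), this is the
sort of straight-line code superword-level parallelism targets:

  // Four independent, isomorphic operations on adjacent elements within a
  // single basic block; an SLP/basic-block vectorizer can merge them into
  // vector operations.
  void add4(double *__restrict a, const double *b, const double *c) {
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
  }
  // e.g. built with something like: clang -c -O2 -fslp-vectorize add4.cpp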
llvm-svn: 169909
Instead of doing a binary search over the whole diagnostic table (which weighs
a whopping 48k on x86_64), use the existing enums to compute the index in the
table. This avoids loading any unneeded data from the table and avoids littering
CPU caches with it. This code is in a hot path for code with many diagnostics.
1% speedup on -fsyntax-only gcc.c, which emits a lot of warnings.
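A hedged sketch of the idea; the names and numbers below are illustrative,
not the actual generated tables:

  // Diagnostic IDs are assigned densely per component, so a record can be
  // found by offsetting into that component's table.
  struct DiagInfoRec { unsigned ID; const char *Description; };

  const DiagInfoRec SemaDiagInfos[] = {
    {2000, "first sema diagnostic"},
    {2001, "second sema diagnostic"},
  };
  const unsigned DIAG_START_SEMA = 2000;

  // Index arithmetic replaces the binary search over one big table, so only
  // the record that is actually needed is ever touched.
  const DiagInfoRec *getDiagInfo(unsigned DiagID) {
    return &SemaDiagInfos[DiagID - DIAG_START_SEMA];
  }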
llvm-svn: 169890
Summary:
Also rename DumpDeclarator() to dumpDecl(). Once Decl dumping is added, these
will be the two main methods of the class, so this is just for consistency in
naming.
There was a DumpStmt() method already, but there was no point in having it, so
I have merged it into VisitStmt(). Similarly, DumpExpr() is merged into
VisitExpr().
Reviewers: alexfh
Reviewed By: alexfh
CC: cfe-commits, alexfh
Differential Revision: http://llvm-reviews.chandlerc.com/D156
llvm-svn: 169865
a file or directory, allowing just a stat call if a file descriptor
is not needed.
Doing just 'stat' is faster than 'open/fstat/close'.
This has the effect of cutting down system time for validating the input files of a PCH.
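For illustration only (this is a simplified sketch, not the actual
FileManager code), the difference amounts to:

  #include <fcntl.h>
  #include <sys/stat.h>
  #include <unistd.h>

  // One syscall: enough when the caller only needs metadata.
  bool getSizeStatOnly(const char *Path, off_t &Size) {
    struct stat St;
    if (stat(Path, &St) != 0)
      return false;
    Size = St.st_size;
    return true;
  }

  // Three syscalls: only worth it when a file descriptor is actually needed.
  bool getSizeWithFD(const char *Path, off_t &Size) {
    int FD = open(Path, O_RDONLY);
    if (FD < 0)
      return false;
    struct stat St;
    bool OK = fstat(FD, &St) == 0;
    if (OK)
      Size = St.st_size;
    close(FD);
    return OK;
  }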
llvm-svn: 169831
We don't want to relax all instructions in
$ clang -c test.s
since most users don't pass -O when using the driver to assemble.
On the other hand, -save-temps should not change the output unnecessarily, so in
$ clang -c test.c -save-temps
we should relax all instructions.
llvm-svn: 169815
definition, rather than at the end of the definition of the set of nested
classes. We still defer checking of the user-specified exception specification
to the end of the nesting -- we can't check that until we've parsed the
in-class initializers for non-static data members.
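A small, hypothetical example of why that check has to wait:

  struct Outer {
    struct Inner {
      // The implicit parts of Inner can be handled at the end of Inner's
      // definition, but verifying this user-written 'noexcept' against the
      // implicitly computed specification requires knowing whether the
      // initializer of 'x' can throw, and that initializer is only parsed
      // once the outermost class is complete.
      Inner() noexcept = default;
      int x = 42;
    };
  };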
llvm-svn: 169805
Note that there is no test suite update. This was found by a couple of
tests failing when the test suite was run on a powerpc64 host (thanks
Roman!). The tests don't specify a triple, which might seem surprising
for a codegen test. But in fact, these tests don't even inspect their
output. Not at all. I could add a bunch of triples to these tests so
that we'd get the test coverage for normal builds, but really someone
needs to go through and add actual *tests* to these tests. =[ The ones
in question are:
test/CodeGen/bitfield-init.c
test/CodeGen/union.c
llvm-svn: 169694
both LE and BE targets.
AFAICT, Clang gets this correct for PPC64. I've compared it to GCC 4.8
output for PPC64 (thanks Roman!), and to the best of my limited ability to
read Power assembly, it looks functionally equivalent. It would be really good to
fill in the assertions on this test case for x86-32, PPC32, ARM, etc.,
but I've reached the limit of my time and energy... Hopefully other
folks can chip in as it would be good to have this in place to test any
subsequent changes.
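For context, a hedged example (mine, only loosely in the spirit of the
existing bitfield tests) of the accesses being checked:

  struct Bits {
    unsigned a : 3;
    unsigned b : 13;
    unsigned c : 16;
  };

  // Reading 'b' lowers to a load of the storage unit plus shifts and masks;
  // the shift amounts differ between little- and big-endian layouts, which
  // is what per-target assertions would cover.
  unsigned readB(Bits *p) { return p->b; }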
To those who care about PPC64 performance, a side note: there is some
*obnoxiously* bad code generated for these test cases. It would be worth
someone's time to sit down and teach the PPC backend to pattern match
these IR constructs better. It appears that things like '(shr %foo,
<imm>)' turn into 'rldicl R, R, 64-<imm>, <imm>' or some such. They
don't even get combined with other 'rldicl' instructions *immediately
adjacent*. I'll add a couple of these patterns to the README, but
I think it would be better to look at all the patterns produced by this
and other bitfield access code, and systematically build up a collection
of patterns that efficiently reduce them to the minimal code.
llvm-svn: 169693
This was an egregious bug due to the several iterations of refactoring
that took place. Size no longer meant what it originally did by the time
I finished, but this line of code never got updated. Unfortunately we
had essentially zero tests for this in the regression test suite. =[
I've added a PPC64 run over the bitfield test case I've been primarily
using. I'm still looking at adding more tests and making sure this is
the *correct* bitfield access code on PPC64 linux, but it looks pretty
close to me, and it is *worlds* better than before this patch as it no
longer asserts! =] More commits to follow with at least additional tests
and maybe more fixes.
Sorry for the long breakage due to this....
llvm-svn: 169691