Add a method to calculate the number of arguments given QualType
expnads to. Use this method in ClangToLLVMArgMapping calculation.
This number may be cached in CodeGenTypes for efficiency, if needed.
llvm-svn: 218623
This takes a single argument convertible to T, and
- if the Optional has a value, returns the existing value,
- otherwise, constructs a T from the argument and returns that.
Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.
llvm-svn: 218618
Summary:
This change introduces DynMatcherInterface and changes the internal
representation of DynTypedMatcher and Matcher<T> to use a generic
interface instead.
It removes unnecessary indirections and virtual function calls when
converting matchers by implicit and dynamic casts.
DynTypedMatcher now remembers the stricter type in the chain of casts
and checks it before calling into DynMatcherInterface.
This change improves our clang-tidy related benchmark by ~14%.
Also, it opens the door for more optimizations of this kind that are
coming in future changes.
As a side effect of removing these template instantiations, it also
speeds up compilation of Dynamic/Registry.cpp by ~17% and reduces the number of
symbols generated by ~30%.
Reviewers: klimek
Subscribers: klimek, cfe-commits
Differential Revision: http://reviews.llvm.org/D5485
llvm-svn: 218616
Hoist the logic which determines the way QualType is expanded
into a separate method. Remove a bunch of copy-paste and simplify
getTypesFromArgs() / ExpandTypeFromArgs() / ExpandTypeToArgs() methods.
llvm-svn: 218615
This is just a optimization to save the compile time and execution time
for runtime alias checks if the user guarantees no aliasing all together.
llvm-svn: 218613
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.
http://reviews.llvm.org/D5469
PR20714
llvm-svn: 218607
There will be multiple TypeUnits in an unlinked object that will be extracted
from different sections. Now that we have DWARFUnitSection that is supposed
to represent an input section, we need a DWARFUnitSection<TypeUnit> per
input .debug_types section.
Once this is done, the interface is homogenous and we can move the Section
parsing code into DWARFUnitSection.
This is a respin of r218513 that got reverted because it broke some builders.
This new version features an explicit move constructor for the DWARFUnitSection
class to workaround compilers unable to generate correct C++11 default
constructors.
Reviewers: samsonov, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5482
llvm-svn: 218606
* Detect Android toolchain target arch and set correct runtime library name.
* Merged a lot of Android and non-Android code paths.
* Android is only supported in standalone build of compiler-rt now.
* Linking lsan-common in ASan-Android (makes lsan annotations work).
* Relying on -fsanitize=address linker flag when building tests (again,
unification with non-Android path).
* Runtime library moved from lib/asan to lib/linux.
llvm-svn: 218605
Runtime unrolling will create a prologue to execute the extra
iterations which is can't divided by the unroll factor. It
generates an if-then-else sequence to jump into a factor -1
times unrolled loop body, like
extraiters = tripcount % loopfactor
if (extraiters == 0) jump Loop:
if (extraiters == loopfactor) jump L1
if (extraiters == loopfactor-1) jump L2
...
L1: LoopBody;
L2: LoopBody;
...
if tripcount < loopfactor jump End
Loop:
...
End:
It means if the unroll factor is 4, the loop body will be 7
times unrolled, 3 are in loop prologue, and 4 are in the loop.
This commit is to use a loop to execute the extra iterations
in prologue, like
extraiters = tripcount % loopfactor
if (extraiters == 0) jump Loop:
else jump Prol
Prol: LoopBody;
extraiters -= 1 // Omitted if unroll factor is 2.
if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
if (tripcount < loopfactor) jump End
Loop:
...
End:
Then when unroll factor is 4, the loop body will be copied by
only 5 times, 1 in the prologue loop, 4 in the original loop.
And if the unroll factor is 2, new loop won't be created, just
as the original solution.
llvm-svn: 218604
Fixes incorrect codegen when devirtualization is aborted due to covariant return types.
Differential Revision: http://reviews.llvm.org/D5321
llvm-svn: 218602
The ldrexd and strexd instructions are undefined for the ARMv7M
architecture, so we cannot use them to implement the
__sync_fetch_and_*_8 builtins. There is no other way to implement
these without OS support, so this patch #ifdef's these functions out
for M-class architectures.
There are no tests as I cannot find any existing tests for these
builtins.
I used the __ARM_ARCH_PROFILE predefine because __ARM_FEATURE_LDREX is
deprecated and not set by clang.
llvm-svn: 218601
nodes, and rely exclusively on its logic. This removes a ton of
duplication from the blend lowering and centralizes it in one place.
One downside is that it requires a bunch of hacks to make this work with
the current legalization framework. We have to manually speculate one
aspect of legalizing VSELECT nodes to get everything to work nicely
because the existing legalization framework isn't *actually* bottom-up.
The other grossness is that we somewhat duplicate the analysis of
constant blends. I'm on the fence here. If reviewers thing this would
look better with VSELECT when it has constant operands dumping over tho
VECTOR_SHUFFLE, we could go that way. But it would be a substantial
change because currently all of the actual blend instructions are
matched via patterns in the TD files based around VSELECT nodes (despite
them not being perfect fits for that). Suggestions welcome, but at least
this removes the rampant duplication in the backend.
llvm-svn: 218600
works, as do breakpoints, run and pause, display zeroth frame.
See
http://reviews.llvm.org/D5503
for a fuller description of the changes in this commit.
llvm-svn: 218596
X86 target-specific DAG combining that tried to convert VSELECT nodes
into VECTOR_SHUFFLE nodes that it "knew" would lower into
immediate-controlled blend nodes.
Turns out, we have perfectly good lowering of all these VSELECT nodes,
and indeed that lowering already knows how to handle lowering through
BLENDI to immediate-controlled blend nodes. The code just wasn't getting
used much because this thing forced the world to go through the vector
shuffle lowering. Yuck.
This also exposes that I was too aggressive in avoiding domain crossing
in v218588 with that lowering -- when the other option is to expand into
two 128-bit vectors, it is worth domain crossing. Restore that behavior
now that we have nice tests covering it.
The test updates here fall into two camps. One is where previously we
ended up with an unsigned encoding of the blend operand and now we get
a signed encoding. In most of those places there were elaborate comments
explaining exactly what these operands really mean. Rather than that,
just switch these tests to use the nicely decoded comments that make it
obvious that the final shuffle matches.
The other updates are just removing pointless domain crossing by
blending integers with PBLENDW rather than BLENDPS.
llvm-svn: 218589
crossing and generally work more like the blend emission code in the new
vector shuffle lowering.
My goal is to have the new vector shuffle lowering just produce VSELECT
nodes that are either matched here to BLENDI or are legal and matched in
the .td files to specific blend instructions. That seems much cleaner as
there are other ways to produce a VSELECT anyways. =]
No *observable* functionality changed yet, mostly because this code
appears to be near-dead. The behavior of this lowering routine did
change though. This code being mostly dead and untestable will change
with my next commit which will also point some new tests at it.
llvm-svn: 218588
AVX-512.
There is no interesting logic yet. Everything ends up eventually
delegating to the generic code to split the vector and shuffle the
halves. Interestingly, that logic does a significantly better job of
lowering all of these types than the generic vector expansion code does.
Mostly, it lets most of the cases fall back to nice AVX2 code rather
than all the way back to SSE code paths.
Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
up will be to incrementally add direct support for the basic instruction
set to each type (adding tests first).
llvm-svn: 218585
assertion, making the name generic, and improving the documentation.
Step 1 in adding very primitive support for AVX-512. No functionality
changed yet.
llvm-svn: 218584
vectors.
Someone will need to build the AVX512 lowering, which should follow
AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added
a dummy test which is a port of the v8f32 and v8i32 tests from AVX and
AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
is enough information for someone to implement proper lowering here. If
not, I'll be happy to help, but right now the AVX-512 support isn't
a priority for me.
llvm-svn: 218583
lowerings.
This was hopelessly broken. First, the x86 backend wants '-1' to be the
element value representing true in a boolean vector, and second the
operand order for VSELECT is backwards from the actual x86 instructions.
To make matters worse, the backend is just using '-1' as the true value
to get the high bit to be set. It doesn't actually symbolically map the
'-1' to anything. But on x86 this isn't quite how it works: there *only*
the high bit is relevant. As a consequence weird non-'-1' values like
0x80 actually "work" once you flip the operands to be backwards.
Anyways, thanks to Hal for helping me sort out what these *should* be.
llvm-svn: 218582
A new thread arriving while a pending signal notification
is outstanding will (1) add the new thread to the list of
stops expected before the deferred signal notification is
fired, (2) send a stop request for the new thread, and
(3) track the new thread as currently running.
llvm-svn: 218578
Clang uses two types to talk about a C++ class, the
NonVirtualBaseLLVMType and the LLVMType. Previously, we would allow one
of these to be packed and the other not.
This is problematic. If both don't agree on a common subset of fields,
then routines like getLLVMFieldNo will point to the wrong field. Solve
this by copying the 'packed'-ness of the complete type to the
non-virtual subobject. For this to work, we need to take into account
the non-virtual subobject's size and alignment when we are computing the
layout of the complete object.
This fixes PR21089.
llvm-svn: 218577
new vector shuffle target DAG combines -- it helps to actually test for
the value you want rather than just using an integer in a boolean
context.
Have I mentioned that I loathe implicit conversions recently? :: sigh ::
llvm-svn: 218576
of widening masks.
We can't widen a zeroing mask unless both elements that would be merged
are either zeroed or undef. This is the only way to widen a mask if it
has a zeroed element.
Also clean up the code here by ordering the checks in a more logical way
and by using the symoblic values for undef and zero. I'm actually torn
on using the symbolic values because the existing code is littered with
the assumption that -1 is undef, and moreover that entries '< 0' are the
special entries. While that works with the values given to these
constants, using the symbolic constants actually makes it a bit more
opaque why this is the case.
llvm-svn: 218575