Empty records do not always have size equivalent to their alignment.
They only do so when their alignment is at least as large as the minimum
empty struct size: 1 byte in C++ and 4 bytes in C.
llvm-svn: 218661
a flawed direction and causing miscompiles. Read on for details.
Fundamentally, the premise of this patch series was to map
VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because
we are going to *have* to lower to VSELECT nodes for some blends to
trigger the instruction selection patterns of variable blend
instructions. This doesn't actually work out so well.
In order to match performance with the existing VECTOR_SHUFFLE
lowering code, we would need to re-slice the blend in order to fit it
into either the integer or floating point blends available on the ISA.
When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources)
this works well because the X86 backend ensures that these types of
operands to VSELECT get sign extended into '-1' and '0' for true and
false, allowing us to re-slice the bits at whatever granularity we need
without changing semantics.
However, if the VSELECT condition comes from some other source, for
example code lowering vector comparisons, it will likely only have the
required bit set -- the high bit. We can't blindly slice up this style
of VSELECT. Reid found some code using Halide that triggers this and I'm
hopeful to eventually get a test case, but I don't need it to understand
why this is A Bad Idea.
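As a rough worked illustration (constants invented here, not taken from the
Halide reproducer), re-slicing a 32-bit condition lane into two 16-bit lanes
is only safe when the lane is a sign-extended -1/0 value:

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t all_ones = 0xFFFFFFFFu; // sign-extended "true" lane
    uint32_t high_bit = 0x80000000u; // compare-style lane: only the MSB set

    // A 16-bit blend looks at the sign bit of each 16-bit half.
    assert((uint16_t(all_ones) & 0x8000) && (uint16_t(all_ones >> 16) & 0x8000));
    // The low half of the high-bit-only lane reads as "false", so a
    // re-sliced 16-bit blend would pick the wrong source for that half.
    assert(!(uint16_t(high_bit) & 0x8000) && (uint16_t(high_bit >> 16) & 0x8000));
  }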
There is another aspect that makes this approach flawed. When in
VECTOR_SHUFFLE form, we have very distilled information that represents
the *constant* blend mask. Converting back to a VSELECT form actually
can lose this information, and so I think now that it is better to treat
this as VECTOR_SHUFFLE until the very last moment and only use VSELECT
nodes for instruction selection purposes.
My plan is to:
1) Clean up and formalize the target pre-legalization DAG combine that
converts a VSELECT with a constant condition operand into
a VECTOR_SHUFFLE.
2) Remove any fancy lowering from VSELECT during *legalization*, relying
entirely on the DAG combine to catch cases where we can match to an
immediate-controlled blend instruction.
One additional step that I'm not planning on but would be interested in
others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
which encodes a fully legalized VSELECT node. Then it would be easy to
write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
legalization only ever forms the fully legalized construct and we can't
cycle between it and VSELECT combining.
llvm-svn: 218658
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.
Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.
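As a rough sketch of the pattern this targets (plain C++, invented here, not
taken from the patch), a byte load whose only use is a sign-extend can be
selected as a single extending load (e.g. ldrsb on AArch64):

  // The i8 load below feeds exactly one sign-extend, so fast-isel can emit a
  // sign-extending load and the extend degenerates into a register copy.
  int widen(const signed char *p) {
    signed char v = *p;          // load
    return static_cast<int>(v);  // sign-extend, folded into the load
  }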
This fixes rdar://problem/18495928.
llvm-svn: 218653
MSC17, aka VS2012, cannot compile it.
clang/include/clang/ASTMatchers/ASTMatchersInternal.h(387) : error C4519: default template arguments are only allowed on a class template
clang/include/clang/ASTMatchers/ASTMatchersInternal.h(443) : see reference to class template instantiation 'clang::ast_matchers::internal::Matcher<T>' being compiled
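For reference, a minimal sketch (not the actual matcher code) of the construct
C4519 complains about, namely a default template argument on something other
than a class template:

  // C++11 allows a default template argument on a function template, but
  // MSC17/VS2012 rejects it with C4519.
  template <typename T = int>
  T zero() { return T(); }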
llvm-svn: 218648
the user level. It adds the ability to invent new stepping modes implemented by python classes,
and to view the current thread plan stack and to some extent alter it.
I haven't gotten to documentation or tests yet. But this should not cause any behavior changes
if you don't use it, so it's safe to check it in now and work on it incrementally.
llvm-svn: 218642
Clang warns (treated as error by default, but still ignored in system headers)
when passing non-POD arguments to variadic functions, and generates a trap
instruction to crash the program if that code is ever run.
Unfortunately, MSVC happily generates code for such calls without a warning,
and there is code in system headers that uses it.
This makes Clang not insert the trap instruction when in -fms-compatibility
mode, while still generating the warning/error message.
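A rough sketch of the problematic pattern (hypothetical names; the real
occurrences live in MSVC system headers):

  struct NonPod {
    NonPod() {}
    NonPod(const NonPod &) {}   // user-provided copy ctor makes this non-POD
    int value = 0;
  };

  void log_all(const char *fmt, ...) {}

  void f(NonPod np) {
    // Passing a non-POD object through the ellipsis is conditionally
    // supported; Clang warns, and outside -fms-compatibility it also emits a
    // trap in case this call is ever executed.
    log_all("%d", np);
  }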
Differential Revision: http://reviews.llvm.org/D5492
llvm-svn: 218640
The thread resume block is executed in the normal flow of thread
state queued event processing. The tests verify that it is executed
when we track the thread to be stopped and skipped when we track
it to already be running.
llvm-svn: 218638
No functionality change.
Makes the code more compact (see the FMA part).
This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I only
defined this in the simple cases. See the comment before the attribute.
Diff of X86.td.expanded before and after is empty except for the appearance of
the new attribute.
llvm-svn: 218637
This is the last piece of CGCall code that had implicit assumptions about
the order in which Clang arguments are translated to LLVM ones (positions
of inalloca argument, sret, this, padding arguments etc.) Now all of
this data is encapsulated in ClangToLLVMArgsMapping. If this information
is ever required somewhere else, this class can be moved to a separate
header or pulled into CGFunctionInfo.
llvm-svn: 218634
Summary:
This patch adds support for the general dynamic TLS access model for X86_64 (see www.akkadia.org/drepper/tls.pdf).
To properly support TLS, the patch also changes the __tls_get_addr atom to be a shared library atom instead of a regularly defined atom (the previous lld approach). This closely models the reality of a function that will be resolved at runtime by the dynamic linker and loader itself (ld.so). I was tempted to force LLD to link against ld.so itself to resolve these symbols, but since GNU ld does not need the ld.so library to resolve this symbol, I decided to mimic its behavior and keep a hardwired definition of __tls_get_addr in the lld code.
This patch also moves some important logic that previously was only available to the MIPS lld backend so that it can be used by all ELF backends. This logic, which now lives in the DefaultLayout class, will monitor which external (shared lib) symbols are really imported by the current module and will only populate the dynamic symbol table with used symbols, as opposed to the previous approach of dumping all shared lib symbols in the dynamic symbol table. This matters for this patch because it keeps __tls_get_addr from being injected into every dynamic symbol table.
Since __tls_get_addr is no longer added unconditionally, the produced symbol tables are slightly smaller. This impacted several tests that relied on hardwired/predefined symbol table sizes, requiring this patch to update such tests.
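As a rough illustration (hypothetical source, not part of the test case), the
general dynamic model routes accesses to a thread-local defined in a shared
object through __tls_get_addr:

  // Under the general dynamic TLS model, the address of `counter` for the
  // current thread is obtained by calling __tls_get_addr, which ld.so
  // provides at runtime (roughly: leaq counter@tlsgd(%rip), %rdi;
  // call __tls_get_addr@plt).
  thread_local int counter;

  int bump() { return ++counter; }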
Test Plan: Added a LIT test case that exercises a simple use case of TLS variable in a shared library.
Reviewers: ruiu, rafael, Bigcheese, shankarke
Reviewed By: Bigcheese, shankarke
Subscribers: emaste, shankarke, joerg, kledzik, mcrosier, llvm-commits
Projects: #lld
Differential Revision: http://reviews.llvm.org/D5505
llvm-svn: 218633
map, this makes sure that we can compile the same code for two different
ABIs (hard and soft float) in the same module.
Update one testcase accordingly (and fix some confusing naming) and
add a new testcase with the ordering swapped, which highlights the
problem.
llvm-svn: 218632
Also added a test for the reset handling. The reset/state clearing happens
as a processed queue event. The only difference vs. standard processing is
that the exec clears the queue before queueing the activity to clear internal
state; i.e. once we get an exec, we really stop doing any other queue-based
activity.
llvm-svn: 218629
Primarily refines all of the instructions with accurate latency
and micro-op information. Refinements largely focus on the NEON
instructions.
Additionally, a few advanced features are modeled, including
forwarding for MAC instructions and hazards for floating point SQRT
and DIV.
Lastly, the issue-width is reduced to three so that the scheduler
will better accommodate the narrower decode and dispatch width.
llvm-svn: 218627
The contract of this function seems problematic (fallback in either
direction seems like it could produce bugs in one client or another),
but here's some tests for its current behavior, at least. See the
commit/review thread of r218187 for more discussion.
llvm-svn: 218626
As PR20495 demonstrates, Clang currently infers the CUDA target (host/device,
etc) for implicit members (constructors, etc.) incorrectly. This causes errors
and even assertions in Clang when compiling code (assertions in C++11 mode where
implicit move constructors are added into the mix).
Fix the problem by inferring the target from the methods the implicit member
should call (depending on its base classes and fields).
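A rough sketch of the kind of code affected (hypothetical, not the PR20495
reproducer):

  // Base only provides a __device__ constructor, so the implicitly-defined
  // Derived::Derived() can only call it from device code; the implicit
  // member should therefore be inferred as __device__ rather than __host__.
  struct Base {
    __device__ Base() {}
  };

  struct Derived : Base {
    // no user-declared constructors
  };

  __device__ void use() { Derived d; (void)d; }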
llvm-svn: 218624
Add a method to calculate the number of arguments a given QualType
expands to. Use this method in the ClangToLLVMArgMapping calculation.
This number may be cached in CodeGenTypes for efficiency, if needed.
llvm-svn: 218623
This takes a single argument convertible to T, and
- if the Optional has a value, returns the existing value,
- otherwise, constructs a T from the argument and returns that.
Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.
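A minimal usage sketch (assuming the accessor is named getValueOr, which is
not spelled out above):

  #include "llvm/ADT/Optional.h"
  #include <string>

  std::string describe(llvm::Optional<std::string> name) {
    // Returns the contained value if present, otherwise constructs a
    // std::string from the fallback argument.
    return name.getValueOr("<anonymous>");
  }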
llvm-svn: 218618
Summary:
This change introduces DynMatcherInterface and changes the internal
representation of DynTypedMatcher and Matcher<T> to use a generic
interface instead.
It removes unnecessary indirections and virtual function calls when
converting matchers by implicit and dynamic casts.
DynTypedMatcher now remembers the stricter type in the chain of casts
and checks it before calling into DynMatcherInterface.
This change improves our clang-tidy related benchmark by ~14%.
Also, it opens the door for more optimizations of this kind that are
coming in future changes.
As a side effect of removing these template instantiations, it also
speeds up compilation of Dynamic/Registry.cpp by ~17% and reduces the number of
symbols generated by ~30%.
Reviewers: klimek
Subscribers: klimek, cfe-commits
Differential Revision: http://reviews.llvm.org/D5485
llvm-svn: 218616
Hoist the logic which determines the way QualType is expanded
into a separate method. Remove a bunch of copy-paste and simplify
getTypesFromArgs() / ExpandTypeFromArgs() / ExpandTypeToArgs() methods.
llvm-svn: 218615
This is just an optimization to save the compile time and execution time
of runtime alias checks when the user guarantees no aliasing altogether.
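A rough illustration of what such a guarantee can look like in source
(hedged; the exact mechanism isn't stated above):

  // With restrict-qualified pointers the optimizer may assume the arrays
  // never overlap, so the vectorizer can skip the runtime overlap checks it
  // would otherwise emit before the vector loop.
  void saxpy(float *__restrict x, const float *__restrict y, int n, float a) {
    for (int i = 0; i < n; ++i)
      x[i] += a * y[i];
  }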
llvm-svn: 218613