When combining consecutive loads+inserts into a single vector load,
we should keep the alignment of the base load. Doing otherwise can, and does,
lead to using overly aligned instructions. In the included test case, for
example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.
rdar://19190968
llvm-svn: 224746
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions. Now musttail uses a completely separate lowering.
Hopefully this can be used as the basis for non-x86 perfect forwarding.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D6156
llvm-svn: 224745
Since these are all created in the DenseMap before they are referenced,
there's no problem with pointer validity by the time it's required. This
removes another use of DeleteContainerSeconds/manual memory management
which I'm cleaning up from time to time.
llvm-svn: 224744
Followup to r224294:
ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.
llvm-svn: 224743
a time into a partition iterator and a Partition class.
There is a lot of knock-on simplification that this enables, largely
stemming from having a Partition object to refer to in lots of helpers.
I've only done a minimal amount of that because enoguh stuff is changing
as-is in this commit.
This shouldn't change any observable behavior. I've worked hard to
preserve the *exact* traversal semantics which were originally present
even though some of them make no sense. I'll be changing some of this in
subsequent commits now that the logic is carefully factored into
a reusable place.
The primary motivation for this change is to break the rewriting into
phases in order to support more intelligent rewriting. For example, I'm
planning to change how split loads and stores are rewritten to remove
the significant overuse of integer bit packing in the resulting code and
allow more effective secondary splitting of aggregates. For any of this
to work, they have to share the exact traversal logic.
llvm-svn: 224742
Summary:
MSAN and ASAN also replace new/delete which leads to a link error in these tests. Currently they are unsupported but I think it would be useful if these tests could run with sanitizers.
This patch creates a support header that consolidates the new/delete replacement functionality and checking.
When we are using sanitizers new and delete are no longer replaced and the checks always return true.
Reviewers: mclow.lists, danalbert, jroelofs, EricWF
Reviewed By: EricWF
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D6562
llvm-svn: 224741
Take two disjoint Loops L1 and L2.
LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one of such exits we are
also inserting PHIs in L2 header.
This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.
Differential Revision: http://reviews.llvm.org/D6624
rdar://problem/19166231
llvm-svn: 224740
This allows us to generate debug info for extremely advanced code such as
typedef struct { long int a; int b;} S;
int foo(S s) {
return s.b;
}
which at -O1 on x86_64 is codegen'd into
define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
ret i32 %s.coerce1, !dbg !24
}
with this patch we emit the following debug info for this
TAG_formal_parameter [3]
AT_location( 0x00000000
0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
AT_name( "s" )
AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )
Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!
llvm-svn: 224739
Previously we assumed the section name had the form .text$foo, which is
what we used to do for inline functions. If the dollar wasn't present,
we'd put unwind data in the .pdata and .xdata sections for the main
.text section, which is incorrect.
Fixes PR22001.
llvm-svn: 224738
Summary:
Protect CommonFlags singleton by adding const qualifier to
common_flags() accessor. The only ways to modify the flags are
SetCommonFlagsDefaults(), ParseCommonFlagsFromString() and
OverrideCommonFlags() functions, which are only supposed to be
called during initialization.
Test Plan: regression test suite
Reviewers: kcc, eugenis, glider
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6741
llvm-svn: 224736
Summary:
In order to fully replace the testit script we need to update LIT so it provides the same functionality.
This patch adds a number of different configuration options to LIT to do that. It also adds documentation for all of the command line parameters that LIT supports.
Generic options added:
- `libcxx_headers`
- `libcxx_library`
- `compile_flags`
Generic options modified:
- `link_flags`: Changed from overriding the default args to adding extra args instead (to match compile flags)
- `use_sanitizer`: Renamed from `llvm_use_sanitizer`
Please see the added documentation for more information about the switches. As for the actual documentation I'm not sure if it should be kept in libc++ forever since it adds an undue maintenance burden, but I think it should be added for the time being while the changes are new. I'm verify unskilled with HTML so if the documentation needs any changes please let me know.
Hopefully this will kill testit.
Reviewers: jroelofs, mclow.lists, danalbert
Reviewed By: danalbert
Subscribers: alexfh, cfe-commits
Differential Revision: http://reviews.llvm.org/D5877
llvm-svn: 224728
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.
Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a using a vector parallel bit twiddling
approach, based on:
v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = ((v + (v >> 4) & 0xF0F0F0F)
v = v + (v >> 8)
v = v + (v >> 16)
v = v & 0x0000003F
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.
Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.
== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707
== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244
More work for other vector types will come next.
llvm-svn: 224725
The default value of Opts.Trigraphs now no longer depends solely on the
language input kind, so move the code out of setLangDefaults(). Also make
sure that Opts.MSVCCompat is set before the Trigraph code runs.
Related to PR21974.
llvm-svn: 224719
The lit.cfg files only add .cpp to suffixes, so these tests used to never run,
oops. (Also tweak to of these tests in minor ways to make the actually pass.)
llvm-svn: 224718
As mentioned in
https://code.google.com/p/address-sanitizer/issues/detail?id=365, when the
re-exec that adds the required DYLD_INSERT_LIBRARIES variable fails, ASan
currently continues to run, but things are broken (some memory can be
overwritten, interceptors don't work, ...). This patch aborts if the execv()
fails and prints an error message that DYLD_INSERT_LIBRARIES is required. It
also removes the "alllow_reexec" flag, since using it causes the same issues.
Reviewed at http://reviews.llvm.org/D6752
llvm-svn: 224712
NULL handler
Per
https://developer.apple.com/library/mac/documentation/Performance/Reference/GCD_libdispatch_Ref/index.html,
the dispatch_source_set_cancel_handler() API *can* be called with a NULL
handler. In that case, the libdispatch removes an already existing cancellation
handler, if there was one. ASan's interceptor always creates a new block that
always tries to call the original handler. In case the original block is NULL,
a segmentation fault happens. Let's fix that by not wrapping a NULL-block at
all.
It looks like all the other libdispatch APIs (which we intercept) do *not*
allow NULL. So it's really only the dispatch_source_set_cancel_handler one that
needs this fix.
Reviewed at http://reviews.llvm.org/D6747
llvm-svn: 224711
This completes the compact unwind support for x86 targets.
I'm still skipping the UNWIND_X86_64_MODE_STACK_IND encodings for
x86_64 right now because clang was emitting bad data for this form
until it was fixed in r217020 circa Sep 2014.
arm64 parsing still needs to be added.
llvm-svn: 224698
The ASan test/asan/TestCases/log-path_test.cc testcase uses /INVALID as an invalid path and expects that the program will not be allowed to create or write to that file. This actually is a valid writable path on one of my setups. Let's make the path more invalid.
Reviewed at http://reviews.llvm.org/D6727
llvm-svn: 224694
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.
llvm-svn: 224691
* Remove the embedded directive undefined behavior by moving the
the #ifdef out of the macro arguments. [-Wembedded-directive]
* Remove the local variable shadowing warning by renaming
frameInfo in UnwindLevel1-gcc-ext.c. [-Wshadow]
* Explicitly cast the function pointer to void pointer to avoid
the comparison between function pointer and void pointer.
[-Wpedantic]
llvm-svn: 224690
Most of the changes are to the FuncUnwinders class -- as we've added
more types of unwind information, the way this class was written was
making it a mess to maintain. Instead of trying to keep one
"non-call site" unwind plan and one "call site" unwind plan, track
all the different types of unwind plans we can possibly retrieve for
each function and have the call-site/non-call-site accessor methods
retrieve those.
Add a real "fast unwind plan" for x86_64 / i386 -- when doing an
unwind through a function, this only has to read the first 4 bytes
to tell if the function has a standard prologue sequence. If so,
we can use the architecture default unwind plan to backtrace
through this function. If we try to retrieve the save location for
other registers later on, a real unwind plan will be used. This
one is just for doing fast backtraces.
Change the compact unwind plan importer to fill in the valid address
range it is valid for.
Compact unwind, in theory, may have multiple entries for a single
function. The FuncUnwinders rewrite includes the start of supporting
this correctly. In practice compact unwind encodings are used for
the entire range of the function today -- in fact, sometimes the same
encoding is used for multiple functions that have the same unwind
rules. But I want to handle a single function that has multiple
different compact unwind UnwindPlans eventually.
llvm-svn: 224689
This reapplies r224503 along with a fix for compiling Fortran by having the
clang driver invoke gcc (see r224546, where it was reverted). I have added
a testcase for that as well.
Original commit message:
It is often convenient to use -save-temps to collect the intermediate
results of a compilation, e.g., when triaging a bug report. Besides the
temporary files for preprocessed source and assembly code, this adds the
unoptimized bitcode files as well.
This adds a new BackendJobAction, which is mostly mechanical, to run after
the CompileJobAction. When not using -save-temps, the BackendJobAction is
combined into one job with the CompileJobAction, similar to the way the
integrated assembler is handled. I've implemented this entirely as a
driver change, so under the hood, it is just using -disable-llvm-optzns
to get the unoptimized bitcode.
Based in part on a patch by Steven Wu.
rdar://problem/18909437
llvm-svn: 224688