dealing with module level emission. Currently this is done using
the Triple, but eventually the logic should probably migrate to
TLOF.
llvm-svn: 228332
After r228258, Clang started emitting C++ EH IR that LLVM wasn't ready
to deal with, even when exceptions were disabled with /EHs-. This time,
make /EHs- turn off -fexceptions while still emitting exceptional
constructs in functions using __try. Since Sema rejects C++ exception
handling constructs before CodeGen, landingpads should only appear in
such functions as the result of a __try.
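For illustration (a sketch, not from the patch), this is the shape of code
that still produces exceptional IR under /EHs-:

  #include <windows.h>
  int read_guarded(int *p) {
    __try {
      return *p;                              // may fault if p is bad
    } __except (EXCEPTION_EXECUTE_HANDLER) {
      return -1;                              // recovered via SEH; no C++ EH involved
    }
  }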
llvm-svn: 228329
PowerPC supports pre-increment load/store instructions (except for Altivec/VSX
vector load/stores). Using these on embedded cores can be very important, but
most loops are not naturally set up to use them. We can often change that,
however, by placing loops into a non-canonical form. Generically, this means
transforming loops like this:
  for (int i = 0; i < n; ++i)
    array[i] = c;
to look like this:
  T *p = &array[-1];
  for (int i = 0; i < n; ++i)
    *++p = c;
The key point is that the addresses accessed are pulled into dedicated PHIs and
"pre-decremented" in the loop preheader. This allows the use of pre-increment
load/store instructions without loop peeling.
A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is
introduced to perform this transformation. I've used this code out-of-tree for
generating code for the PPC A2 for over a year. Somewhat to my surprise,
running the test suite + externals on a P7 with this transformation enabled
showed no performance regressions, and one speedup:
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk
-2.32514% +/- 1.03736%
So I'm going to enable it on everything for now. I was surprised by this
because, on the POWER cores, these pre-increment load/store instructions are
cracked (and, thus, harder to schedule effectively). But seeing no regressions,
and feeling that it is generally easier to split instructions apart late than
it is to combine them late, this might be the better approach regardless.
In the future, we might want to integrate this functionality into LSR (but
LSR currently does not create new PHI nodes, so, for that and other reasons,
significant work would be needed).
llvm-svn: 228328
PowerPC supports pre-increment floating-point load/store instructions, both r+r
and r+i, and we had patterns for them, but they were not marked as legal. Mark
them as legal (and add a test case).
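For illustration (a sketch, not from the patch): combined with the
transformation above, a loop like this can now use the r+i update forms
(e.g. lfdu/stfdu) for its floating-point accesses:

  void scale(double *a, double c, int n) {
    double *p = &a[-1];
    for (int i = 0; i < n; ++i)
      *++p *= c;  // load and store can both use the pre-increment forms
  }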
llvm-svn: 228327
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."
That's not entirely true, since at least X86 with SSE4.1 knows how to do
those sextloads/zextloads in one instruction (with PMOVSX/PMOVZX).
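For illustration (code is mine, not from the patch), this is the kind of
access a single sign-extending vector load covers:

  #include <stdint.h>
  void widen(int32_t *dst, const int8_t *src) {
    for (int i = 0; i < 4; ++i)
      dst[i] = src[i];  // 4 x i8 -> 4 x i32: one pmovsxbd on SSE4.1
  }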
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).
The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.
It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up. If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion. At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.
This optimization is enabled unconditionally on X86.
Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).
Also note that the existing combine that forms extloads is now also
enabled on legal vectors. This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.
Differential Revision: http://reviews.llvm.org/D6904
llvm-svn: 228325
The return value's address must be returned in %rax, i.e., the callee
needs to copy the sret argument (%rdi) into the return value (%rax).
This probably won't manifest as a bug when the caller is LLVM-compiled
code. But it is an ABI guarantee and tools expect it.
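A minimal sketch of the guarantee (the type is hypothetical): on x86-64, a
function returning a large struct receives a hidden sret pointer in %rdi and
must hand that same pointer back in %rax:

  struct Big { long v[4]; };
  Big make() {   // epilogue effectively does: mov %rdi, %rax; ret
    Big b = {};
    return b;
  }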
llvm-svn: 228321
Summary:
This commit adds a new open flag File::eOpenOptionCloseOnExec (i.e., O_CLOEXEC), and adds it to
the list of flags when opening log files (#ifndef windows). A regression test is included.
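A minimal sketch of the flag's effect (path and mode are made up):

  #include <fcntl.h>
  int open_log(const char *path) {
    // O_CLOEXEC: the descriptor is closed automatically across exec, so an
    // inferior launched by lldb cannot inherit the open log file.
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0640);
  }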
Reviewers: vharron, clayborg
Subscribers: lldb-commits
Differential Revision: http://reviews.llvm.org/D7412
llvm-svn: 228310
We should be setting UnrollingPreferences::MaxCount to MAX_UINT instead
of UnrollingPreferences::Count.
Count is a 'forced unrolling factor', while MaxCount sets an upper
limit to the unrolling factor.
Setting Count to MAX_UINT was causing the loop in the testcase to be
unrolled 15 times, when it only had a maximum of 4 iterations.
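A minimal sketch of the distinction, using the field names from
llvm/Analysis/TargetTransformInfo.h (the surrounding function is made up):

  #include <climits>
  #include "llvm/Analysis/TargetTransformInfo.h"
  void setPrefs(llvm::TargetTransformInfo::UnrollingPreferences &UP) {
    UP.MaxCount = UINT_MAX;  // upper limit only; still capped by the trip count
    // UP.Count = UINT_MAX;  // wrong: Count forces the unrolling factor itself
  }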
llvm-svn: 228303
The llvm.SI.end.cf intrinsic is used to mark the end of if-then blocks,
if-then-else blocks, and loops. It is responsible for updating the
exec mask to re-enable threads that had been masked during the preceding
control flow block. For example:
s_mov_b64 exec, 0x3 ; Initial exec mask
s_mov_b64 s[0:1], exec ; Saved exec mask
v_cmpx_gt_u32 exec, s[2:3], v0, 0 ; llvm.SI.if
do_stuff()
s_or_b64 exec, exec, s[0:1] ; llvm.SI.end.cf
The bug fixed by this patch was one where the llvm.SI.end.cf intrinsic
was being inserted into the header of loops. This would happen when
an if block terminated in a loop header and we would end up with
code like this:
s_mov_b64 exec, 0x3 ; Initial exec mask
s_mov_b64 s[0:1], exec ; Saved exec mask
v_cmpx_gt_u32 exec, s[2:3], v0, 0 ; llvm.SI.if
do_stuff()
LOOP: ; Start of loop header
s_or_b64 exec, exec, s[0:1] ; llvm.SI.end.cf <- BUG: the exec mask has
                            ; the same value at the beginning of each
                            ; loop iteration
do_stuff();
s_cbranch_execnz LOOP
The fix is to create a new basic block before the loop and insert the
llvm.SI.end.cf there. This way the exec mask is restored before the
start of the loop instead of at the beginning of each iteration.
llvm-svn: 228302
Patch by Kit Barton.
Add the vector count leading zeros instructions for byte, halfword,
word, and doubleword sizes. This is a fairly straightforward addition
after the changes made for vpopcnt:
1. Add the correct definitions for the various instructions in
PPCInstrAltivec.td
2. Make the CTLZ operation legal on vector types when using P8Altivec
in PPCISelLowering.cpp
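For illustration (code is mine, not from the patch): with P8 Altivec, a
vectorized element-wise count-leading-zeros like this can now select vclzw:

  void clz4(unsigned out[4], const unsigned in[4]) {
    for (int i = 0; i < 4; ++i)
      out[i] = in[i] ? (unsigned)__builtin_clz(in[i]) : 32u;  // zero-defined ctlz
  }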
Test Plan
Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the
instructions are being generated when the CTLZ operation is used in
LLVM.
Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s
and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.
llvm-svn: 228301
Summary:
This makes clang-tidy merge the default set of checks with the one
provided in the configuration file instead of just using the checks from the
config file. This adds a way to modify the default set of checks, whereas the
previous behavior required the complete set of checks to be defined every time.
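For example (a hypothetical configuration), a .clang-tidy file whose Checks
value is just '-misc-*' now subtracts those checks from the default set;
previously it would have replaced the default set entirely.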
Reviewers: djasper
Reviewed By: djasper
Subscribers: curdeius, cfe-commits
Differential Revision: http://reviews.llvm.org/D7434
llvm-svn: 228298
Implement a BITCAST DAG combine to transform i32->mmx conversion patterns
into an X86-specific node (MMX_MOVW2D) and guarantee that moves between
i32 and x86mmx are better handled, i.e., don't use a store-load pair to do
the conversion.
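For illustration (the function is mine): an i32 -> MMX move that should now
select a single movd (MMX_MOVW2D) rather than going through memory:

  #include <mmintrin.h>
  __m64 to_mmx(int x) {
    return _mm_cvtsi32_si64(x);
  }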
llvm-svn: 228293
Avoid regression in previously supported MMX code by adding different
combinations of tests which exercise MMX bitcasts. Small improvements
to these patterns should come next.
llvm-svn: 228292
by manually adding __asan_mz_* to the generated interface functions list.
Declaring these functions in asan_interface_internal.h doesn't work quite well:
their prototypes must match the prototypes of zone functions in malloc/malloc.h,
but some of the types (e.g. malloc_zone_t and size_t) aren't available in
asan_interface_internal.h.
llvm-svn: 228290
Summary:
Detect constructors taking a single std::initializer_list even when it
is instantiation-dependent.
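A sketch of the newly handled case (the class is hypothetical): the
parameter's type depends on a template parameter, i.e. it is
instantiation-dependent, yet is still recognized as an initializer_list
constructor:

  #include <initializer_list>
  template <typename T>
  class Wrapper {
  public:
    Wrapper(std::initializer_list<T> values);
  };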
Reviewers: djasper
Reviewed By: djasper
Subscribers: curdeius, cfe-commits
Differential Revision: http://reviews.llvm.org/D7431
llvm-svn: 228289
This patch includes the following changes:
- Fix comments and code style
- Add new tests for many commands
- Improve existing tests
- Merge MiProgramArgsTestCase and MiExecTestCase
- Improve runCmd of MiTestCaseBase: add an "exactly" option
- Improve the test example (make it more complicated)
Patch from ki.stfu. I xfailed some tests on Linux and also
added an empty command after -gdb-exit as it blocks without
it.
Patch was reviewed in http://reviews.llvm.org/D7410.
llvm-svn: 228286
Summary:
Mac-only tests were leaving .d.$$$$ dependency files in the test tree. This happened
because:
1) "make clean" was invoked although no tests in the folder were run
2) The Makefile had main.d as a dependency, which meant it tried to compile the source file (to extract
dependencies).
3) The compile failed (does not compile on linux) and left behind a main.d.$$$$ temporary file.
4) "make clean" tried to clean up these temporary files, but the expansion $(wildcard *.d.[0-9]*)
was performed before the temporary file was created, so this file was not caught.
I fix this by giving all the temporary files deterministic names, which makes it easier to clean
them up afterwards. I also make the rules for generating the dependency files more robust and
less likely to leak files in the first place.
Reviewers: vharron, zturner
Subscribers: lldb-commits
Differential Revision: http://reviews.llvm.org/D7406
llvm-svn: 228285
Add a new _LIBCPP_UNUSED define in __config, which can be used to
indicate explicitly unused items, and apply it to the __imp__ field of
__libcpp_refstring.
Somebody who knows about Microsoft C++ and IBM C++ should fill in the
unused attribute syntax appropriate for those compilers, if there is
any.
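A minimal sketch of the macro's intended shape (the GCC/Clang spelling here
is an assumption; the real definition lives in __config):

  #if defined(__GNUC__) || defined(__clang__)
  #  define _LIBCPP_UNUSED __attribute__((__unused__))
  #else
  #  define _LIBCPP_UNUSED  // TODO: Microsoft and IBM spellings
  #endif

  // applied to an intentionally unused member:
  struct __libcpp_refstring {
    const char *__imp_ _LIBCPP_UNUSED;
  };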
Differential Revision: http://reviews.llvm.org/D6836
llvm-svn: 228281