This partially fixes weird looking load scheduling
in memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.
llvm-svn: 214943
This currently has a noticable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.
llvm-svn: 214942
This was coming in weird debug info that had variables (and hence
debug_locs) but was in GMLT mode (because it was missing the 13th field
of the compile_unit metadata) so no ranges were constructed. We should
always have at least one range for any CU with a debug_loc in it -
because the range should cover the debug_loc.
The assertion just ensures that the "!= 1" range case inside the
subsequent loop doesn't get entered for the case where there are no
ranges at all, which should never reach here in the first place.
llvm-svn: 214939
CommandReturnObject. Otherwise, all the overridden command
can do is say it overrode the command, not say what it did...
Also removed the duplicate definition of CommandOverrideCallback
from the private interfaces.
Now to figure out how to get this through the SB API's...
<rdar://problem/17911629>
llvm-svn: 214938
Without the 13th field, the "emission kind" field defaults to 0 (which
is not equal to either of the values of the emission kind enum (1 ==
full debug info, 2 == line tables only)).
In this particular instance, the comparison with "FullDebugInfo" was
done when adding elements to the ranges list - so for these test cases
no values were added to the ranges list.
This got weirder when emitting debug_loc entries as the addresses should
be relative to the range of the CU if the CU has only one range (the
reasonable assumption is that if we're emitting debug_loc lists for a CU
that CU has at least one range - but due to the above situation, it has
zero) so the ranges were emitted relative to the start of the section
rather than relative to the start of the CU's singular range.
Fix these tests by accounting for the difference in the description of
debug_loc entries (in some cases making the test ignorant to these
differences, in others adding the extra label difference expression,
etc) or the presence/absence of high/low_pc on the CU, and add the 13th
field to their CUs to enable proper "full debug info" emission here.
In a future commit I'll fix up a bunch of other test cases that are not
so rigorously depending on this behavior, but still doing similarly
weird things due to the missing 13th field.
llvm-svn: 214937
This simplifies construction and usage while making the data structure
smaller. It was a holdover from the days when we didn't have a separate
DebugLocList and all we had was a flat list of DebugLocEntries.
llvm-svn: 214933
The MS mangling scheme apparently has separate manglings for type and
non-type parameter packs when they are empty. Match template arguments
with parameters during mangling; check the parameter to see if it was
destined to hold type-ish things or nontype-ish things.
Differential Revision: http://reviews.llvm.org/D4792
llvm-svn: 214932
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:
POP {r3}
ADD sp, #offset
BX r3
This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:
MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr
http://reviews.llvm.org/D4748
llvm-svn: 214928
mode.
This will cause -verify mode to report failure when RuntimeDyld encounters an
internal error (e.g. overflows in relocation computations). Previously we had
let these errors slip past unreported.
llvm-svn: 214925
This escapes any backslashes in the executable path and fixes an issue
with a trailing quote when the main file name had to be quoted during
printing.
It's impossible to test this without putting backslashes or quotes into
the executable path, so I didn't add automated tests.
The crash diagnostics are still only useful if you're using bash on
Windows, though. This should probably be writing a batch file instead.
llvm-svn: 214924
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments. The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.
Reviewed by Ulrich Weigand.
This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).
llvm-svn: 214923
This is mostly a cleanup, but it changes a fairly old behavior.
Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.
Now to get a usable behavior out of opt one doesn't need the funny
looking command line:
opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts
llvm-svn: 214919
a test case.
We also miscompile this test case which is showing a serious flaw in the
single-input v8i16 shuffle code. I've left the specific instruction
checks FIXME-ed out until I can address the bug in the single-input
code, but I wanted to separate out a significant functionality change to
produce correct code from a very simple and targeted crasher fix.
The miscompile problem stems from keeping track of inputs by value
rather than by index. As a consequence of doing this, we can't reliably
update those inputs because they might swap and we can't detect this
without copying the mask.
The blend code now uses indices for the input lists and this seems
strictly better. It also should make it easier to sort things and do
other cleanups. I think the time has come to simplify The Great Lambda
here.
llvm-svn: 214914
Vector clocks is the most actively allocated object in tsan runtime.
Current internal allocator is not scalable enough to handle allocation
of clocks in scalable way (too small caches). This changes transforms
clocks to 2-level array with 512-byte blocks. Since all blocks are of
the same size, it's possible to cache them more efficiently in per-thread caches.
llvm-svn: 214912
to instruct the code generator to not enforce a higher alignment
than the given number (of bytes) when accessing memory via an opaque
pointer or reference. Patch reviewed by John McCall (with post-commit
review pending). rdar://16254558
llvm-svn: 214911
This allows accessing an SReg subregister with a normal subregister
index, instead of getting a machine verifier error.
Also be sure to include all of these subregisters in SReg_32.
This fixes inferring SGPR instead of SReg when finding a
super register class.
llvm-svn: 214901
The test produces lines that start with "<word>: " which confuses the
buildbot log parser. Disable the test until either the test is fixed
or the buildbot can deal with the undesired output.
llvm.org/pr20545
llvm-svn: 214900
`BasicBlockFwdRefs` (and `BlockAddrFwdRefs` before it) was being emptied
in a non-deterministic order. When predicting use-list order I've
worked around this another way, but even when parsing lazily (and we
can't recreate use-list order) use-lists should be deterministic.
Make them so by using a side-queue of functions with forward-referenced
blocks that gets visited in order.
llvm-svn: 214899
Optimize the following IR:
%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4
Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable.
Reviewed By: nicholas
Differential Revision: http://reviews.llvm.org/D3942
llvm-svn: 214897
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.
So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64
Instead of generating something like this:
movabsq (constant pool loadi of mask for sign bits)
vmovq (move from integer register to vector/fp register)
vandps (mask off sign bits)
vmovq (move vector/fp register back to integer return register)
We should generate:
mov (put constant value in return register)
I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()
is the same thing as:
IntVT.isInteger()
Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.
For more background, please see:
http://reviews.llvm.org/D4770
And:
http://llvm.org/bugs/show_bug.cgi?id=20354
Differential Revision: http://reviews.llvm.org/D4785
llvm-svn: 214892
Note that similar to palingr, we could further optimize these to emit
shufflevector when the shift count is <=64. This however does not
change the overall design that unlike palignr we would still need the LLVM
intrinsic corresponding to this intruction to handle the >64 cases. (palignr
uses the psrldq intrinsic in this case.)
llvm-svn: 214891
This is similar to what I did with the two-source permutation recently. (It's
almost too similar so that we should consider generating the masking variants
with some tablegen help.)
Both encoding and intrinsic tests are added as well. For the latter, this is
what the IR that the intrinsic test on the clang side generates.
Part of <rdar://problem/17688758>
llvm-svn: 214890
This controls the number of operands in the disassembler's x86OperandSets
table. The entries describe how the operand is encoded and its type.
Not to surprisingly 5 operands is insufficient for AVX512. Consider
VALIGNDrrik in the next patch. These are its operand specifiers:
{ /* 328 */
{ ENCODING_DUP, TYPE_DUP1 },
{ ENCODING_REG, TYPE_XMM512 },
{ ENCODING_WRITEMASK, TYPE_VK8 },
{ ENCODING_VVVV, TYPE_XMM512 },
{ ENCODING_RM_CD64, TYPE_XMM512 },
{ ENCODING_IB, TYPE_IMM8 },
},
llvm-svn: 214889
This was currently part of lowering to PALIGNR with some special-casing to
make interlane shifting work. Since AVX512F has interlane alignr (valignd/q)
and AVX512BW has vpalignr we need to support both of these *at the same time*,
e.g. for SKX.
This patch breaks out the common code and then add support to check both of
these lowering options from LowerVECTOR_SHUFFLE.
I also added some FIXMEs where I think the AVX512BW and AVX512VL additions
should probably go.
llvm-svn: 214888
They have different semantics (valign is interlane while palingr is intralane)
and palingr is still needed even in the AVX512 context. According to the
latest spec AVX512BW provides these.
llvm-svn: 214887
The packed integer pattern becomes the DAG pattern for rri and the packed
float, another Pat<> inside the multiclass.
No functional change.
llvm-svn: 214885