Before this change, method 'isShuffleMaskLegal' didn't know that shuffles
implementing a 'movhlps' operation were perfectly legal for SSE targets.
This patch adds the missing check for 'isMOVHLPSMask' inside method
'isShuffleMaskLegal' to fix the problem.
The reason why it is important to do this is because the DAGCombiner
conservatively avoids combining a pair of shuffles if the resulting shuffle
node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were
wrongly considered not to be legal. This was the root cause of some poor-code
generation bugs.
llvm-svn: 213137
This re-enables some #if 0'd code (since 2010) in the Path unittests
and makes at least a weak effort at testing sys::path's rbegin/rend.
This change was inspired by some test failures near uses of rbegin and
rend here:
http://lab.llvm.org:8011/builders/clang-x86_64-linux-vg/builds/3209
The "valgrind was whining" comment looked promising in terms of a
simpler to debug case of the same errors. However, it appears that the
valgrind complaints the comment was referring to are distinct from the
ones in the frontend, since this updated test isn't complaining for me
under valgrind.
In any case, the disabled tests weren't helping anybody.
llvm-svn: 213125
This was an oversight in the original support. As it is, I stuffed this
bit into the alignment. The alignment is stored in log2 form, so it
doesn't need more than 5 bits, given that Value::MaximumAlignment is 1
<< 29.
Reviewers: nicholas
Differential Revision: http://reviews.llvm.org/D3943
llvm-svn: 213118
In the original version of the patch the behaviour was like described in
the comment. This behaviour was changed before committing it without
updating the comment.
llvm-svn: 213117
On Windows, wildcard expansion isn't performed by the shell, but left to the
program itself. The common way to do this is to link with setargv.obj, which
performs the expansion on argc/argv before main is entered. However, we don't
use argv in Clang on Windows, but instead call GetCommandLineW so we can handle
unicode arguments. This means we have to do wildcard expansion ourselves.
A test case will be added on the Clang side.
Differential Revision: http://reviews.llvm.org/D4529
llvm-svn: 213114
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.
Reviewed by: Arnold Schwaighofer
llvm-svn: 213110
There is no need to pass on TLI separately to the function. As Eric pointed out
the Target Machine already provides everything we need.
llvm-svn: 213108
There exists a helper function to abstract away the various differences
between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc.
Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant.
llvm-svn: 213104
Refactoring; no functional changes intended
Removed PostRAScheduler bits from subtargets (X86, ARM).
Added PostRAScheduler bit to MCSchedModel class.
This bit is set by a CPU's scheduling model (if it exists).
Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
d. X86 is the only target that actually has postRA specified via sched model info.
Differential Revision: http://reviews.llvm.org/D4217
llvm-svn: 213101
Removing the native CMakeCache.txt causes the target to get re-run needlessly
on some systems. We'll want another solution for that part of the fix.
llvm-svn: 213099
Just tried this on a few tests and this was the only one that was
easily ported to use the new feature, so we'll go with that for now.
Hopefully can act as inspiration/reminder for other tests.
Not all debug info tests need to check for every DW_TAG or NULL child
terminator, but perhaps they should (just to ensure they don't accidentally
end up with tags nested inside other tags without the test failing, for example)
llvm-svn: 213092
This adds support for building native artifacts when cross-compiling using the
popular side-by-side source directory layout (no symlinks, no nested
repositories).
llvm-svn: 213091
Add a `MapVector::remove_if()` that erases items in bulk in linear time,
as opposed to quadratic time for repeated calls to `MapVector::erase()`.
llvm-svn: 213090
The registration scheme used in r211652 violated the read-only contract of
MemoryBuffer. This caused crashes in llvm-rtdyld where macho objects were backed
by read-only mmap'd memory.
llvm-svn: 213086
Actually update the changed indexes in the map portion of `MapVector`
when erasing from the middle. Add a unit test that checks for this.
Note that `MapVector::erase()` is a linear time operation (it was and
still is). I'll commit a new method in a moment called
`MapVector::remove_if()` that deletes multiple entries in linear time,
which should be slightly less painful.
llvm-svn: 213084
The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.
This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825.
llvm-svn: 213078
LDP is unpredictable if the registers in the pair are identical, these tests check that we don't assemble instructions like that and error out instead.
llvm-svn: 213074
v2: use ffbh/l if available
v3: Rebase on top of Matt's SI patches
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 213072
Summary: Previously all the test cases set it after initialization with '.module fp=xx'.
Differential Revision: http://reviews.llvm.org/D4489
llvm-svn: 213071
This patch adds two new rules to the DAGCombiner:
1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2
We only do this if the combined shuffle is legal for the target.
Example:
;;
define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
%1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32><i32 6, i32 0, i32 1, i32 7>
%2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32><i32 1, i32 2, i32 4, i32 5>
ret <4 x i32> %2
}
;;
(using llc -mcpu=corei7 -march=x86-64)
Before, the x86 backend generated:
pshufd $120, %xmm0, %xmm0
shufps $-108, %xmm0, %xmm1
movaps %xmm1, %xmm0
Now the x86 backend generates:
movsd %xmm1, %xmm0
llvm-svn: 213069
Fixes a gcc warning caused by a typo. A redundant assignment operation was
accidentally used as the third operand of a conditional expression.
No functional change intended.
llvm-svn: 213061
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
llvm-svn: 213049
Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
Revert "[FastISel][X86] Implement the FastLowerCall hook."
This reverts commit r213035, r213036, and r213037 to make the
buildbots happy again.
llvm-svn: 213048
Instead of specifying 32-bit x86, specify 32-bit x86 linux.
This test is testing a very specific behavior which changed with
WinCOFF's constant pools.
llvm-svn: 213041
The constant pool entry code for WinCOFF assumed that vector constants
would be formed using ConstantDataVector, it did not expect to see a
ConstantVector. Furthermore, it did not expect undef as one of the
elements of the vector.
ConstantVectors should be handled like ConstantDataVectors, treat Undef
as zero.
llvm-svn: 213038
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
llvm-svn: 213035
The patchpoint instruction should have been inserted before the target
generated call instruction to be inside the ADJSTACKDOWN/ADJSTACKUP call
sequence window.
llvm-svn: 213034
Always update the value map with the result register (if there is one), for the
patchpoint instruction we created to replace the target-specific call
instruction.
llvm-svn: 213033
This helps avoid redundant instructions to unpack, and repack
the vectors. Ideally we could recognize that pattern and eliminate
it. Currently v4i8 and other small element type vectors are scalarized,
so this has the added bonus of avoiding that.
llvm-svn: 213031
Now functions 'test4', 'test9', 'test14' and 'test19' correctly perform
a move of two packed values from the high quadword of vector %b to the low
quadword of vector %a (movhlps idiom).
No functional change intended.
llvm-svn: 213029
This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid
call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal'
always expects a legal vector value type in input.
With this patch, we immediately check if the input OR dag node has a legal
vector type; we only try to fold a OR dag node into a single shufflevector
if we know that the resulting shuffle will have a legal type.
This is to avoid calling method 'isShuffleMaskLegal' on a potentially
illegal vector value type.
Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that
DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles
with illegal types.
llvm-svn: 213020
reading MachO files magic numbers in RuntimeDyld.
This is required now that we're testing cross-platform JITing (via
RuntimeDyldChecker), and should fix some issues that David Fang has seen on PPC
builds.
llvm-svn: 213012
No functional change.
The offsets for the other bitfields are specified symbolically. I need to
increase the size for one of the earlier fields which is easier after this
cleanup.
Why these bits are relative to VEXShift is a bit strange but that is for
another cleanup.
I made sure that the values for the enums are unchanged after this change.
llvm-svn: 213011
COFF lacks a feature that other object file formats support: mergeable
sections.
To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section. This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.
This fixes PR20262.
Differential Revision: http://reviews.llvm.org/D4482
llvm-svn: 213006
This patch teaches the DAGCombiner how to fold a pair of shuffles
according to rules:
1. shuffle(shuffle A, B, M0), B, M1) -> shuffle(A, B, M2)
2. shuffle(shuffle A, B, M0), A, M1) -> shuffle(A, B, M3)
The new rules would only trigger if the resulting shuffle has legal type and
legal mask.
Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly
folds shuffles on x86 when the resulting mask is legal. Also added some negative
cases to verify that we avoid introducing illegal shuffles.
llvm-svn: 213001
The underlying function. utohex_buffer, already supports an argument for
deciding if the hex characters should be upper or lower case. Expose an
identical argument for utohexstr.
llvm-svn: 212991
Until now, attempting to create an alias of a required option would
complain if the user supplied the alias, because the required option
didn't have a value. Similarly, if you said the alias was required,
then using the base option would complain that the alias wasn't
supplied. Lastly, if you put required on both, *neither* option would
work.
By changning alias to overload addOccurrence and setting cl::Required
on the original option, we can get this to behave in a more useful
way. I've also added a test and updated a user that was getting this
wrong.
llvm-svn: 212986
Determining the bounds of x/ -1 would start off with us dividing it by
INT_MIN. Suffice to say, this would not work very well.
Instead, handle it upfront by checking for -1 and mapping it to the
range: [INT_MIN + 1, INT_MAX. This means that the result of our
division can be any value other than INT_MIN.
llvm-svn: 212981
Summary:
When calculating the upper bound of X / -8589934592, we would perform
the following calculation: Floor[INT_MAX / 8589934592]
However, flooring the result would make us wrongly come to the
conclusion that 1073741824 was not in the set of possible values.
Instead, use the ceiling of the result.
Reviewers: nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4502
llvm-svn: 212976
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target. They were added at different times and would result in the
border condition being mishandled.
This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs). Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.
llvm-svn: 212956
Found during windows unwinding work. This header is indirectly included through
a chain leading through Support/Win64EH.h. Explicitly include the header. NFC.
llvm-svn: 212955
The size of the uninitialized sections, like BSS, can exceed the size of
the object file.
Do not attempt to grab the contents of such sections.
llvm-svn: 212953
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.
Thanks Saleem for pointing this out!
llvm-svn: 212948
Summary:
.bss, .text, and .data are at least 16-byte aligned.
.reginfo is 4-byte aligned and has a 24-byte EntrySize.
.MIPS.abiflags has an 24-byte EntrySize.
.MIPS.options is 8-byte aligned and has 1-byte EntrySize.
Using a 1-byte EntrySize for .MIPS.options seems strange because the
records are neither 1-byte long nor fixed-length but this matches the value
that GAS emits.
Differential Revision: http://reviews.llvm.org/D4487
llvm-svn: 212939
Summary:
This is because the FP64A the hardware will redirect 32-bit reads/writes
from/to odd-numbered registers to the upper 32-bits of the corresponding
even register. In effect, simulating FR=0 mode when FR=0 mode is not
available.
Unfortunately, we have to make the decision to avoid mfc1/mtc1 before
register allocation so we currently do this for even registers too.
FPXX has a similar requirement on 32-bit architectures that lack
mfhc1/mthc1 so this patch also handles the affected moves from the FPU for
FPXX too. Moves to the FPU were supported by an earlier commit.
Differential Revision: http://reviews.llvm.org/D4484
llvm-svn: 212938
Summary:
This is similar to r210771 which did the same thing for MTHC1.
Also corrected MTHC1_D32 and MTHC1_D64 which used AFGR64 and FGR64 on the
wrong definitions.
Differential Revision: http://reviews.llvm.org/D4483
llvm-svn: 212936
For example, c-index-test.exe requires just libclang.dll (its import library).
When libraries in libclang were not PRIVATE but PUBLIC, c-index-test required libraries transitive by libclang.
Note, on mingw with BUILD_SHARED_LIBS, library dependencies would become more strict.
In principle, required libraries should be "required in its source file".
This will help to detect missing dependencies.
llvm-svn: 212934
enabled and mthc1 and dmtc1 are not available (e.g. on MIPS32r1)
This prevents the upper 32-bits of a double precision value from being moved to
the FPU with mtc1 to an odd-numbered FPU register. This is necessary to ensure
that the code generated executes correctly regardless of the current FPU mode.
MIPS32r2 and above continues to use mtc1/mthc1, while MIPS-IV and above continue
to use dmtc1.
Differential Revision: http://reviews.llvm.org/D4465
llvm-svn: 212930
This crash was pretty common while compiling Rust for iOS (armv7). Reason -
SjLj preparation step was lowering aggregate arguments as ExtractValue +
InsertValue. ExtractValue has assertion which checks that there is some data in
value, which is not true in case of empty (no fields) structures. Rust uses
them quite extensively so this patch uses a 'select true, %val, undef'
instruction to lower the argument.
Patch by Valerii Hiora.
llvm-svn: 212922
Verify that DAGCombiner does not crash when trying to fold a pair of shuffles
according to rule (added at r212539):
(shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2)
The DAGCombiner avoids folding shuffles if the resulting shuffle dag node
is not legal for the target. That means, the resulting shuffle must have
legal type and legal mask.
Before, the DAGCombiner only called method
'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according
to the above-mentioned rule. However, this caused a crash in the x86 backend
since method 'isShuffleMaskLegal' always expects to be called on a
legal vector type.
llvm-svn: 212915
This is the first of a number of changes designed to generalise
MCWin64EHInstruction to support different target architectures. An ordered set
(vector) of these instructions is saved per frame to permit the emission of
information for Windows NT style unwinding. The only bit of information which
is actually target specific here is the Opcode for the unwinding bytecode. The
remainder of the information is simply generic information that is relevant to
the Windows NT unwinding model.
Remove the accessors for the fields, making them const and public instead. Sink
the knowledge of the alias'ed name into the single source and sink a single-use
check method into the use.
llvm-svn: 212914
Rename member variables and functions for the MCStreamer for DWARF-like
unwinding management. Rename the Windows ones as well and make the naming and
handling similar across the two. No functional change intended.
llvm-svn: 212912
Our verifier check for checking if a global has local linkage was too
strict. Forbid private linkage but permit local linkage.
Object file formats permit this and forbidding it prevents elimination
of unused, internal, vftables under the MSVC ABI.
llvm-svn: 212900
MC was aping a binutils bug where aliases would default their linkage to
private instead of internal.
I've sent a patch to the binutils maintainers and they've recently
applied it to the GNU assembler sources.
This fixes PR20152.
Differential Revision: http://reviews.llvm.org/D4395
llvm-svn: 212899
This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to
support the various hint intrinsic functions in the ACLE.
Add an optional pattern field that permits the subclass to specify the pattern
that matches the selection. The intrinsic pattern is set as mayLoad, mayStore,
so overload the value for the definition of the hint instruction.
llvm-svn: 212883
Due to the fact that the windows unwinding has the concept of chained frames, we
maintain a current frame info pointer that is adjusted on any push and pop of a
unwinding context. This just removes an unnecessary variable that was used to
mirror the DWARF unwinding code.
llvm-svn: 212882
This structure contains information related to the call frame used to generate
unwinding information. Rename this to reflect the future use to represent the
shared state between various architectures for WinCFI information.
llvm-svn: 212881
not properly handle the case where the predecessor block was the entry block to
the function. The only in-tree client of this is JumpThreading, which worked
around the issue in its own code. This patch moves the solution into the helper
so that JumpThreading (and other clients) do not have to replicate the same fix
everywhere.
llvm-svn: 212875
Currently ASan instrumentation pass creates a string with global name
for each instrumented global (to include global names in the error report). Global
name is already mangled at this point, and we may not be able to demangle it
at runtime (e.g. there is no __cxa_demangle on Android).
Instead, create a string with fully qualified global name in Clang, and pass it
to ASan instrumentation pass in llvm.asan.globals metadata. If there is no metadata
for some global, ASan will use the original algorithm.
This fixes https://code.google.com/p/address-sanitizer/issues/detail?id=264.
llvm-svn: 212872
Implementation is small now -- the interesting logic was moved to
`BranchProbability` a while ago. Move it into `bfi_detail` and get rid
of the related TODOs.
I was originally planning to define it within `BlockFrequencyInfoImpl`
(or `BFIIBase`), but it seems cleaner in a namespace. Besides,
`isPodLike` needs to be specialized before `BlockMass` can be used in
some of the other data structures, and there isn't a clear way to do
that.
llvm-svn: 212866
This implements the target-independent lowering for the patchpoint
intrinsic. Targets have to implement the FastLowerCall
hook to support this intrinsic.
Related to <rdar://problem/17427052>
llvm-svn: 212849
The infrastructure mimics the call lowering we have already in place for
SelectionDAG, but with limitations. For example structure return demotion and
non-simple types are not supported (yet).
Currently every backend has its own implementation and duplicated code for call
lowering. There is also no specified interface that could be called from
target-independent code. The target-hook is opt-in and doesn't affect current
implementations.
llvm-svn: 212848
Create a separate helper function for target-independent intrinsic lowering. Also
add an target-hook that allows to directly call into a target-sepcific intrinsic
lowering method. Currently the implementation is opt-in and doesn't affect
existing target implementations.
llvm-svn: 212843
the specified section. This is same functionality as darwin’s nm(1) "-s" flag.
There is one FIXME in the code and I’m all ears to anyone that can help me
with that. This option takes exactly two strings and should be allowed
anywhere on the command line. Such that "llvm-nm -s __TEXT __text foo.o"
would work. But that does not as the CommandLine Library does not have a
way to make this work as far as I can tell. For now the "-s __TEXT __text"
has to be last on the command line.
llvm-svn: 212842
The memcpy() and overlap helps didn't help much with timings, so clean up the change.
The difference at this point is that we now leave growth of the storage buffer
up to SmallVector's implementation:
- OS.reserve(OS.capacity() * 2);
+ OS.reserve(OS.size() + 64);
llvm-svn: 212837
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
%vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
%G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)
Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement. However, on PowerPC the routine
did not take the already present instruction offset into account.
This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.
Reviewed by Hal Finkel.
llvm-svn: 212832
We need the intrinsics with offsets, so why not just add them all.
The R128 parameter will also be useful for reducing SGPR usage.
GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent",
so Mesa will probably translate those to slc, glc, etc.
When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics.
llvm-svn: 212830
Including the scratch buffer size in the initial reservation eliminates the
subsequent malloc+move operation and offers a healthier constant growth with
less memory wastage.
When doing this, take care to avoid invalidating the source buffer.
llvm-svn: 212816
Summary:
Add FileCheck -implicit-check-not option which allows specifying a
pattern that should only occur in the input when explicitly matched by a
positive check. This feature allows checking tool diagnostics in a way
clang -verify does it for compiler diagnostics.
The option has been tested on a number of clang-tidy checks, I'll post a link to
the clang-tidy patch to this thread.
Once there's an agreement on the general direction, I can add tests and
documentation.
Reviewers: djasper, bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4462
llvm-svn: 212810
No functional change. As I was trying to understand this function, I found
that variables were reused with confusing names and the broadcast case was a
bit too implicit. Hopefully, this is an improvement.
llvm-svn: 212795
It was computing the VL/n case as:
MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize
ElemByteSize not only falls out but VectorByteSize/Divider now actually
matches the definition of VL/n.
Also some formatting fixes.
llvm-svn: 212794
This reverts commit r212776.
Nope, still seems to be failing on the sanitizer bots... but hey, not
the msan self-host anymore, it's failing in asan now. I'll start looking
there next.
llvm-svn: 212793
The compiler often emits assembler-local labels (beginning with 'L') for use in
relocation expressions, however these aren't included in the object files.
Teach RuntimeDyldChecker to warn the user if they try to use one of these in an
expression, since it will never work.
llvm-svn: 212777
Committed in r212205 and reverted in r212226 due to msan self-hosting
failure, I believe I've got that fixed by r212761 to Clang.
Original commit message:
"Originally committed in r211723, reverted in r211724 due to failure
cases found and fixed (ArgumentPromotion: r211872, Inlining: r212065),
committed again in r212085 and reverted again in r212089 after fixing
some other cases, such as debug info subprogram lists not keeping track
of the function they represent (r212128) and then short-circuiting
things like LiveDebugVariables that build LexicalScopes for functions
that might not have full debug info.
And again, I believe the invariant actually holds for some reasonable
amount of code (but I'll keep an eye on the buildbots and see what
happens... ).
Original commit message:
PR20038: DebugInfo: Inlined call sites where the caller has debug info
but the call itself has no debug location.
This situation does bad things when inlined, so I've fixed Clang not to
produce inlinable call sites without locations when the caller has debug
info (in the one case where I could find that this occurred). This
updates the PR20038 test case to be what clang now produces, and readds
the assertion that had to be removed due to this bug.
I've also beefed up the debug info verifier to help diagnose these
issues in the future, and I hope to add checks to the inliner to just
assert-fail if it encounters this situation. If, in the future, we
decide we have to cope with this situation, the right thing to do is
probably to just remove all the DebugLocs from the inlined
instructions."
llvm-svn: 212776
Use alg. from LegalizeDAG.cpp
Move Expand setting to SIISellowering
v2: Extend existing tests instead of creating new ones
v3: use separate LowerFPTOSINT function
v4: use TargetLowering::expandFP_TO_SINT
add comment about using FP_TO_SINT for uints
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 212773
Move the code to a helper function to allow calls from TypeLegalizer.
No functionality change intended
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Owen Anderson <resistor@mac.com>
llvm-svn: 212772
Add test cases where we don't expect to trigger the combine optimizations
introduced at revision 212748.
No functional change intended.
llvm-svn: 212756
This patch teaches the DAGCombiner how to fold shuffles according to the
following new rules:
1. shuffle(shuffle(x, y), undef) -> x
2. shuffle(shuffle(x, y), undef) -> y
3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef)
4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef)
The backend avoids to combine shuffles according to rules 3. and 4. if
the resulting shuffle does not have a legal mask. This is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence of
target specific dag nodes during vector legalization.
Added test case combine-vec-shuffle-2.ll to verify that we correctly triggers
the new rules when combining shuffles.
llvm-svn: 212748
passes in the mips back end. This, unfortunately, required a
bit of churn in the various predicates to use a pointer rather
than a reference.
llvm-svn: 212744
Fix a crash in `InstCombiner::Descale()` when a multiply-by-zero gets
created as an argument to a GEP partway through an iteration, causing
-instcombine to optimize the GEP before the multiply.
rdar://problem/17615671
llvm-svn: 212742
This is the one remaining place I see where passing
isSafeToSpeculativelyExecute a DataLayout pointer might matter (at least for
loads) -- I think I got the others in r212720. Most of the other remaining
callers of isSafeToSpeculativelyExecute only use it for call sites (or
otherwise exclude loads).
llvm-svn: 212730
This patch teaches the AsmParser to accept some logical+immediate
instructions and convert them as shown:
bic Rd, Rn, #imm -> and Rd, Rn, #~imm
bics Rd, Rn, #imm -> ands Rd, Rn, #~imm
orn Rd, Rn, #imm -> orr Rd, Rn, #~imm
eon Rd, Rn, #imm -> eor Rd, Rn, #~imm
Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers. For example, the bic construct is used by the linux kernel.
llvm-svn: 212722
isSafeToSpeculativelyExecute can optionally take a DataLayout pointer. In the
past, this was mainly used to make better decisions regarding divisions known
not to trap, and so was not all that important for users concerned with "cheap"
instructions. However, now it also helps look through bitcasts for
dereferencable loads, and will also be important if/when we add a
dereferencable pointer attribute.
This is some initial work to feed a DataLayout pointer through to callers of
isSafeToSpeculativelyExecute, generally where one was already available.
llvm-svn: 212720
Summary:
When -mno-odd-spreg is in effect, 32-bit floating point values are not
permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit
floating point comparison results from being written to odd registers.
This option has three purposes:
* It allows support for certain MIPS implementations such as loongson-3a that
do not allow the use of odd registers for single precision arithmetic.
* When using -mfpxx, -mno-odd-spreg is the default and this allows us to
statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1
instructions to/from odd registers are guaranteed not to appear for any
reason. Once this has been established, the user can then re-enable
-modd-spreg to regain the use of all 32 single-precision registers.
* When using -mfp64 and -mno-odd-spreg together, an O32 extension named
O32 FP64A is used as the ABI. This is intended to provide almost all
functionality of an FR=1 processor but can also be executed on a FR=0 core
with the assistance of a hardware compatibility mode which emulates FR=0
behaviour on an FR=1 processor.
* Added '.module oddspreg' and '.module nooddspreg' each of which update
the .MIPS.abiflags section appropriately
* Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller
doesn't have to remember to do it.
* MipsABIFlags now calculates the flags1 and flags2 member on demand rather
than trying to maintain them in the same format they will be emitted in.
There is one portion of the -mfp64 and -mno-odd-spreg combination that is not
implemented yet. Moves to/from odd-numbered double-precision registers must not
use mtc1. I will fix this in a follow-up.
Differential Revision: http://reviews.llvm.org/D4383
llvm-svn: 212717
This is minimal change for backend required to have "hello world" compiled and working on x32 target (x86_64-linux-gnux32). More patches for x32 will follow.
Differential Revision: http://reviews.llvm.org/D4181
llvm-svn: 212716
to the zero-extend-vector-inreg node introduced previously for the same
purpose: manage the type legalization of widened extend operations,
especially to support the experimental widening mode for x86.
I'm adding both because sign-extend is expanded in terms of any-extend
with shifts to propagate the sign bit. This removes the last
fundamental scalarization from vec_cast2.ll (a test case that hit many
really bad edge cases for widening legalization), although the trunc
tests in that file still appear scalarized because the the shuffle
legalization is scalarizing. Funny thing, I've been working on that.
Some initial experiments with this and SSE2 scenarios is showing
moderately good behavior already for sign extension. Still some work to
do on the shuffle combining on X86 before we're generating optimal
sequences, but avoiding scalarization is a huge step forward.
llvm-svn: 212714
There's no real need to have Shift as a separate format type from Binary.
The comments for other format types were too specific and in some cases
no longer accurate.
Just a clean-up, no behavioral change intended.
llvm-svn: 212707
shuffle lowering: match shuffle patterns equivalent to an unpcklwd or
unpckhwd instruction.
This allows us to use generic lowering code for v8i16 shuffles and match
the unpack pattern late.
llvm-svn: 212705
Immediate fields that have no natural MVT type tended to use i8 if the
field was small enough. This was a bit confusing since i8 isn't a legal
type for the target. Fields for short immediates in a 32-bit or 64-bit
operation use i32 or i64 instead, so it would be better to do the same
for all fields.
No behavioral change intended.
llvm-svn: 212702
The dwarf FPR numbers are supposed to have the order F0, F2, F4, F6,
F1, F3, F5, F7, F8, etc., which matches the pairing of registers for
long doubles. E.g. a long double stored in F0 is paired with F2.
llvm-svn: 212701
Summary:
On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer
comparisons return 0 or 1.
Updated the various uses of getBooleanContents. Two simplifications had to be
disabled when float and int boolean contents differ:
- ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially
discoverable (i.e. when the condition of the VSELECT is a SETCC node).
- visitVSELECT (select C, 0, 1) -> (xor C, 1).
Come to think of it, this one could test for the common case of 'C'
being a SETCC too.
Preserved existing behaviour for all other targets and updated the affected
MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low'
variable was counting in the wrong direction because it thought it could simply
add the result of the comparison.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D4389
llvm-svn: 212697
combine into half-shuffles through unpack instructions that expand the
half to a whole vector without messing with the dword lanes.
This fixes some redundant instructions in splat-like lowerings for
v16i8, which are now getting to be *really* nice.
llvm-svn: 212695
that splat i8s into i16s.
Previously, we would try much too hard to arrange a sequence of i8s in
one half of the input such that we could unpack them into i16s and
shuffle those into place. This isn't always going to be a cheaper i8
shuffle than our other strategies. The case where it is always going to
be cheaper is when we can arrange all the necessary inputs into one half
using just i16 shuffles. It happens that viewing the problem this way
also makes it much easier to produce an efficient set of shuffles to
move the inputs into one half and then unpack them.
With this, our splat code gets one step closer to being not terrible
with the new experimental lowering strategy. It also exposes two
combines missing which I will add next.
llvm-svn: 212692
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a small type, we
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.
In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer though
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).
llvm-svn: 212686
This adds a utility method to access the WinCFI information in bulk and uses
that to iterate rather than requesting the count and individually iterating
them. This is in preparation for restructuring WinCFI handling to enable more
clear sharing across architectures to enable unwind information emission for
Windows on ARM.
llvm-svn: 212683
shuffles specifically for cases where a small subset of the elements in
the input vector are actually used.
This is specifically targetted at improving the shuffles generated for
trunc operations, but also helps out splat-like operations.
There is still some really low-hanging fruit here that I want to address
but this is a huge step in the right direction.
llvm-svn: 212680
don't need to set it manually.
This is based on feedback from Tom who pointed out that if every target
needs to handle this we need to reach out to those maintainers. In fact,
it doesn't make sense to duplicate everything when anything other than
expand seems unlikely at this stage.
llvm-svn: 212661
Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson
reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped
provide a reduced reproduction, though the failure wasn't too hard to
guess, and even easier with the example to confirm.
The assertion that the subprogram metadata associated with an
llvm::Function matches the scope data referenced by the DbgLocs on the
instructions in that function is not valid under LTO. In LTO, a C++
inline function might exist in multiple CUs and the subprogram metadata
nodes will refer to the same llvm::Function. In this case, depending on
the order of the CUs, the first intance of the subprogram metadata may
not be the one referenced by the instructions in that function and the
assertion will fail.
A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the
assertion removed and a comment added to explain this situation.
Original commit message:
If a function isn't actually in a CU's subprogram list in the debug info
metadata, ignore all the DebugLocs and don't try to build scopes, track
variables, etc.
While this is possibly a minor optimization, it's also a correctness fix
for an incoming patch that will add assertions to LexicalScopes and the
debug info verifier to ensure that all scope chains lead to debug info
for the current function.
Fix up a few test cases that had broken/incomplete debug info that could
violate this constraint.
Add a test case where this occurs by design (inlining a
debug-info-having function in an attribute nodebug function - we want
this to work because /if/ the nodebug function is then inlined into a
debug-info-having function, it should be fine (and will work fine - we
just stitch the scopes up as usual), but should the inlining not happen
we need to not assert fail either).
llvm-svn: 212649
Turn llvm::SpecialCaseList into a simple class that parses text files in
a specified format and knows nothing about LLVM IR. Move this class into
LLVMSupport library. Implement two users of this class:
* DFSanABIList in DFSan instrumentation pass.
* SanitizerBlacklist in Clang CodeGen library.
The latter will be modified to use actual source-level information from frontend
(source file names) instead of unstable LLVM IR things (LLVM Module identifier).
Remove dependency edge from ClangCodeGen/ClangDriver to LLVMTransformUtils.
No functionality change.
llvm-svn: 212643
Storing will generally be immediately preceded by rounding from an f32
or f64, so make sure to match those patterns directly to convert into the
FPR16 register class directly rather than going through the integer GPRs.
This also eliminates an extra step in the convert-from-f64 path
which was first converting to f32 and then to f16 from there.
rdar://17594379
llvm-svn: 212638
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.
The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.
llvm-svn: 212634
This is a follow up to r212492. There should be no functional difference, but
this patch makes it clear that SrcVT must be an i1/i8/16/i32 and DestVT must be
an i8/i16/i32/i64.
rdar://17516686
llvm-svn: 212633
In PR20059 ( http://llvm.org/pr20059 ), instcombine eliminates shuffles that are necessary before performing an operation that can trap (srem).
This patch calls isSafeToSpeculativelyExecute() and bails out of the optimization in SimplifyVectorOp() if needed.
Differential Revision: http://reviews.llvm.org/D4424
llvm-svn: 212629
Summary: This is a pre-requisite for supporting the mips-img-linux-gnu triple in clang.
Differential Revision: http://reviews.llvm.org/D4435
llvm-svn: 212626
not widening the input type to the node sufficiently to let the ext take
place in a register.
This would in turn result in a mysterious bitcast assertion failure
downstream. First change here is to add back the helpful assert I had in
an earlier version of the code to catch this immediately.
Next change is to add support to the type legalization to detect when we
have widened the operand either too little or too much (for whatever
reason) and find a size-matched legal vector type to convert it to
first. This can also fail so we get a new fallback path, but that seems
OK.
With this, we no longer crash on vec_cast2.ll when using widening. I've
also added the CHECK lines for the zero-extend cases here. We still need
to support sign-extend and trunc (or something) to get plausible code
for the other two thirds of this test which is one of the regression
tests that showed the most scalarization when widening was
force-enabled. Slowly closing in on widening being a viable legalization
strategy without it resorting to scalarization at every turn. =]
llvm-svn: 212614
Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out
as we have to blend two vectors. While there remove unecessary cross-lane moves
from the shuffles so the backend can lower it to palignr instead of vperm.
Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.
llvm-svn: 212611
vector types to be legal and a ZERO_EXTEND node is encountered.
When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.
This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.
It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.
There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.
However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.
Differential Revision: http://reviews.llvm.org/D4405
llvm-svn: 212610
Summary:
It seems we accidentally read the wrong column of the table MIPS64r6 spec
and used the names for c.cond.fmt instead of cmp.cond.fmt.
Differential Revision: http://reviews.llvm.org/D4387
llvm-svn: 212607
Summary:
This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6.
Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4269
llvm-svn: 212605
Summary:
RET, and RET_MM have been replaced by a pseudo named PseudoReturn.
In addition a version with a 64-bit GPR named PseudoReturn64 has been
added.
Instruction selection for a return matches RetRA, which is expanded post
register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter,
this PseudoReturn/PseudoReturn64 are emitted as:
- (JALR64 $zero, $rs) on MIPS64r6
- (JALR $zero, $rs) on MIPS32r6
- (JR_MM $rs) on microMIPS
- (JR $rs) otherwise
On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid
development and review (specifically, to ensure all cases of jr are
updated), these aliases are temporarily named 'r6.jr' instead of 'jr'.
A follow up patch will change them back to the correct mnemonic.
Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect
jump, and removed it from its definition of a call.
Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it's
doesn't appear to account for any MIPS64-specifics.
The return instruction created as part of eh_return expansion is now expanded
using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6
('jalr $zero, $rs').
Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in
expandEhReturn().
Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4268
llvm-svn: 212604
Summary:
This patch re-uses the implementation of 'llvm-mc -show-inst' and makes it
available to llc as 'llc -asm-show-inst'.
This is necessary to test parts of MIPS32r6/MIPS64r6 without resorting to
'llc -filetype=obj' tests. For example, on MIPS32r2 and earlier we use the
'jr $rs' instruction for indirect branches and returns. On MIPS32r6, we no
longer have 'jr $rs' and use 'jalr $zero, $rs' instead. The catch is that,
on MIPS32r6, 'jr $rs' is an alias for 'jalr $zero, $rs' and is the preferred
way of writing this instruction. As a result, all MIPS ISA's emit 'jr $rs' in
their assembly output and the assembler encodes this to different opcodes
according to the ISA.
Using this option, we can check that the MCInst really is a JR or a JALR by
matching the emitted comment. This removes the need for a 'llc -filetype=obj'
test.
Reviewers: rafael, dsanders
Reviewed By: dsanders
Subscribers: zoran.jovanovic, llvm-commits
Differential Revision: http://reviews.llvm.org/D4267
llvm-svn: 212603
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.
The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.
This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
(where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
a correct check of whether the splat is the only user of a loaded
value, checking that the splat actually crosses multiple lanes before
using a broadcast, and handling broadcasts of non-constant splats.
As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.
llvm-svn: 212602
This -f group flag appears to influence linker flags, breaking the usual rules
and causing CMake's link invocation to fail during feature detection due to
missing link dependencies (msan_*).
Let's forcibly add it for now to get things the way they were before feature
detection started working.
llvm-svn: 212590
When LLVM_ENABLE_TIMESTAMPS has been disabled we can prevent the preprocessor
from embedding dates, times and file timestamps.
There are a few motivations for this:
1) Validate the recent CMake feature detection bugfix from LLVM r212586 with
a flag that's not actually available everywhere.
2) Dogfood clang's new -Wdate-time warning from r210511 when bootstrapping.
3) Encourage reproducible builds.
llvm-svn: 212587
add_flag_if_supported() and add_flag_or_print_warning() were effectively
no-ops, just returning the value of the first result (usually
'-fno-omit-frame-pointer') for all subsequent checks for different flags.
Due to the way CMake caches feature detection results, we need to provide
symbolic variable names which will persist the cached results. This commit
fixes feature detection using these two macros.
The feature checks now run and get stored correctly, and the correct output can
be observed in configure logs:
-- Performing Test C_SUPPORTS_FPIC
-- Performing Test C_SUPPORTS_FPIC - Success
-- Performing Test CXX_SUPPORTS_FPIC
-- Performing Test CXX_SUPPORTS_FPIC - Success
llvm-svn: 212586
tracks which elements of the build vector are in fact undef.
This should make actually inpsecting them (likely in my next patch)
reasonably pretty. Also makes the output parameter optional as it is
clear now that *most* users are happy with undefs in their splats.
llvm-svn: 212581
Summary: This test ensures that we can correctly specify a full Windows path to the clang ASAN runtime libraries. This is in preparation to fix PR20246.
Reviewers: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4427
llvm-svn: 212580
This will allow the "-s" flag to implemented in the future as it
is in darwin’s nm(1) to list symbols only in the specified section.
Given a LGTM by Shankar Easwaran who originally implemented
the support for lvm-nm’s -print-armap and archive map symbols.
llvm-svn: 212576
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.
This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.
rdar://17594379
llvm-svn: 212573
BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.
Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.
This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.
Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).
llvm-svn: 212572
This reverts commit 5b55a47e94e28fbb56d0cd5d72c3db9105c15b4c.
A test case was found to crash after this was applied. I'll file a bug to track fixing this with the test case needed.
llvm-svn: 212550
This changes the implementation of atomic NAND operations
from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)"
(compatible with GCC >= 4.4).
This is in line with the common-code and ARM back-end change
implemented in r212433.
llvm-svn: 212547
This patch teaches how to fold a shuffle according to rule:
shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)
We do this only if the resulting mask M2 is legal; this is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence
of target specific dag nodes.
This patch has the advantage of being target independent, since it works on ISD
nodes. Therefore, all targets (not only x86) can take advantage of this rule.
The idea behind this patch is that most shuffle pairs can be safely combined
before we run the legalizer on vector operations. This allows us to
combine/simplify dag nodes earlier in the process and not only immediately
before instruction selection stage.
That said. This patch is not meant to replace any existing target specific
combine rules; backends might still introduce new shuffles during legalization
stage. Also, this rule is very simple and avoids to aggressively optimize
shuffles.
llvm-svn: 212539
Summary:
Follow on to r212519 to improve the encapsulation and limit the scope of the enums.
Also merged two very similar parser functions, fixed a bug where ASE's
were not being reported, and marked CPR1's as being 128-bit when MSA is
enabled.
Differential Revision: http://reviews.llvm.org/D4384
llvm-svn: 212522
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.
This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.
First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.
Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.
Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.
But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.
llvm-svn: 212517
around the handling of UNDEF lanes in boolean vector content analysis.
The code before my changes here also failed to check for non-constant
splats in a buildvector. I have no idea how to trigger this, I just
spotted by inspection when trying to understand the code. It seems
extremely unlikely to be worth the trouble to teach the only caller of
this code (DAG combining setcc patterns) how to cleverly handle undef
lanes, so I've just commented more thoroughly that we're giving up
there.
llvm-svn: 212515
nodes about whether they are splats. This is factored out and improved
from r212324 which got reverted as it was far too aggressive. The new
API should help more conservatively handle buildvectors that are
a mixture of splatted and undef values.
No functionality change at this point. The hope is to slowly
re-introduce the undef-tolerant optimization of splats, but each time
being forced to make a concious decision about how to handle the undefs
in a way that doesn't lead to contradicting assumptions about the
collapsed value.
Hal has pointed out in discussions that this may not end up being the
desired API and instead it may be more convenient to get a mask of the
undef elements or something similar. I'm starting simple and will expand
the API as I adapt actual callers and see exactly what they need.
llvm-svn: 212514
All blacklisting logic is now moved to the frontend (Clang).
If a function (or source file it is in) is blacklisted, it doesn't
get sanitize_address attribute and is therefore not instrumented.
If a global variable (or source file it is in) is blacklisted, it is
reported to be blacklisted by the entry in llvm.asan.globals metadata,
and is not modified by the instrumentation.
The latter may lead to certain false positives - not all the globals
created by Clang are described in llvm.asan.globals metadata (e.g,
RTTI descriptors are not), so we may start reporting errors on them
even if "module" they appear in is blacklisted. We assume it's fine
to take such risk:
1) errors on these globals are rare and usually indicate wild memory access
2) we can lazily add descriptors for these globals into llvm.asan.globals
lazily.
llvm-svn: 212505
As destination k0 is allowed but not as predicate/writemask.
I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.
llvm-svn: 212504
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.
Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.
define <4 x float> @test1(<4 x float> %V) {
%1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
%2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
ret <4 x float> %2
}
llvm-svn: 212498
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.
This fixes <rdar://problem/17549300>
llvm-svn: 212493
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:
typedef unsigned int uint128_t __attribute__((mode(TI)));
typedef int sint128_t __attribute__((mode(TI)));
This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.
Tests included.
rdar://17516686
llvm-svn: 212492
This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those.
The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected.
llvm-svn: 212490
Arguments passed as "byval align" should get the specified alignment
in the parameter save area. There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.
This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
of argument processing, using a new helper routine
CalculateStackSlotAlignment (this avoids having to update
PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
MinReservedArea must equal ArgOffset after argument processing,
so there's no use in computing it twice.
[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]
llvm-svn: 212476
lanes in vector splats.
The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.
Original commit message for r212324:
[x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can
easily tweak the lowering if they want.
llvm-svn: 212475
essentially a DAG combine that never gets a chance to run.
We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.
This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.
llvm-svn: 212444
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.
So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.
llvm-svn: 212443
This completes the handling for DLL import storage symbols when lowering
instructions. A DLL import storage symbol must have an additional load
performed prior to use. This is applicable to variables and functions.
This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering). For a variable symbol, no such thunk can be
accommodated.
llvm-svn: 212431
Add support for tracking DLLImport storage class information on a per symbol
basis in the ARM instruction selection. Use that information to correctly
mangle the symbol (dllimport symbols are referenced via *__imp_<name>).
llvm-svn: 212430
Ensure that all paths that retrieve the symbol name go through GetARMGVSymbol
rather than getSymbol. This is desirable so that any global symbol mangling can
be centralised to this function. The motivation for this is handling of symbols
that are marked as having dll import dll storage. Such a symbol requires an
extra load that is currently handled in the backend and a __imp_ prefix on the
symbol name.
llvm-svn: 212429
Use 0 for the invalid buffer instead of -1/~0 and switch to unsigned
representation to enable more idiomatic usage.
Also introduce a trivial SourceMgr::getMainFileID() instead of hard-coding 0/1
to identify the main file.
llvm-svn: 212398
A number of the ARM intrinsics are aliased with alternative names in MSVC
compatibility mode. This change indicates those intrinsics to permit tablegen
to construct an appropriate list of MSBuiltins. With the corresponding change
in clang, these intrinsics can then be mapped from the frontend.
The tests to validate the intrinsics are aliased correctly will be added with
the corresponding clang change.
llvm-svn: 212377
The slice(N, M) interface is powerful but not concise when wanting to
drop a few elements off of an ArrayRef, fix this by adding a drop_back
method.
llvm-svn: 212370
This better aligns with other LLVM-specific and C++ standard library smart
pointer types.
In particular there are at least a few uses of intrusive refcounting in the
frontend where it's worth investigating std::shared_ptr as a more appropriate
alternative.
llvm-svn: 212366
A GEP of a non-weak global variable will not be equivalent to another
non-weak global variable or a GEP of such a variable.
Differential Revision: http://reviews.llvm.org/D4238
llvm-svn: 212360
The regular end of the bitcode parsing is in the BitstreamEntry::EndBlock
case.
Should fix the LTO bootstrap on OS X (this function is only used by ld64).
llvm-svn: 212357
This reverts commit r212342.
We can get a StringRef into the current Record, but not one in the bitcode
itself since the string is compressed in it.
llvm-svn: 212356
These are the llvm.* globals and functions.
I don't think it is possible to test this directly since llvm-lto is not
a full linker and will not report duplicated symbols, but this fixes
bootstrap with gold and lto enabled.
llvm-svn: 212354
It is not clear if llvm.global_ctors should or should not be in llvm.metadata,
but in practice it is not and we need to ignore it for LTO.
llvm-svn: 212351
Add MSBuiltin which is similar in vein to GCCBuiltin. This allows for adding
intrinsics for Microsoft compatibility to individual instructions. This is
needed to permit the creation of ARM specific MSVC extensions.
This is not currently in use, and requires an associated change in clang to
enable use of the intrinsics defined by this new class. This merely sets the
LLVM portion of the infrastructure in place to permit the use of this
functionality. A separate set of changes will enable the new intrinsics.
llvm-svn: 212350
IRObjectFile provides all the logic for producing mangled names and getting
symbols from inline assembly.
LTOModule then adds logic for linking specific tasks, like constructing
llvm.compiler_user or extracting linker options from the bitcode.
The rule of the thumb is that IRObjectFile has the functionality that is
needed by both LTO and llvm-ar.
llvm-svn: 212349
Summary:
The tests in this directory are intended to test a single IR instruction
with as few dependencies on other instructions as possible. The aim is to
be very confident that each LLVM-IR instruction is implemented correctly and
with the optimal sequence of instructions, as well as to make it easy to tell
what is tested, and make it easier to bring up new ISA revisions in the
future. This gives us a good foundation on which to test bigger things.
These particular tests will allow testing that MIPS32r6/MIPS64r6 generate
the correct return instruction for returns, calls, and indirect branches.
This will be a bit tricky since the assembly text is identical but the
instruction is actually different. On MIPS32r6/MIPS64r6 'jr $rs' has been
removed in favour of the equivalent 'jalr $zero, $rs'. 'jr $rs' remains as
an alias for 'jalr $zero, $rs'.
Differential Revision: http://reviews.llvm.org/D4266
llvm-svn: 212345