When the loop vectorizer vectorizes code that uses the loop induction variable,
we often end up with IR like this:
%b1 = insertelement <2 x i32> undef, i32 %v, i32 0
%b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
%i = add <2 x i32> %b2, <i32 2, i32 3>
If the add in this example is not legal (as is the case on PPC with VSX), it
will be scalarized, and we'll end up with a number of extract_vector_elt nodes
with the vector shuffle as the input operand, and that vector shuffle is fed by
one or more build_vector nodes. By the time that vector operations are
expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by
looking through the vector shuffle (to make sure that no illegal operations are
created), and so the extract_vector_elt -> vector shuffle -> build_vector is
never simplified to an operand of the build vector.
By looking at build_vectors through a shuffle we fix this particular situation,
preventing a vector from being built, only to be deconstructed again (for the
scalarized add) -- an expensive proposition when this all needs to be done via
the stack. We probably want a more comprehensive fix here where we look back
recursively through any shuffles to any build_vectors or scalar_to_vectors,
etc. but that can come later.
llvm-svn: 205179
Summary:
The FileHeader mapping now accepts an optional Flags sequence that accepts
the EF_<arch>_<flag> constants. When not given, Flags defaults to zero.
Reviewers: atanasyan
Reviewed By: atanasyan
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3213
llvm-svn: 205173
Previously, MinGW OS was Triple::MinGW and Cygwin was Triple::Cygwin
and now it is Triple::Win32 with Environment being GNU or Cygwin.
So,
TheTriple.getOS() == Triple::Win32
is replaced by
TheTriple.isWindowsMSVCEnvironment()
and
(TheTriple.getOS() == Triple::MinGW32 || TheTriple.getOS() == Triple::Cygwin)
is replaced by
TheTriple.isOSCygMing()
llvm-svn: 205170
is not a pattern to lower this with clever instructions that zero the
register, so restrict the zero immediate legality special case to f64
and f32 (the only two sizes which fmov seems to directly support). Fixes
backend errors when building code such as libxml.
llvm-svn: 205161
There is no direct AVX instruction to convert to unsigned. I have some ideas
how we may be able to do this with three vector instructions but the current
backend just bails on this to get it scalarized.
See the comment why we need to adjust the cost returned by BasicTTI.
The test is a bit roundabout (and checks assembly rather than bit code) because
I'd like it to work even if at some point we could vectorize this conversion.
Fixes <rdar://problem/16371920>
llvm-svn: 205159
When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire
vector and then load the piece we want. This is fine in isolation, but
generating a new store (and corresponding stack slot) for each extraction ends
up producing code of poor quality. When we scalarize a vector operation (using
SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT
for each element in the vector. This used to generate one stored copy of the
vector for each element in the vector. Now we search the uses of the vector for
a suitable store before generating a new one, which results in much more
efficient scalarization code.
llvm-svn: 205153
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.
For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.
llvm-svn: 205146
parameters rather than runtime parameters.
There is only one user of these parameters and they are compile time for
that user. Making these compile time seems to better reflect their
intended usage as well.
llvm-svn: 205143
That causes references to them to be weak references which can collapse
to null if no definition is provided. We call these functions
unconditionally, so a definition *must* be provided. Make the
definitions provided in the .cpp file weak by re-declaring them as weak
just prior to defining them. This should keep compilers which cannot
attach the weak attribute to the definition happy while actually
resolving the symbols correctly during the link.
You might ask yourself upon reading this commit log: how did *any* of
this work before? Well, fun story. It turns out we have some code in
Support (BumpPtrAllocator) which both uses virtual dispatch and has
out-of-line vtables used by that virtual dispatch. If you move the
virtual dispatch into its header in *just* the right way, the optimizer
gets to devirtualize, and remove all references to the vtable. Then the
sad part: the references to this one vtable were the only strong symbol
uses in the support library for llvm-tblgen AFAICT. At least, after
doing something just like this, these symbols stopped getting their weak
definition and random calls to them would segfault instead.
Yay software.
llvm-svn: 205137
StringRef::lower() returns a std::string. Better yet, we can now stop
thinking about what it returns and write 'auto'. It does the right
thing. =]
llvm-svn: 205135
It was doing functional but highly suspect operations on bools due to
the more limited shifting operands supported by memory instructions.
Should fix some MSVC warnings.
llvm-svn: 205134
Actually, mostly only those in the top-level directory that already
had a "virtual" attached. But it's the thought that counts and it's
been a long day.
llvm-svn: 205131
If the environment is unknown and no object file is provided, then assume an
"MSVC" environment, otherwise, set the environment to the object file format.
In the case that we have a known environment but a non-native file format for
Windows (COFF) which is used for MCJIT, then append the custom file format to
the triple as an additional component.
This fixes the MCJIT tests on Windows.
llvm-svn: 205130
Patch by Tobias Güntner.
I tried to write a test, but the only difference is the Changed value that
gets returned. It can be tested with "opt -debug-pass=Executions -functionattrs,
but that doesn't seem worth it.
llvm-svn: 205121
This will fix cross-compiling buildbots (e.g. cygwin). This is in the same vein
as SVN r205070. Apply this to fix the cross-compiling scenario, even though the
preferred solution is to update the build system to normalize the embedded
triple rather than perform this at runtime every time. This is meant to tide us
over until that approach is fleshed out and applied.
llvm-svn: 205120
The ARM64 backend uses it only as a container to keep an MCLOHType and
Arguments around so give it its own little copy. The other functionality
isn't used and we had a crazy method specialization hack in place to
keep it working. Unfortunately that was incompatible with MSVC.
Also range-ify a couple of loops while at it.
llvm-svn: 205114
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.
llvm-svn: 205106
The vector divide and sqrt instructions have high latencies, and the scalar
comparisons are like all of the others. On the P7, permutations take an extra
cycle over purely-simple vector ops.
llvm-svn: 205096
Issue subject: Crash using integrated assembler with immediate arithmetic
Fix description:
Expressions like 'cmp r0, #(l1 - l2) >> 3' could not be evaluated on asm parsing stage,
since it is impossible to resolve labels on this stage. In the end of stage we still have
expression (MCExpr).
Then, when we want to encode it, we expect it to be an immediate, but it still an expression.
Patch introduces a Fixup (MCFixup instance), that is processed after main encoding stage.
llvm-svn: 205094
no assert at all. ;] Some of these should probably be switched to
llvm_unreachable, but I didn't want to perturb the behavior in this
patch.
Found by -Wstring-conversion, which I'll try to turn on in CMake builds
at least as it is finding useful things.
llvm-svn: 205091
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.
Everything will be easier with the target in-tree though, hence this
commit.
llvm-svn: 205090
ARM64 has compact-unwind information, but doesn't necessarily want to
emit .eh_frame directives as well. This teaches MC about such a
situation so that it will skip .eh_frame info when compact unwind has
been successfully produced.
For functions incompatible with compact unwind, the normal information
is still written.
llvm-svn: 205087
Given IR like:
%bit = and %val, #imm-with-1-bit-set
%tst = icmp %bit, 0
br i1 %tst, label %true, label %false
some targets can emit just a single instruction (tbz/tbnz in the
AArch64 case). However, with ISel acting at the basic-block level, all
three instructions need to be together for this to be possible.
This adds another transformation to CodeGenPrep to expose these
opportunities, if targets opt in via the hook.
llvm-svn: 205086
This is principally to allow neater mapping of fixups to relocations
in ARM64 ELF. Without this, there isn't enough information available
to GetRelocType, leading to many more fixup_arm64_... enumerators.
llvm-svn: 205085
Another part of the ARM64 backend (so tests will be following soon).
This is currently used by the linker to relax adrp/ldr pairs into nops
where possible, though could well be more broadly applicable.
llvm-svn: 205084
The upcoming ARM64 backend doesn't have section-relative relocations,
so we give each section its own symbol to provide this functionality.
Of course, it doesn't need to appear in the final executable, so
linker-private is the best kind for this purpose.
llvm-svn: 205081
This is like the LLVMMatchType, except the verifier checks that the
second argument is a vector with the same base type and half the
number of elements.
This will be used by the ARM64 backend.
llvm-svn: 205079
I started trying to fix a small issue, but this code has seen a small fix too
many.
The old code was fairly convoluted. Some of the issues it had:
* It failed to check if a symbol difference was in the some section when
converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quiet a few cases where it should not convert symbol
relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
requiring a quiet nasty work around in ARM.
* It was missing comments.
llvm-svn: 205076
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.
llvm-svn: 205075