essentially a DAG combine that never gets a chance to run.
We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.
This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.
llvm-svn: 212444
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.
So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.
llvm-svn: 212443
This completes the handling for DLL import storage symbols when lowering
instructions. A DLL import storage symbol must have an additional load
performed prior to use. This is applicable to variables and functions.
This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering). For a variable symbol, no such thunk can be
accommodated.
llvm-svn: 212431
Add support for tracking DLLImport storage class information on a per symbol
basis in the ARM instruction selection. Use that information to correctly
mangle the symbol (dllimport symbols are referenced via *__imp_<name>).
llvm-svn: 212430
Ensure that all paths that retrieve the symbol name go through GetARMGVSymbol
rather than getSymbol. This is desirable so that any global symbol mangling can
be centralised to this function. The motivation for this is handling of symbols
that are marked as having dll import dll storage. Such a symbol requires an
extra load that is currently handled in the backend and a __imp_ prefix on the
symbol name.
llvm-svn: 212429
Use 0 for the invalid buffer instead of -1/~0 and switch to unsigned
representation to enable more idiomatic usage.
Also introduce a trivial SourceMgr::getMainFileID() instead of hard-coding 0/1
to identify the main file.
llvm-svn: 212398
A number of the ARM intrinsics are aliased with alternative names in MSVC
compatibility mode. This change indicates those intrinsics to permit tablegen
to construct an appropriate list of MSBuiltins. With the corresponding change
in clang, these intrinsics can then be mapped from the frontend.
The tests to validate the intrinsics are aliased correctly will be added with
the corresponding clang change.
llvm-svn: 212377
The slice(N, M) interface is powerful but not concise when wanting to
drop a few elements off of an ArrayRef, fix this by adding a drop_back
method.
llvm-svn: 212370
This better aligns with other LLVM-specific and C++ standard library smart
pointer types.
In particular there are at least a few uses of intrusive refcounting in the
frontend where it's worth investigating std::shared_ptr as a more appropriate
alternative.
llvm-svn: 212366
A GEP of a non-weak global variable will not be equivalent to another
non-weak global variable or a GEP of such a variable.
Differential Revision: http://reviews.llvm.org/D4238
llvm-svn: 212360
The regular end of the bitcode parsing is in the BitstreamEntry::EndBlock
case.
Should fix the LTO bootstrap on OS X (this function is only used by ld64).
llvm-svn: 212357
This reverts commit r212342.
We can get a StringRef into the current Record, but not one in the bitcode
itself since the string is compressed in it.
llvm-svn: 212356
These are the llvm.* globals and functions.
I don't think it is possible to test this directly since llvm-lto is not
a full linker and will not report duplicated symbols, but this fixes
bootstrap with gold and lto enabled.
llvm-svn: 212354
It is not clear if llvm.global_ctors should or should not be in llvm.metadata,
but in practice it is not and we need to ignore it for LTO.
llvm-svn: 212351
Add MSBuiltin which is similar in vein to GCCBuiltin. This allows for adding
intrinsics for Microsoft compatibility to individual instructions. This is
needed to permit the creation of ARM specific MSVC extensions.
This is not currently in use, and requires an associated change in clang to
enable use of the intrinsics defined by this new class. This merely sets the
LLVM portion of the infrastructure in place to permit the use of this
functionality. A separate set of changes will enable the new intrinsics.
llvm-svn: 212350
IRObjectFile provides all the logic for producing mangled names and getting
symbols from inline assembly.
LTOModule then adds logic for linking specific tasks, like constructing
llvm.compiler_user or extracting linker options from the bitcode.
The rule of the thumb is that IRObjectFile has the functionality that is
needed by both LTO and llvm-ar.
llvm-svn: 212349
Summary:
The tests in this directory are intended to test a single IR instruction
with as few dependencies on other instructions as possible. The aim is to
be very confident that each LLVM-IR instruction is implemented correctly and
with the optimal sequence of instructions, as well as to make it easy to tell
what is tested, and make it easier to bring up new ISA revisions in the
future. This gives us a good foundation on which to test bigger things.
These particular tests will allow testing that MIPS32r6/MIPS64r6 generate
the correct return instruction for returns, calls, and indirect branches.
This will be a bit tricky since the assembly text is identical but the
instruction is actually different. On MIPS32r6/MIPS64r6 'jr $rs' has been
removed in favour of the equivalent 'jalr $zero, $rs'. 'jr $rs' remains as
an alias for 'jalr $zero, $rs'.
Differential Revision: http://reviews.llvm.org/D4266
llvm-svn: 212345
This is useful for functions that are not actually available externally but
referenced by a vtable of some kind. Clang emits functions like this for the MS
ABI.
PR20182.
llvm-svn: 212337
The linker relies on relocation type info (e.g. is it a branch?) to perform the
correct actions, so we should keep that even when we end up using a scattered
relocation for whatever reason.
rdar://problem/17553104
llvm-svn: 212333
There were two issues here:
1. At the very least, scattered relocations cannot use the same code to
determine the corresponding symbol being referred to. For some reason we
pretend there is no symbol, even when one actually exists in the symtab, so to
match this behaviour getRelocationSymbol should simply return symbols_end for
scattered relocations.
2. Printing "-" when we can't get a symbol (including the scattered case, but
not exclusively), isn't that helpful. In both cases there *is* interesting
information in that field, so we should print it. As hex will do.
Small part of rdar://problem/17553104
llvm-svn: 212332
We have detected a documentation bug in the encoding tables of the released
MIPS64r6 specification that has resulted in the wrong encodings being used for
these instructions in LLVM. This commit corrects them.
llvm-svn: 212330
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.
llvm-svn: 212324
Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes.
Also in Silvermont instructions with more than 3 prefixes will cause 3 cycle penalty.
Maximum nop length is limited to 7 bytes when used for padding on Silvermont.
For other x86 processors max nop length remains unchanged 15 bytes.
Differential Revision: http://reviews.llvm.org/D4374
llvm-svn: 212321
FIXME: Make this configurable.
FIXME: "ENABLE_SHARED" doesn't make sense, since it is used just for plugins. We may rename it.
I introduced config.enable_shared in r120273.
llvm-svn: 212315
subtarget. This involved having the movt predicate take the current
function - since we care about size in instruction selection for
whether or not to use movw/movt take the function so we can check
the attributes. This required adding the current MachineFunction to
FastISel and propagating through.
llvm-svn: 212309
We want to encourage users of the C++ LTO API to reuse memory buffers instead
of repeatedly opening and reading the same file contents.
This reverts commit r212305 and implements a tidier scheme.
llvm-svn: 212308
When INT_MIN is the numerator in a sdiv, we would not properly handle
overflow when calculating the bounds of possible values; abs(INT_MIN) is
not a meaningful number.
Instead, check and handle INT_MIN by reasoning that the largest value is
INT_MIN/-2 and the smallest value is INT_MIN.
This fixes PR20199.
llvm-svn: 212307
On at least my machine, ar does not register an all symbols read hook (which
previously triggered target initialization), but it does register a claim
files hook, which depends on the targets being initialized.
Differential Revision: http://reviews.llvm.org/D4372
llvm-svn: 212303
This rename makes it easier to identify the specific overload being called
in each particular case and makes future refactorings easier.
Differential Revision: http://reviews.llvm.org/D4370
llvm-svn: 212302
This patch:
1) Improves the cost model for x86 alternate shuffles (originally
added at revision 211339);
2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles.
Alternate shuffles are a special kind of blend; on x86, we can often
easily lowered alternate shuffled into single blend
instruction (depending on the subtarget features).
The existing cost model didn't take into account subtarget features.
Also, it had a couple of "dead" entries for vector types that are never
legal (example: on x86 types v2i32 and v2f32 are not legal; those are
always either promoted or widened to 128-bit vector types).
The new x86 cost model takes into account what target features we have
before returning the shuffle cost (i.e. the number of instructions
after the blend is lowered/expanded).
This patch also teaches the Cost Model Analysis how to identify and analyze
alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions):
- added function 'isAlternateVectorMask';
- added some logic to check if an instruction is a alternate shuffle and, in
case, call the target specific TTI to get the corresponding shuffle cost;
- added a test to verify the cost model analysis on alternate shuffles.
llvm-svn: 212296
symbol’s name. On darwin the -j flag is used (often in combinations
with other flags) to produce a complete list of symbol names which
than can then be reorder and used with ld(1)’s -order_file.
llvm-svn: 212294
This patch adds tablegen patterns to select F16C float-to-half-float
conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes.
If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32'
are expanded into library calls.
llvm-svn: 212293
This should allow llvm-ar to be used instead of gnu ar + plugin in a LTO
build. I will add a release note about it once I finish a LTO bootstrap with it.
llvm-svn: 212287
Exposes more constant globals that can be removed by
the global optimizer. A specific example is the removal
of the static global block address array in
clang/test/CodeGen/indirect-goto.c. This change impacts only
lower optimization levels. With LTO interprocedural
const prop runs already before global opt.
llvm-svn: 212284
This patch sets the 'KeepReg' bit for any tied and live registers during the PrescanInstruction() phase of the dependency breaking algorithm. It then checks those 'KeepReg' bits during the ScanInstruction() phase to avoid changing any tied registers. For more details, please see comments in:
http://llvm.org/bugs/show_bug.cgi?id=20020
I added two FIXME comments for code that I think can be removed by using register iterators that include self. I don't want to include those code changes with this patch, however, to keep things as small as possible.
The test case is larger than I'd like, but I don't know how to reduce it further and still produce the failing asm.
Differential Revision: http://reviews.llvm.org/D4351
llvm-svn: 212275
The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a
pair of two doubles, where one is considered the "high" or
more-significant part, and the other is considered the "low" or
less-significant part. When a ppcf128 value is stored in memory or a
register pair, the high part always comes first, i.e. at the lower
memory address or in the lower-numbered register, and the low part
always comes second. This is true both on big-endian and little-endian
PowerPC systems. (Similar to how with a complex number, the real part
always comes first and the imaginary part second, no matter the byte
order of the system.)
This was implemented incorrectly for little-endian systems in LLVM.
This commit fixes three related issues:
- When printing an immediate ppcf128 constant to assembler output
in emitGlobalConstantFP, emit the high part first on both big-
and little-endian systems.
- When lowering a ppcf128 type to a pair of f64 types in SelectionDAG
(which is used e.g. when generating code to load an argument into a
register pair), use correct low/high part ordering on little-endian
systems.
- In a related issue, because lowering ppcf128 into a pair of f64 must
operate differently from lowering an int128 into a pair of i64,
bitcasts between ppcf128 and int128 must not be optimized away by the
DAG combiner on little-endian systems, but must effect a word-swap.
Reviewed by Hal Finkel.
llvm-svn: 212274
With this change all values passed through blacklisted functions
become fully initialized. Previous behavior was to initialize all
loads in blacklisted functions, but apply normal shadow propagation
logic for all other operation.
This makes blacklist applicable in a wider range of situations.
It also makes code for blacklisted functions a lot shorter, which
works as yet another workaround for PR17409.
llvm-svn: 212268