A test in test/Generic creates a DAG where the NZCV output of an ADCS is used
by multiple nodes. This makes LLVM want to save a copy of NZCV for later, which
it couldn't do before.
This should be the last fix required for the aarch64 buildbot.
llvm-svn: 209651
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!
rdar://problem/17012966
llvm-svn: 209650
These are tested by test/CodeGen/Generic, so we should probably know
how to deal with them. Fortunately generic code does it if asked.
llvm-svn: 209646
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.
This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up on
sight of the intrinsic.
Both the transformation and the lowering of its result to asm got shiny
new tests.
The transformation is a bit convoluted because of blendvp[sd]'s
definition:
Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
I will send an email to llvm-dev to discuss if we want to change this or
not.
Reviewers: grosbach, delena, nadav
Differential Revision: http://reviews.llvm.org/D3859
llvm-svn: 209643
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:
1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
to 8 bits. This goes against the spirit of "zeroext" I think, but
it's a bit of a vague construct anyway (by definition you're going
to extend to the amount required by the ABI, that's why it's the
ABI!).
This implements option 2. The DAG machinery really isn't setup for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it was the resulting
DAG looks like it would be inferior in many cases.
Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.
Should fix PR19850
llvm-svn: 209637
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant webkit
failed if any small argument needed to be passed on the stack.
llvm-svn: 209636
and via the command line, mirroring similar functionality in LoopUnroll. In
situations where clients used custom unrolling thresholds, their intent could
previously be foiled by LoopRotate having a hardcoded threshold.
llvm-svn: 209617
Seems my previous fix was insufficient - we were still not adding the
inlined function to the abstract scope list. Which meant it wasn't
flagged as inline, didn't have nested lexical scopes in the abstract
definition, and didn't have abstract variables - so the inlined variable
didn't reference an abstract variable, instead being described
completely inline.
llvm-svn: 209602
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle the alias having the
same address but other attributes (linkage, visibility) from the
aliasee.
With this patch it is now possible to do things like
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16
@f = dllexport alias i32 (), [6 x i8]* @foo
!llvm.module.flags = !{!0}
!0 = metadata !{i32 6, metadata !"Linker Options", metadata !1}
!1 = metadata !{metadata !2, metadata !3}
!2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"}
!3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"}
llvm-svn: 209600
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.
Differential Revision: http://reviews.llvm.org/D3905
llvm-svn: 209595
The code emitted is what would be expected for the small model, so it
shouldn't be used when objects can be the full 64-bits away.
This fixes MCJIT tests on Linux.
llvm-svn: 209585
This makes front/back symmetric with begin/end, avoiding some confusion.
Added instr_front/instr_back for the old behavior, corresponding to
instr_begin/instr_end. Audited all three in-tree users of back(), all
of them look like they don't want to look inside bundles.
Fixes an assertion (PR19815) when generating debug info on mips, where a
delay slot was bundled at the end of a branch.
llvm-svn: 209580
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
llvm-svn: 209577
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.
The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.
Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.
llvm-svn: 209576
After the load/store refactoring, we were sometimes trying to feed a
GPR64 into a 32-bit register offset operand. This failed in
copyPhysReg.
llvm-svn: 209566
This seems like a simple cleanup/improved consistency, but also helps
lay the foundation to fix the bug mentioned in the test case: concrete
definitions preceeding any inlined usage aren't properly split into
concrete + abstract (because they're not known to need it until it's too
late).
Once we start deferring this choice until later, we won't have the
choice to put concrete definitions for inlined subroutines in a
different scope from concrete definitions for non-inlined subroutines
(since we won't know at time-of-construction which one it'll be). This
change brings those two cases into alignment ahead of that future
chaneg/fix.
llvm-svn: 209547
This is a follow-up to r209358: PR19799: Indvars miscompile due to an
incorrect max backedge taken count from SCEV.
That fix was incomplete as pointed out by Arnold and Michael Z. The
code was also too confusing. It needed a careful rewrite with more
unit tests. This version will also happen to optimize more cases.
<rdar://17005101> PR19799: Indvars miscompile...
llvm-svn: 209545
This matches both what we do for the non-thread case and what gcc does.
With this patch clang would match gcc's behaviour in
static __thread int a = 42;
extern __thread int b __attribute__((alias("a")));
int *f(void) { return &a; }
int *g(void) { return &b; }
if not for pr19843. Manually writing the IL does produce the same access modes.
It is also a step in the direction of fixing pr19844.
llvm-svn: 209543
Fixed a TODO in r207783.
Add the extracted constant offset using GEP instead of ugly
ptrtoint+add+inttoptr. Using GEP simplifies future optimizations and makes IR
easier to understand.
Updated all affected tests, and added a new test in split-gep.ll to cover a
corner case where emitting uglygep is necessary.
llvm-svn: 209537
We do all of our address arithmetic in 64-bit, and operations involving
logically negative 32-bit offsets (actually represented as unsigned 64 bit ints)
often overflow into higher bits. The overflow check could be preserved by
casting to uint32 at the callsite for applyRelocationValue, but this would
eliminate the value of the check.
The right way to handle overflow in relocations is to make relocation processing
target specific, and compute the values for RelocationEntry objects in the
appropriate types (32-bit for 32-bit targets, 64-bit for 64-bit targets). This
is coming as part of the cleanup I'm working on.
This fixes another i386 regression test.
<rdar://problem/16889891>
llvm-svn: 209536
Summary:
Add a second fixup table to MipsAsmBackend::getFixupKindInfo() to correctly
position llvm-mc's fixup placeholders for big-endian.
See PR19836 for full details of the issue. To summarize, the fixup placeholders
do not account for endianness properly and the implementations of
getFixupKindInfo() for each target are measuring MCFixupKindInfo.TargetOffset
from different ends of the instruction encoding to compensate.
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3889
llvm-svn: 209514
Summary:
Instead the system is required to provide some means of handling unaligned
load/store without special instructions. Options include full hardware
support, full trap-and-emulate, and hybrids such as hardware support within
a cache line and trap-and-emulate for multi-line accesses.
MipsSETargetLowering::allowsUnalignedMemoryAccesses() has been configured to
assume that unaligned accesses are 'fast' on the basis that I expect few
hardware implementations will opt for pure-software handling of unaligned
accesses. The ones that do handle it purely in software can override this.
mips64-load-store-left-right.ll has been merged into load-store-left-right.ll
The stricter testing revealed a Bits!=Bytes bug in passByValArg(). This has
been fixed and the variables renamed to clarify the units they hold.
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3872
llvm-svn: 209512
Some bit-set fields used in ELF file headers in fact contain two parts.
The first one is a regular bit-field. The second one is an enumeraion.
For example ELF header `e_flags` for MIPS target might contain the
following values:
Bit-set values:
EF_MIPS_NOREORDER = 0x00000001
EF_MIPS_PIC = 0x00000002
EF_MIPS_CPIC = 0x00000004
EF_MIPS_ABI2 = 0x00000020
Enumeration:
EF_MIPS_ARCH_32 = 0x50000000
EF_MIPS_ARCH_64 = 0x60000000
EF_MIPS_ARCH_32R2 = 0x70000000
EF_MIPS_ARCH_64R2 = 0x80000000
For printing bit-sets we use the `yaml::IO::bitSetCase()`. It does not
support bit-set/enumeration combinations and prints too many flags from
an enumeration part. This patch fixes this problem. New method
`yaml::IO::maskedBitSetCase()` handle "enumeration" part of bitset
defined by provided mask.
Patch reviewed by Nick Kledzik and Sean Silva.
llvm-svn: 209504
It's not really a "ScopeDIE", as such - it's the abstract function
definition's DIE. And we usually use "SP" for subprograms, rather than
"Sub".
llvm-svn: 209499
ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison
when both the LHS and RHS are SCEVAddRecExprs. This checks that both
LHS and RHS are guarded in the case when both are SCEVAddRecExprs.
The test case is against indvars because I could not find a way to
directly test SCEV.
Patch by Sanjay Patel!
llvm-svn: 209487
i386.
This fixes two more MCJIT regression tests on i386:
ExecutionEngine/MCJIT/2003-05-06-LivenessClobber.ll
ExecutionEngine/MCJIT/2013-04-04-RelocAddend.ll
The implementation of processScatteredVANILLA is tasteless (*ba-dum-ching*),
but I'm working on a substantial tidy-up of RuntimeDyldMachO that should
improve things.
This patch also fixes a type-o in RuntimeDyldMachO::processSECTDIFFRelocation,
and teaches that method to skip over the PAIR reloc following the SECTDIFF.
<rdar://problem/16961886>
llvm-svn: 209478