across PHI nodes. The code was computing the Idxs from the 'GEP'
variable's indices when what it wanted was Op1's indices. This caused an
ASan heap-overflow for me that pinpointed the issue when Op1 had more
indices than GEP did. =] I'll let Louis add a specific test case for
this if he wants.
llvm-svn: 209857
The loop vectorizer uses the backedge-taken count + 1 as the loop iteration count.
If this expression overflows, the generated code was invalid.
In case of overflow the code now jumps to the scalar loop.
Fixes PR17288.
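A minimal sketch of the emitted guard (block names hypothetical), assuming a
32-bit trip count: backedge-taken-count + 1 wraps to zero exactly when it
overflows, so a single compare can route execution to the scalar loop:
  %count = add i32 %backedge.taken.count, 1
  %overflow = icmp eq i32 %count, 0
  br i1 %overflow, label %scalar.ph, label %vector.ph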
llvm-svn: 209854
These tests ensure that a change I will propose in clang works as
expected.
Summary:
Added tests for the generation of blend+immediate instructions from a
shufflevector.
These tests were proposed along with a patch that was dropped. I'm
committing the tests anyway to protect against possible regressions in
codegen.
Reviewers: nadav, bkramer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3600
llvm-svn: 209853
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is a PHI in between the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 ==> |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that were preventing GEP->GEP merges that
the existing instcombine would otherwise perform.
Tests included.
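A hedged IR sketch of the pattern before the change (names hypothetical):
bb1:
  %gep1 = getelementptr i32* %base, i64 %i              ; GEP1
  br label %merge
bb2:
  %gep2 = getelementptr i32* %base, i64 %j              ; GEP2
  br label %merge
merge:
  %phi = phi i32* [ %gep1, %bb1 ], [ %gep2, %bb2 ]      ; PHI1
  %gep3 = getelementptr i32* %phi, i64 1                ; GEP3
Because GEP1 and GEP2 differ only in their index, a PHI over the indices plus a
single cloned GEP (GEP12) in the merge block lets instcombine fold GEP12 into
GEP3.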
llvm-svn: 209843
without this case we would end up in infinite recursion: the remainder is zero,
so Numerator - Remainder is equal to Numerator, and we would recursively ask
for the division of Numerator by Denominator.
llvm-svn: 209838
when ScalarEvolution::getElementSize returns nullptr it is safe to return early
in ScalarEvolution::findArrayDimensions, so that we avoid later problems when
we try to divide the terms by ElementSize.
llvm-svn: 209837
This makes it slightly harder to misuse Twines. It is still possible to
refer to destroyed temporaries with the regular constructors, though.
Patch by Marco Alesiani!
llvm-svn: 209832
This seems to match what gcc does for ppc and what every other llvm
backend does.
This is a fixed version of r209638. The difference is to avoid any change
in behavior for functions. The logic for using constant pools for function
addresses is spread over a few places and we have to keep them in sync.
llvm-svn: 209821
field represents the ELF section header sh_info field and has no meaning for
regular sections. Its interpretation depends on the section type.
llvm-svn: 209801
During loop unroll, loop exits from the current loop may end up in a different
outer loop. This requires re-forming LCSSA recursively for one level down from
the outermost loop where the loop exits land during unrolling. This fixes PR18861.
Differential Revision: http://reviews.llvm.org/D2976
llvm-svn: 209796
Clang knows about the sanitizer blacklist and it makes no sense to
add a global to the list of llvm.asan.dynamically_initialized_globals if it
will be blacklisted in the instrumentation pass anyway. Instead, we should
do as much blacklisting as possible (if not all) in the frontend.
llvm-svn: 209790
An address-only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.
llvm-svn: 209788
Don't assume that dynamically initialized globals are all initialized from
the _GLOBAL__<module_name>I_ function. Instead, scan llvm.global_ctors and
insert poison/unpoison calls into each function listed there.
Patch by Nico Weber!
llvm-svn: 209780
Add a function to combine two 32-bit integers into a 64-bit integer.
There are no calls to this function yet, although a subsequent change
will add some in LLDB.
Reviewers: rnk
Differential Revision: http://reviews.llvm.org/D3941
llvm-svn: 209777
No test because no in-tree targets change the bitwidth of the
setcc type depending on the bitwidth of the compared type.
Patch by Ke Bai
llvm-svn: 209771
Previously, DataTypes.h would #define a variety of symbols any time
they weren't already defined. However, some versions of Visual
Studio do provide the appropriate headers, so if those headers are
included after DataTypes.h, it can lead to macro redefinition
warnings.
The fix is to include the appropriate headers if they exist, and
only #define the symbols if the required header does not exist.
Patch by Zachary Turner!
---
The big change here is that we no longer have our own stdint.h
typedefs because now all supported toolchains have stdint.h.
Hooray!
llvm-svn: 209760
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how they are accessed (linkage,
visibility, dll storage).
Clang still has to be updated to expose this feature to C.
llvm-svn: 209759
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is a PHI in between the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 ==> |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that were preventing GEP->GEP merges that
the existing instcombine would otherwise perform.
Tests included.
llvm-svn: 209755
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.
llvm-svn: 209747
This patch implements two things:
1. If we know one number is positive and the other is negative, we return true,
as signed addition of two oppositely signed numbers will never overflow.
2. Implemented a TODO: If one of the operands has only one non-zero bit, and the
other operand has a known-zero bit in a more significant place than it
(not including the sign bit), the ripple may go up to and fill the zero, but
won't change the sign, e.g. (x & ~4) + 1 (sketched below).
We make sure that we ignore the known 0 at the MSB.
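A hedged IR sketch of case 2 (values hypothetical): the known-zero bit absorbs
the carry before it can reach the sign bit:
  %a = and i32 %x, -5   ; -5 is ~4, so bit 2 of %a is known zero
  %r = add i32 %a, 1    ; a carry ripples no further than bit 2
so the analysis can report that the addition never produces signed overflow.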
Patch by Suyog Sarda.
llvm-svn: 209746
This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the
Clang-compiled TableGen would segfault because it jumped to an invalid address
from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E
(which is within the command-line parameter registration process)).
llvm-svn: 209745
Add regression tests for the following transformation:
str X, [x20]
...
add x20, x20, #32
->
str X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
llvm-svn: 209715
Add regression tests for the following transformation:
ldr X, [x20]
...
add x20, x20, #32
->
ldr X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
llvm-svn: 209711
Use a more straightforward way to represent the set of instruction
ranges where the location of a user variable is defined - vector of pairs
of instructions (defining start/end of each range),
instead of a flattened vector of instructions where some instructions
are supposed to start the range, and the rest are supposed to "clobber" it.
Simplify the code which generates actual .debug_loc entries.
No functionality change.
llvm-svn: 209698
This is a corner case I stumbled upon when dealing with ARM64 type
conversions. I was not able to extract a test case that fails on the community
codebase. The patch conservatively discards a division that would have ended
in an ICE due to a type mismatch when building a multiply expression. I have
also added code to a place that builds add expressions and in which we should be
careful not to pass in operands of different types.
llvm-svn: 209694
We do not need to compute the GCD anymore after we removed the constant
coefficients from the terms: the terms are now all parametric expressions and
there is no need to recognize constant terms that divide only a subset of the
terms. We only rely on the size of the terms, i.e., the number of operands in
the multiply expressions, to sort the terms and recognize the parametric
dimensions.
llvm-svn: 209693
No functional change is intended: instead of relying on the delinearization to
come up with the base pointer as a remainder of the divisions in the
delinearization, we just compute it from the array access and use that value.
We subtract the base pointer from the SCEV to be delinearized and that
simplifies the work of the delinearizer.
llvm-svn: 209692
The delinearization is needed only to remove the non linearity induced by
expressions involving multiplications of parameters and induction variables.
There is no problem in dealing with a constant times a parameter, or a constant
times an induction variable.
For this reason, the current patch discards all constant terms and multipliers
before running the delinearization algorithm on the terms. The only thing
remaining in the term expressions are parameters and multiply expressions of
parameters: these simplified term expressions are passed to the array shape
recognizer that will not recognize constant dimensions anymore: these will be
recognized as different strides in parametric subscripts.
The only important special case of a constant dimension is the size of elements.
Instead of relying on the delinearization to infer the size of an element,
compute the element size from the base address type. This is a much more precise
way of computing the element size than before, as we would have mixed together
the size of an element with the strides of the innermost dimension.
llvm-svn: 209691
The current implementation of calculateDbgValueHistory already creates the
keys in the expected order (user variables are listed in order of appearance),
and should continue to do so by contract.
No functionality change.
llvm-svn: 209690
I'm not sure exactly where/how we end up with an abstract DbgVariable
with a null DIE, but we do... looking into it & will add a test and/or
fix when I figure it out.
Currently shows up in selfhost or compiler-rt builds.
llvm-svn: 209683
%higher and %highest can have non-zero values only for offsets greater
than 2GB, which is highly unlikely, if not impossible, when compiling a
single function. This makes the MIPS64 long branch three instructions smaller.
Differential Revision: http://llvm-reviews.chandlerc.com/D3281.diff
llvm-svn: 209678
After much puppetry, here's the major piece of the work to ensure that
even when a concrete definition precedes all inline definitions, an
abstract definition is still created and referenced from both concrete
and inline definitions.
Variables are still broken in this case (see comment in
dbg-value-inlined-parameter.ll test case) and will be addressed in
follow up work.
llvm-svn: 209677
A further step to correctly emitting concrete out of line definitions
preceding inlined instances of the same program.
To do this, emission of subprograms must be delayed until required since
we don't know which (abstract only (if there's no out of line
definition), concrete only (if there are no inlined instances), or both)
DIEs are required at the start of the module.
To reduce the test churn in the following commit that actually fixes the
bug, this commit introduces the lazy DIE construction and cleans up test
cases that are impacted by the changes in the resulting DIE ordering.
llvm-svn: 209675
This is a precursor to fixing inlined debug info where the concrete,
out-of-line definition may precede any inlined usage. To cope with this,
the attributes that may appear on the concrete definition or the
abstract definition are delayed until the end of the module. Then, if an
abstract definition was created, it is referenced (and no other
attributes are added to the out-of-line definition), otherwise the
attributes are added directly to the out-of-line definition.
In a couple of cases this causes not just reordering of attributes, but
reordering of types. When the creation of the attribute is delayed, if
that creation would create a type (such as for a DW_AT_type attribute)
then other top-level DIEs may have been constructed during the delay,
causing the referenced type to be created and added after those
intervening DIEs. In the extreme case, in cross-cu-inlining.ll, this
actually causes the DW_TAG_basic_type for "int" to move from one CU to
another.
llvm-svn: 209674
This is an enhancement to SeparateConstOffsetFromGEP. With this patch, we can
extract a constant offset from "s/zext and/or/xor A, B".
Added a new test @ext_or to verify this enhancement.
While refactoring the code, I also extracted some common logic into the
function Distributable.
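A hedged sketch of the new case (names hypothetical), assuming the pass can
prove the low bits of %i are zero, so the 'or' behaves as an add:
  %j = or i64 %i, 4                       ; equals %i + 4 when bit 2 of %i is 0
  %p1 = getelementptr float* %p, i64 %j
which can then be split into a variable part and a constant offset:
  %t = getelementptr float* %p, i64 %i
  %p1 = getelementptr float* %t, i64 4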
llvm-svn: 209670
This old test didn't have the argument numbering that's now squirreled
away in the high bits of the line number in the DW_TAG_arg_variable
metadata.
Add the numbering and update the test to ensure arguments are in-order.
llvm-svn: 209669
Detected by Daniel Jasper, Ilia Filippov, and Andrea Di Biagio.
Fixed the argument order to select (the mask semantics of blendv* are the
inverse of select) and fixed the tests.
Added parentheses to the assert condition.
Ran clang-format.
llvm-svn: 209667
In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there
is an optimization for certain patterns to generate one or two vector
splats followed by a vector add or subtract. This operation is
represented by a VADD_SPLAT in the selection DAG. Prior to this
patch, it was possible for the VADD_SPLAT to be assigned the wrong
data type, causing incorrect code generation. This patch corrects the
problem.
Specifically, the code previously assigned the value type of the
BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is
correct much of the time, but not always. The problem is that the
call to isConstantSplat() may return a SplatBitSize that is not the
same as the number of bits in the original vector element type. The
correct type to assign is a vector type with the same element bit size
as SplatBitSize.
The included test case shows an example of this, where the
BUILD_VECTOR node has a type of v16i8. The vector to be built is {0,
16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat
detects that we can generate a splat of 16 for type v8i16, which is
the type we must assign to the VADD_SPLAT node. If we do not, we
generate a vspltisb of 8 and a vaddubm, which generates the incorrect
result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16}. The correct code generation is a vspltish of 8 and a vadduhm.
This patch also corrected code generation for
CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked
as an XFAIL, so we can remove the XFAIL from the test case.
llvm-svn: 209662
A test in test/Generic creates a DAG where the NZCV output of an ADCS is used
by multiple nodes. This makes LLVM want to save a copy of NZCV for later, which
it couldn't do before.
This should be the last fix required for the aarch64 buildbot.
llvm-svn: 209651
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!
rdar://problem/17012966
llvm-svn: 209650
These are tested by test/CodeGen/Generic, so we should probably know
how to deal with them. Fortunately generic code does it if asked.
llvm-svn: 209646
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.
This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up at the
sight of the intrinsic.
Both the transformation and the lowering of its result to asm got shiny
new tests.
The transformation is a bit convoluted because of blendvp[sd]'s
definition:
Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
I will send an email to llvm-dev to discuss if we want to change this or
not.
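A hedged sketch of the transformation (values hypothetical); -0.0 is used for
the constant mask lanes because its sign bit, the bit blendvps tests, is set:
  %r = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %a, <4 x float> %b,
           <4 x float> <float -0.0, float 0.0, float -0.0, float 0.0>)
becomes:
  %r = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>,
           <4 x float> %b, <4 x float> %a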
Reviewers: grosbach, delena, nadav
Differential Revision: http://reviews.llvm.org/D3859
llvm-svn: 209643
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:
1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
to 8 bits. This goes against the spirit of "zeroext" I think, but
it's a bit of a vague construct anyway (by definition you're going
to extend to the amount required by the ABI, that's why it's the
ABI!).
This implements option 2. The DAG machinery really isn't set up for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it was the resulting
DAG looks like it would be inferior in many cases.
Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.
Should fix PR19850.
llvm-svn: 209637
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant WebKit
failed if any small argument needed to be passed on the stack.
llvm-svn: 209636
Add tests for the following transform:
str X, [x0, #32]
...
add x0, x0, #32
->
str X, [x0, #32]!
with X being either w1, x1, s0, d0 or q0.
llvm-svn: 209627
We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding.
As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation.
llvm-svn: 209618
and via the command line, mirroring similar functionality in LoopUnroll. In
situations where clients used custom unrolling thresholds, their intent could
previously be foiled by LoopRotate having a hardcoded threshold.
llvm-svn: 209617
This was previously regressed/broken by r192749 (reverted due to this
issue in r192938) and I was about to break it again by accident with
some more invasive changes that deal with the subprogram lists. So to
avoid that and further issues - here's a test.
It's a pretty basic test - in both r192749 and my impending case, this
test would crash, but checking the basics (that we put a subprogram in
just one of the two CUs) seems like a good start.
We still get this wrong in weird ways if the linkonce-odr function
happens to not be identical in the metadata (because it's defined in two
different files (hence the # line directives in this test), etc) even
though it meets the language requirements (identical token stream) for
such a thing. That results in two subprogram DIEs, but only one of them
gets the parameter and high/low pc information, etc. We probably need to
use the DIRef infrastructure to deduplicate functions as we do types to
address this issue - or perhaps teach the BC linker to remove the
duplicate entries in subprogram lists?
llvm-svn: 209614
Remove the use of the std::function and replace the capturing lambda with a
non-capturing one, opting to pass the user data down to the context. This is
needed as std::function is not yet available on all hosted platforms (it
requires RTTI, which breaks on Windows).
Thanks to Nico Rieck for pointing this out!
llvm-svn: 209607
Move the implementation of the Win64 EH printer from the COFFDumper into its own
class. This is in preparation for adding support to print ARM EH information.
The only real change here is in printUnwindInfo, where we now lambda-lift the
implicit this parameter for resolveFunction. Also set up the printing to
handle ARM. This sets the stage for introducing ARM EH printing.
llvm-svn: 209606
Make the use of the cache more transparent to the users. There is no reason
that the cached entries really need to be passed along. The overhead for doing
so is minimal: a single extra parameter. This requires that some standalone
functions be brought into the COFFDumper class so that they may access the
cache.
llvm-svn: 209604
Switch to use references for parameters that are guaranteed to be non-null.
Simplifies the code a slight bit in preparation for another change.
llvm-svn: 209603
Seems my previous fix was insufficient - we were still not adding the
inlined function to the abstract scope list. Which meant it wasn't
flagged as inline, didn't have nested lexical scopes in the abstract
definition, and didn't have abstract variables - so the inlined variable
didn't reference an abstract variable, instead being described
completely inline.
llvm-svn: 209602
We still do temporary files in many cases, just updating this particular
one because I was debugging it and made this change while doing so.
llvm-svn: 209601
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle an alias having the
same address as its aliasee but different attributes (linkage,
visibility).
With this patch it is now possible to do things like
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16
@f = dllexport alias i32 (), [6 x i8]* @foo
!llvm.module.flags = !{!0}
!0 = metadata !{i32 6, metadata !"Linker Options", metadata !1}
!1 = metadata !{metadata !2, metadata !3}
!2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"}
!3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"}
llvm-svn: 209600
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.
Differential Revision: http://reviews.llvm.org/D3905
llvm-svn: 209595
The code emitted is what would be expected for the small model, so it
shouldn't be used when objects can be the full 64-bits away.
This fixes MCJIT tests on Linux.
llvm-svn: 209585
This makes front/back symmetric with begin/end, avoiding some confusion.
Added instr_front/instr_back for the old behavior, corresponding to
instr_begin/instr_end. Audited all three in-tree users of back(), all
of them look like they don't want to look inside bundles.
Fixes an assertion (PR19815) when generating debug info on MIPS, where a
delay slot was bundled at the end of a branch.
llvm-svn: 209580
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
llvm-svn: 209577
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.
The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.
Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.
llvm-svn: 209576
After the load/store refactoring, we were sometimes trying to feed a
GPR64 into a 32-bit register offset operand. This failed in
copyPhysReg.
llvm-svn: 209566
In an effort to fix inlined debug info in situations where the out of
line definition of a function precedes any inlined usage, the order in
which some attributes are added to subprogram DIEs may change. (in
essence, definition-necessary attributes like DW_AT_low_pc/high_pc will
be added immediately, but the names, types, and other features will be
delayed to module end where they may either be added to the subprogram
DIE or instead reference an abstract definition for those values)
These tests can be generalized to be resilient to this change. 5 or so
tests actually have to be incompatibly changed to cope with this
reordering and will go along with the change that affects the order.
llvm-svn: 209554
This seems like a simple cleanup/improved consistency, but also helps
lay the foundation to fix the bug mentioned in the test case: concrete
definitions preceding any inlined usage aren't properly split into
concrete + abstract (because they're not known to need it until it's too
late).
Once we start deferring this choice until later, we won't have the
choice to put concrete definitions for inlined subroutines in a
different scope from concrete definitions for non-inlined subroutines
(since we won't know at time-of-construction which one it'll be). This
change brings those two cases into alignment ahead of that future
change/fix.
llvm-svn: 209547
This is a follow-up to r209358: PR19799: Indvars miscompile due to an
incorrect max backedge taken count from SCEV.
That fix was incomplete as pointed out by Arnold and Michael Z. The
code was also too confusing. It needed a careful rewrite with more
unit tests. This version will also happen to optimize more cases.
<rdar://17005101> PR19799: Indvars miscompile...
llvm-svn: 209545
This matches both what we do for the non-thread case and what gcc does.
With this patch clang would match gcc's behaviour in
static __thread int a = 42;
extern __thread int b __attribute__((alias("a")));
int *f(void) { return &a; }
int *g(void) { return &b; }
if not for PR19843. Manually writing the IL does produce the same access modes.
It is also a step in the direction of fixing PR19844.
llvm-svn: 209543
Fixed a TODO in r207783.
Add the extracted constant offset using GEP instead of ugly
ptrtoint+add+inttoptr. Using GEP simplifies future optimizations and makes IR
easier to understand.
Updated all affected tests, and added a new test in split-gep.ll to cover a
corner case where emitting uglygep is necessary.
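For illustration, a hedged sketch of the two forms (types hypothetical; over
float, a 32-byte offset is 8 elements). The uglygep form:
  %1 = ptrtoint float* %p to i64
  %2 = add i64 %1, 32
  %3 = inttoptr i64 %2 to float*
and the GEP form now emitted instead:
  %3 = getelementptr float* %p, i64 8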
llvm-svn: 209537
We do all of our address arithmetic in 64-bit, and operations involving
logically negative 32-bit offsets (actually represented as unsigned 64-bit ints)
often overflow into higher bits. The overflow check could be preserved by
casting to uint32 at the callsite for applyRelocationValue, but this would
eliminate the value of the check.
The right way to handle overflow in relocations is to make relocation processing
target specific, and compute the values for RelocationEntry objects in the
appropriate types (32-bit for 32-bit targets, 64-bit for 64-bit targets). This
is coming as part of the cleanup I'm working on.
This fixes another i386 regression test.
<rdar://problem/16889891>
llvm-svn: 209536
Summary:
Add a second fixup table to MipsAsmBackend::getFixupKindInfo() to correctly
position llvm-mc's fixup placeholders for big-endian.
See PR19836 for full details of the issue. To summarize, the fixup placeholders
do not account for endianness properly and the implementations of
getFixupKindInfo() for each target are measuring MCFixupKindInfo.TargetOffset
from different ends of the instruction encoding to compensate.
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3889
llvm-svn: 209514
Summary:
Instead the system is required to provide some means of handling unaligned
load/store without special instructions. Options include full hardware
support, full trap-and-emulate, and hybrids such as hardware support within
a cache line and trap-and-emulate for multi-line accesses.
MipsSETargetLowering::allowsUnalignedMemoryAccesses() has been configured to
assume that unaligned accesses are 'fast' on the basis that I expect few
hardware implementations will opt for pure-software handling of unaligned
accesses. The ones that do handle it purely in software can override this.
mips64-load-store-left-right.ll has been merged into load-store-left-right.ll
The stricter testing revealed a Bits!=Bytes bug in passByValArg(). This has
been fixed and the variables renamed to clarify the units they hold.
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3872
llvm-svn: 209512
to have only some of the loop's memory instructions be annotated and still _help_
the loop-carried dependence analysis.
This was discussed in the llvmdev ML (topic: "parallel loop metadata question").
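A hedged IR sketch of a partially annotated loop of the kind discussed
(metadata ids hypothetical); only the load carries the access annotation:
for.body:
  %v = load i32* %a, !llvm.mem.parallel_loop_access !0
  store i32 %v, i32* %b                  ; unannotated access
  br i1 %done, label %exit, label %for.body, !llvm.loop !0
!0 = metadata !{metadata !0}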
llvm-svn: 209507
Some bit-set fields used in ELF file headers in fact contain two parts.
The first one is a regular bit-field. The second one is an enumeraion.
For example ELF header `e_flags` for MIPS target might contain the
following values:
Bit-set values:
EF_MIPS_NOREORDER = 0x00000001
EF_MIPS_PIC = 0x00000002
EF_MIPS_CPIC = 0x00000004
EF_MIPS_ABI2 = 0x00000020
Enumeration:
EF_MIPS_ARCH_32 = 0x50000000
EF_MIPS_ARCH_64 = 0x60000000
EF_MIPS_ARCH_32R2 = 0x70000000
EF_MIPS_ARCH_64R2 = 0x80000000
For printing bit-sets we use `yaml::IO::bitSetCase()`. It does not
support bit-set/enumeration combinations and prints too many flags from
the enumeration part. This patch fixes this problem. The new method
`yaml::IO::maskedBitSetCase()` handles the "enumeration" part of a bit-set
defined by the provided mask.
Patch reviewed by Nick Kledzik and Sean Silva.
llvm-svn: 209504
It's not really a "ScopeDIE", as such - it's the abstract function
definition's DIE. And we usually use "SP" for subprograms, rather than
"Sub".
llvm-svn: 209499
Rafael correctly pointed out that the restriction is unnecessary. Although the
tests are intended to ensure that we don't abort due to an assertion, running the
tests in all modes is better since it also ensures that we don't crash without
assertions. Always run these tests to ensure that we can handle invalid input
correctly.
llvm-svn: 209496
ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison
when both the LHS and RHS are SCEVAddRecExprs. This checks that both
LHS and RHS are guarded in the case when both are SCEVAddRecExprs.
The test case is against indvars because I could not find a way to
directly test SCEV.
Patch by Sanjay Patel!
llvm-svn: 209487
Windows can't handle paths longer than 260 code points without \\?\. Even
with \\?\ it can't handle path components longer than 255 code points. So
limit graph names to the arbitrary length of 140. Random characters are still
added to the end, so it's ok if graph names collide.
Differential Revision: http://reviews.llvm.org/D3883
llvm-svn: 209483
i386.
This fixes two more MCJIT regression tests on i386:
ExecutionEngine/MCJIT/2003-05-06-LivenessClobber.ll
ExecutionEngine/MCJIT/2013-04-04-RelocAddend.ll
The implementation of processScatteredVANILLA is tasteless (*ba-dum-ching*),
but I'm working on a substantial tidy-up of RuntimeDyldMachO that should
improve things.
This patch also fixes a typo in RuntimeDyldMachO::processSECTDIFFRelocation,
and teaches that method to skip over the PAIR reloc following the SECTDIFF.
<rdar://problem/16961886>
llvm-svn: 209478
Use 4 since that's probably what it will be for spir.
Move ADDRESS_NONE to the end to keep the constant_buffer_* values
unchanged, since apparently a bunch of r600 tests use those directly.
llvm-svn: 209463
Summary:
This patch moves the handling of -pass-remarks* over to
lib/DiagnosticInfo.cpp. This allows the removal of the
optimizationRemarkEnabledFor functions from LLVMContextImpl, as they're
not needed anymore.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3878
llvm-svn: 209453
This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag
nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to
MVT::v8i8 (and vice versa).
This patch extends the logic from revision 208107 to also handle MVT::v4i16
and MVT::v8i8. Also, this patch correctly propagates Undef values when
performing the widening of a vector (example: when widening from v2i32 to
v4i32, the upper 64 bits of the resulting vector are 'undef').
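The affected dag nodes correspond to IR bitcasts such as (a minimal sketch):
  %v = bitcast double %d to <4 x i16>
  %d2 = bitcast <8 x i8> %b to double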
llvm-svn: 209451
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.
-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they tried to apply a transformation but
couldn't.
-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.
The patch also:
1- Adds support in the inliner for the two new remarks and a
test case.
2- Moves emitOptimizationRemark* functions to the llvm namespace.
3- Adds an LLVMContext argument instead of making them member functions
of LLVMContext.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3682
llvm-svn: 209442
We should be keeping track of the writeback on these instructions,
otherwise we're relying on LLVM's stupidity for correct code.
Fortunately, the MC layer can now handle all required constraints,
which means we can get rid of the CodeGen only PseudoInsts too.
llvm-svn: 209426
This changes ARM64 to use separate operands for each component of an
address, and look for separate '[', '$Rn, ..., ']' tokens when
parsing.
This allows us to do away with quite a bit of special C++ code to
handle monolithic "addressing modes" in the MC components. The more
incremental matching of the assembler operands also allows for better
diagnostics when LLVM is presented with invalid input.
Most of the complexity here is with the register-offset instructions,
which were extremely dodgy beforehand: even when the instruction used
wM, LLVM's model had xM as an operand. We papered over this
discrepancy before, but that approach doesn't work now so I split them
into separate X and W variants.
llvm-svn: 209425
Summary:
* Split into two functions, one to test each struct.
* R0 and R2 must be defined by an lw with a %got reference to the correct
symbol.
* Test for $4 (first argument) where appropriate instead of accepting any
register.
* Test that the two lbu's are correctly combined into $4
Depends on D3844
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3845
llvm-svn: 209424
Summary:
lwl and lwr are not available in MIPS32r6/MIPS64r6. The purpose of the test
is to check that the '$1' expands to '0($x)' rather than to test something
related to the lwl or lwr instructions, so we can simply switch to lw.
Depends on D3842
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3844
llvm-svn: 209423
Summary:
This patch is necessary so that they do not fail on MIPS32r6/MIPS64r6 when
-integrated-as is enabled by default and we correctly detect the host CPU.
No functional change since these tests are testing the behaviour of the
constraint used for the third operand rather than the mnemonic.
Depends on D3842
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3843
llvm-svn: 209421
Summary: Depends on D3787. Tablegen will raise an assertion without it.
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3842
llvm-svn: 209419
Summary:
These emit the 'unknown instruction' error instead of the correct one
because they have not been implemented in LLVM for any MIPS ISA.
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3841
llvm-svn: 209418
Summary:
This required me to implement the disassembler for MIPS64r6 since the encodings
are ambiguous with other instructions. This in turn revealed a few
assembly/disassembly bugs which I have fixed.
* da[ht]i only take two operands according to the spec, not three.
* DecodeBranchTarget2[16] correctly handles wider immediates than simm16
* Also made non-functional change to DecodeBranchTarget and
DecodeBranchTargetMM to keep implementation style consistent between
them.
* Difficult encodings are handled by a custom decode method on the most
general encoding in the group. This method will convert the MCInst to a
different opcode if necessary.
DecodeBranchTarget is not currently the inverse of getBranchTargetOpValue
so disassembling some branch instructions emits incorrect output. This seems
to affect branches with delay slots on all MIPS ISAs. I've left this bug
for now and temporarily removed the check for the immediate on
bc[12]eqz/bc[12]nez in the MIPS32r6/MIPS64r6 tests.
jialc and jic crash the disassembler for some reason. I've left these
instructions commented out for the moment.
Depends on D3760
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3761
llvm-svn: 209415
Should be no change in behaviour, but it makes the intended
functionality a bit clearer and means we only have to reason about
real extend operations.
llvm-svn: 209409
This intrinsic permits the emission of platform specific undefined sequences.
ARM has reserved the 0xde opcode which takes a single integer parameter (ignored
by the CPU). This permits the operating system to implement custom behaviour on
this trap. The llvm.arm.undefined intrinsic is meant to provide a means for
generating the target specific behaviour from the frontend. This is
particularly useful for Windows on ARM which has made use of a series of these
special opcodes.
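A minimal sketch of its use (the immediate value here is hypothetical):
  declare void @llvm.arm.undefined(i32)
  ...
  call void @llvm.arm.undefined(i32 254)  ; OS-defined behaviour for this trap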
llvm-svn: 209390
Now that clang can be used as an assembler via the IAS, invalid assembler inputs
would cause the assertions to trigger. Although we cannot recover from the
errors here, nor provide caret diagnostics, attempt to handle them slightly more
gracefully by reporting a fatal error.
llvm-svn: 209387
constructSubprogramDIE was already called for every subprogram in every
CU when the module was started - there's no need to call it again at
module finalization.
llvm-svn: 209372
This has to do with the trip count computation for loops with multiple
exits, which is quite subtle. Most passes just ask for a single trip
count number, so we must be conservative assuming any exit could be
taken. Normally, we rely on the "exact" trip count, which was
correctly given as "unknown". However, SCEV also gives a "max"
back-edge taken count. The loop's max BE taken count is conservatively
a maximum over the max of each exit's non-exiting iterations
count. Note that some exit tests can be skipped so the max loop
back-edge taken count can actually exceed the max non-exiting
iterations for some exits. However, when we know the loop *latch*
cannot be skipped, we can directly use its max taken count
disregarding other exits. I previously took the minimum here without
checking whether the other exit could be skipped. The correct, and
simpler thing to do here is just to directly use the loop latch's max
non-exiting iterations as the loop's max back-edge count.
In the problematic test case, the first loop exit had a max of zero
non-exiting iterations, but could be skipped. The loop latch was known
not to be skipped but had max of one non-exiting iteration. We
incorrectly claimed the loop back-edge could be taken zero times, when
it is actually taken one time.
Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count.
Loop %for.body.i: max backedge-taken count is 1.
llvm-svn: 209358
This reverts commit r208930, r208933, and r208975.
It seems not all fission consumers are ready to handle this behavior.
Reverting until tools are brought up to spec.
llvm-svn: 209338
This corrects the emission of IMAGE_REL_ARM_MOV32T relocations. Previously, we
were avoiding the high portion of the relocation too early. If there was a
section-relative relocation with an offset greater than 16 bits (65535), you
would end up truncating the high-order bits of the offset. Allow the current
relocation representation to flow through the MC layer to the object writer.
Use the new ability to restrict recorded relocations to avoid emitting the
relocation into the final object.
llvm-svn: 209337
Add support to allow a target specific COFF object writer to restrict the
recorded resolutions in the emitted object files. This is motivated by the need
in Windows on ARM, where an intermediate relocation needs to be prevented from
being emitted in the object file.
llvm-svn: 209336
Committed in r209178 then reverted in r209251 due to LTO breakage,
here's a proper fix for the case of the missing subprogram DIE. The DIEs
were there, just in other compile units. Using the SPMap we can find the
right compile unit to search for and produce cross-unit references to
describe this kind of inlining.
One existing test case needed to be updated because it had a function
that wasn't in the CU's subprogram list, so it didn't appear in the
SPMap.
llvm-svn: 209335
ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the
second argument.
On the other hand, BLENDI uses 0 to identify the first argument and 1 to
identify the second argument.
Fix the generation of the blend mask to account for this difference.
The bug did not show up with r209043, because we were not checking for the
actual arguments of the blend instruction!
This commit also fixes the test cases.
Note: The same mask works for the BLENDr variant because the arguments are
swapped during instruction selection (see the BLENDXXrr patterns).
<rdar://problem/16975435>
llvm-svn: 209324
Permit active macro expansions when terminating the assembler if there were
errors during the expansion. This would only trigger on invalid input when
built with assertions.
llvm-svn: 209309
Summary:
The minimal type needs to hold a value of '1ULL << 31' but
getMinimalTypeForRange() is called with a value of '1ULL << 32'.
This patch will also reduce the size of the matcher table when there are 8
or 16 SubtargetFeatures.
Also added a dump of the SubtargetFeatures to the -debug output and corrected getMinimalTypeInRange() to consider 0xffffffffull to be a 32-bit value.
The testcase is that no existing code is broken and that LLVM still compiles
successfully after adding MIPS64r6 CodeGen support.
Reviewers: rafael
Reviewed By: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3787
llvm-svn: 209288
The .drectve section should be marked as IMAGE_SCN_LNK_REMOVE. This matches what
the MSVC toolchain does and accurately reflects that this section should not be
emitted into the final binary. This section is merely information for the
linker, comprising additional linker directives.
llvm-svn: 209273
Although the previous code would construct a bundle and add the correct elements
to it, it would not finalise the bundle. This resulted in the InternalRead
markers not being added to the MachineOperands nor, more importantly, the
externally visible defs to the bundle itself. So, although the bundle was not
exposing the def, the generated code would be correct because no
optimisations were being performed. When optimisations were enabled, the post
register allocator would kick in, and the hazard recognizer would reorder
operations around the load which would define the value being operated upon.
Rather than manually constructing the bundle, simply construct and finalise the
bundle via the finalizeBundle call after both MIs have been emitted. This
improves the code generation with optimisations where IMAGE_REL_ARM_MOV32T
relocations are emitted.
The changes to the other tests are the result of the bundle generation
preventing the scheduler from hoisting the moves across the loads. The net
effect of the generated code is equivalent, but it is much closer to what
is actually being lowered.
llvm-svn: 209267
for undefined symbols, so it matches what COFFObjectFile::getSymbolAddress
does. This allows llvm-nm to print spaces instead of 0's for the value
of undefined symbols in Mach-O files.
To make this change, other uses of MachOObjectFile::getSymbolAddress
are updated to handle when the Value is returned as UnknownAddressOrSize,
which is needed, for example, to keep two of the ExecutionEngine tests working.
llvm-svn: 209253
This reverts commit r209178.
This seems to be asserting in an LTO build on some internal Apple
buildbots. No upstream reproduction (and I don't have an LLVM-aware gold
built right now to reproduce it personally) but it's a small patch & the
failure's semi-plausible so I'm going to revert first while I try to
reproduce this.
llvm-svn: 209251
Povray and dealII currently assert with "Overran sorted position" in
AssignTopologicalOrder. The problem is that performPostLD1Combine can
introduce cycles.
Consider:
(insert_vector_elt (INSERT_SUBREG undef,
                      (load (add %vreg0, Constant<8>), undef),  <= A
                      TargetConstant<2>),
                    (load %vreg0, undef),                       <= B
                    Constant<1>)
This is turned into a LD1LANEpost node. However, the address in A is not a
valid user of the post-incremented address of B in LD1LANEpost.
llvm-svn: 209242
Undecided whether this should include a test case - SROA produces bad
dbg.value metadata describing a value for a reference that is actually
the value of the thing the reference refers to. For now, loosening the
assert lets this not assert, but it's still bogus/wrong output...
If someone wants to tell me to add a test, I'm willing/able, just
undecided. Hopefully we'll get SROA fixed soon & we can tighten up this
assertion again.
llvm-svn: 209240
make the functions to set them non-static.
Move and rename the llvm-specific backend options to avoid conflicting
with the clang option.
Paired with a backend commit to update.
llvm-svn: 209238
This commit introduces a canonical representation for the formulae.
Basically, as soon as a formula has more than one base register, the scaled
register field is used for one of them. The register put into the scaled
register is preferably a loop variant.
The commit refactors how the formulae are built in order to produce such
representation.
This yields a more accurate, but still perfectible, cost model.
<rdar://problem/16731508>
llvm-svn: 209230
r208264 started asserting in `setLinkage()` and `setVisibility()` that
visibility and linkage are compatible. There are a few places in clang
where visibility is set first, and then linkage later, so the assert
fires. In `setLinkage()`, it's clear what the visibility *should* be,
so rather than updating all the call sites just automatically fix the
visibility.
The testcase for this is for *clang*, so it'll follow separately in cfe.
PR19760
llvm-svn: 209227
This change preserves the original algorithm of generating history
for user variables, but makes it clearer.
High-level description of algorithm:
Scan all the machine basic blocks and machine instructions in the order
they are emitted to the object file. Do the following:
1) If we see a DBG_VALUE instruction, add it to the history of the
corresponding user variable. Keep track of all user variables, whose
locations are described by a register.
2) If we see a regular instruction, look at all the registers it clobbers,
and terminate the location range for all variables described by these registers.
3) At the end of the basic block, terminate location ranges for all
user variables described by some register.
Although this change shouldn't be user-visible (the contents of .debug_loc section
should be the same), it changes some internal assumptions about the set
of instructions used to track the variable locations. Watching the bots.
llvm-svn: 209225
In refactoring DwarfUnit::isUnsignedDIType I restricted it to only work
on values with signedness (unsigned or signed), asserting on anything
else (which did uncover some bugs). But it turns out that we do need to
emit constants of signless data, such as pointer constants - only null
pointer constants are known to need this so far, but it's conceivable
that there might be non-null pointer constants at some point (hardcoded
address offsets for device drivers?).
This patch just uses 'unsigned' for signless data such as pointer
constants. Arguably we could use signless representations
(DW_FORM_dataN) instead, allowing a ternary result from isUnsignedDIType
(signed, unsigned, signless), but this seems reasonable for now.
llvm-svn: 209223