According to Intel Software Optimization Manual
on Silvermont INC or DEC instructions require
an additional uop to merge the flags.
As a result, a branch instruction depending
on an INC or a DEC instruction incurs a 1 cycle penalty.
Differential Revision: http://reviews.llvm.org/D3990
llvm-svn: 210466
Instructions from __nodebug__ functions don't have file:line
information even when inlined into no-nodebug functions. As a result,
intrinsics (SSE and other) from <*intrin.h> clang headers _never_
have file:line information.
With this change, an instruction without !dbg metadata gets one from
the call instruction when inlined.
Fixes PR19001.
llvm-svn: 210459
For each array index that is in the form of zext(a), convert it to sext(a)
if we can prove zext(a) <= max signed value of typeof(a). The conversion
helps to split zext(x + y) into sext(x) + sext(y).
Reviewed in http://reviews.llvm.org/D4060
llvm-svn: 210444
zext(a + b) != zext(a) + zext(b) even if a + b >= 0 && b >= 0.
e.g., a = i4 0b1111, b = i4 0b0001
zext a + b to i8 = zext 0b0000 to i8 = 0b00000000
(zext a to i8) + (zext b to i8) = 0b00001111 + 0b00000001 = 0b00010000
llvm-svn: 210439
To test cases that involve actual repetition (> 1 elements), at least
one element before the insertion point, and some elements of the
original range that still fit in that range space after insertion.
Actually we need coverage for the inverse case too (where no elements
after the insertion point fit into the previously allocated space), but
this'll do for now, and I might end up rewriting bits of SmallVector to
avoid that special case anyway.
llvm-svn: 210436
Before, we where looking at the size of the pointer type that specifies the
location from which to load the element. This did not make any sense at all.
This change fixes a bug in the delinearization where we failed to delinerize
certain load instructions.
llvm-svn: 210435
X86Subtarget::isTargetCygMing || X86Subtarget::isTargetKnownWindowsMSVC is
equivalent to all Windows environments. Simplify the check to isOSWindows.
NFC.
llvm-svn: 210431
Specifically this caused inserting an element from a SmallVector into
itself when such an insertion would cause a reallocation. We have code
to handle this for non-reallocating cases, but it's not robust against
reallocation.
llvm-svn: 210430
(& because it makes it easier to test, this also improves
correctness/performance slightly by moving the last element in an insert
operation, rather than copying it)
llvm-svn: 210429
Because we don't have a separate negate( ) function, 0 - NaN does double-duty as the IEEE-754 negate( ) operation, which (unlike most FP ops) *does* attach semantic meaning to the signbit of NaN.
llvm-svn: 210428
I saw at least a memory leak or two from inspection (on probably
untested error paths) and r206991, which was the original inspiration
for this change.
I ran this idea by Jim Grosbach a few weeks ago & he was OK with it.
Since it's a basically mechanical patch that seemed sufficient - usual
post-commit review, revert, etc, as needed.
llvm-svn: 210427
This would cause the last element in a range to be in a moved-from state
after an insert at a non-end position, losing that value entirely in the
process.
Side note: move_backward is subtle. It copies [A, B) to C-1 and down.
(the fact that it decrements both the second and third iterators before
the first movement is the subtle part... kind of surprising, anyway)
llvm-svn: 210426
1) The commit was made despite profound lack of understanding:
"I did not understand the comment about using dyn_cast instead of isa. I will
commit as is and make the update after. You can explain what you meant to me."
Commit first, understand later isn't OK.
2) Review comments were simply ignored:
"Can you edit the summary to describe what the patch is for? It appears to be
a list of commits at the moment."
3) The patch got LGTM'd off-list without any indication of readiness.
4) The public mailing list was excluded from patch review so all of this was
hidden from the community.
This reverts commit r210414.
llvm-svn: 210424
link.exe requires that the text section has the IMAGE_SCN_MEM_16BIT flag set.
Otherwise, it will treat the function as ARM. If this occurs, then jumps to the
function will fail, switching from thumb to ARM mode execution.
With this change, it is possible to link using the MSVC linker as well.
llvm-svn: 210415
Summary:
start to do simple constants
finish simplestore
add test case
format
Merge branch 'master' into 1756_8
Add basic functionality for assignment of ints. This creates a lot of core infrastructure in which to add, with little effort, quite a bit more to mips fast-isel
Merge branch 'master' into 1756_8
Add basic functionality for assignment of ints. This creates a lot of core infrastructure in which to add, with little effort, quite a bit more to mips fast-isel
in progress
finish integer materialize
test cases
test cases
in progress
Finish up fast-isel materialize for ints.
Finish materialize for ints
test cases
simplestorei.ll
Merge branch 'master' into 1756_8
fix fp constants for fast-isel
Merge branch '1758_1' of dmz-portal.mips.com:llvm into 1758_1
in progress
lastest for fp materialization
clean up
Merge branch 'master' into 1758_1
formatting
add test case
finish test case
Merge branch 'master' into 1758_2
Test Plan:
simplestore.ll
simplestore.ll
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3659
llvm-svn: 210414
Summary: Merge branch 'master' into 1758_6
Test Plan:
No functionality change. Run "make check" and run test-suite.
Because our servers are not yet running again I have not yet run test-suite.
I will further review myself before submission.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3819
llvm-svn: 210413
Summary:
Included this file which is needed to enable tablegen generated functionality
for fast mips-isel
Test Plan:
This has no visible functionality by itself but just adding the include
file creates some issues so I have it as a separate patch.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3812
llvm-svn: 210410
Rather than requiring ARM support for the ELF tests (which is odd), move the
tests that require ARM into a subdirectory to use lit to disable them if the
support is not present. Play this game to prevent disabling the ELF tests on
the Windows build bots as they have caught issues in the past with interactions
between various platforms.
llvm-svn: 210408
GAS documents the .type directive as having an optional comma following the key
symbol name when using the STT_<TYPE_IN_UPPER_CASE> form. However, it treats
the comma as optional in all cases. This makes the IAS support both forms of
inputs. Furthermore, the prefixed forms take either the upper case name or the
lower case alias.
The tests are split into two separate sets as the hash character serves as a
comment character on x86, which is tested in the second set by using arm-elf
which uses the at symbol as a comment character.
llvm-svn: 210407
This adjusts the section setup for the windows-itanium environment. This
environment does not report to be a known windows msvc environment, even though
it is (nearly) identical to the MSVC environment for C code.
llvm-svn: 210406
Add some whitespace, combine two sequential conditionals into a single one.
Reformat some section definitions to maintain uniformity in the function.
NFC.
llvm-svn: 210405
This was incurring an unsatisfied dependency on CodeGen from LTO breaking
shared builds:
Undefined symbols for architecture x86_64:
"llvm::initializeJumpInstrTablesPass(llvm::PassRegistry&)", referenced from:
llvm::LTOCodeGenerator::initializeLTOPasses() in LTOCodeGenerator.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Removed as a temporary measure pending feedback from the author.
llvm-svn: 210400
COFF/PE, so the relocation model is never static. Loosen the assertion
accordingly. The relocation can still be emitted properly, as it will be
converted to an IMAGE_REL_ARM_ADDR32 which will be resolved by the loader
taking the base relocation into account. This is necessary to permit the
emission of long calls which can be controlled via the -mlong-calls option in
the driver.
llvm-svn: 210399
Add a brief explanation of the data section layout for the unwind data that the
Windows on ARM EH models. This is simply to provide a rough idea of the layout
of the code involved in the decoding of the unwinding. Details on the involved
data structures are available in the associated support header. The bulk of it
is related to printing out the byte-code to help validate generation of WoA EH.
No functional change.
llvm-svn: 210397
Patch by Gabriel Radanne.
While this commit technically breaks API, no code should have supplied
the integer IDs directly, and thus no code should break.
llvm-svn: 210395
The messages were
"PR19753: Optimize comparisons with "ashr exact" of a constanst."
"Added support to optimize comparisons with "lshr exact" of a constant."
They were not correctly handling signed/unsigned operation differences,
causing pr19958.
llvm-svn: 210393
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes. There was a reason I
didn't do this initially but it no longer applies.
A53 is in-order and was running into an issue where nodes where added
to the readyQ too early. That's now fixed.
This also makes it easier for custom scheduling strategies to build
heuristics based on the actual cycles that the node was scheduled at.
The only impact on OOO (sandybridge/cyclone) is that ready times will
be slightly more accurate. I didn't measure any significant regressions.
llvm-svn: 210390
Add an isWindowsItaniumEnvironment function to Triple to mirror the other
Windows environments. This is simply a utility function to check if we are
targeting windows-itanium rather than windows-msvc.
llvm-svn: 210383
This ensures that member functions, for example, are entered into
pubnames with their fully qualified name, rather than inside the global
namespace.
llvm-svn: 210379
addrspacecast X addrspace(M)* to Y addrspace(N)*
-->
bitcast X addrspace(M)* to Y addrspace(M)*
addrspacecast Y addrspace(M)* to Y addrspace(N)*
Updat all affected tests and add several new tests in addrspacecast.ll.
This patch is based on http://reviews.llvm.org/D2186 (authored by Matt
Arsenault) with fixes and more tests.
llvm-svn: 210375
Prevent the early elimination of sections in the object writer. There may be
references to the section itself by other symbols, which may potentially not be
possible to resolve. ML (Visual Studio's Macro Assembler) also seems to retain
empty sections.
The elimination of symbols and sections which are unused should really occur at
the link phase. This will not cause any change in the resulting binary, simply
in the generated object files.
The adjustments to the other unit tests account for the fluctuating section
index caused by the appearance of sections which were previously discarded.
llvm-svn: 210373
* Section association cannot use just the section name as many
sections can have the same name. With this patch, the comdat symbol in
an assoc section is interpreted to mean a symbol in the associated
section and the mapping is discovered from it.
* Comdat symbols were not being set correctly. Instead we were getting
whatever was output first for that section.
A consequence is that associative sections now must use .section to
set the association. Using .linkonce would not work since it is not
possible to change a sections comdat symbol (it is used to decide if
we should create a new section or reuse an existing one).
This includes r210298, which was reverted because it was asserting
on an associated section having the same comdat as the associated
section.
llvm-svn: 210367
These checks were accidentally skipping the 0x prefix in the hex
offsets, then cunningly ignoring the prefix in the use of those captured
values.
Except in the case of the unit length, where the match was only matching
the leading '0' before the x in the 0x prefix, then matching that
against the length. We can't actually express the length association
here, as the length field in the Compile Unit header does not include
the length field itself, but the length field in the pubnames section
/does/ include the size of the length field in the Compile Unit header -
so the two numbers are actually 4 bytes different. Just skip matching
that.
llvm-svn: 210364
This was added to test that DW_AT_GNU_pubnames used sec_offset in DWARF4
and data4 in DWARF3 and below. Since then we've updated
DW_AT_GNU_pubnames to be a flag, rather than a section offset anyway.
Granted this still differs between DWARF 3 and DWARF 4
(FORM_flag_present versun FORM_flag) but it doesn't seem worthwhile
testing that codepath again here. It's covered adequately in many other
test cases.
And while I'm here, don't hardcode the byte size of the compile unit -
it's not relevant to this test and just makes it brittle if/when
anything changes in the way this CU is emitted.
llvm-svn: 210362
Summary:
We were being too strict and not accounting for undefs.
Added a test case and fixed another one where we improved codegen.
Reviewers: grosbach, nadav, delena
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4039
llvm-svn: 210361
This patch fixes a couple of lowering issues for little endian
PowerPC. The code for lowering BUILD_VECTOR contains a number of
optimizations that are only valid for big endian. For now, we disable
those optimizations for correctness. In the future, we will add
analogous optimizations that are correct for little endian.
When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to
make the now-familiar transformation of swapping the input operands
and complementing the permute control vector. Correctness of this
transformation is tested by the accompanying test case.
llvm-svn: 210336
r210177 added lld Makefiles, r210245 added automatic build when the source is present.
This revision completes the set by adding the lld test and unittests to the check-all target.
llvm-svn: 210318
The option check was being performed after config.h/llvm-config.h substitution,
generating incorrect macro definitions.
Fixes PR19614.
llvm-svn: 210311
If we have common uses on separate paths in the tree; process the one with greater common depth first.
This makes sure that we do not assume we need to extract a load when it is actually going to be part of a vectorized tree.
Review: http://reviews.llvm.org/D3800
llvm-svn: 210310
clang's own CMake setup handles this as of r210308.
The CMAKE_CROSSCOMPILING special-case will no longer be hard-coded. This was
clearly site-specific to someone's local configuration and should be passed in
at configure time if needed with e.g. -DLIBXML2_LIBRARIES=... (the libxml2
target I tried here doesn't even support liblzma so it's *way* off).
llvm-svn: 210309
Alias with unnamed_addr were in a strange state. It is stored in GlobalValue,
the language reference talks about "unnamed_addr aliases" but the verifier
was rejecting them.
It seems natural to allow unnamed_addr in aliases:
* It is a property of how it is accessed, not of the data itself.
* It is perfectly possible to write code that depends on the address
of an alias.
This patch then makes unname_addr legal for aliases. One side effect is that
the syntax changes for a corner case: In globals, unnamed_addr is now printed
before the address space.
llvm-svn: 210302
We extended the .section syntax to allow multiple sections with the
same name but different comdats, but currently we don't make sure that
the output section has that comdat symbol.
That happens to work with the code llc produces currently because it looks like
.section secName, "dr", one_only, "COMDATSym"
.globl COMDATSym
COMDATSym:
....
but that is not very friendly to anyone coding in assembly or even to
llc once we get comdat support in the IR.
This patch changes the coff object writer to make sure the comdat symbol is
output just after the section symbol, as required by the coff spec.
llvm-svn: 210298
Chandler correctly pointed out that I need an LLVM IR test for
r210282, which modified the vperm -> shuffle transform for little
endian PowerPC. This patch provides that test.
llvm-svn: 210297
Most issues are on mishandling s/zext.
Fixes:
1. When rebuilding new indices, s/zext should be distributed to
sub-expressions. e.g., sext(a +nsw (b +nsw 5)) = sext(a) + sext(b) + 5 but not
sext(a + b) + 5. This also affects the logic of recursively looking for a
constant offset, we need to include s/zext into the context of the searching.
2. Function find should return the bitwidth of the constant offset instead of
always sign-extending it to i64.
3. Stop shortcutting zext'ed GEP indices. LLVM conceptually sign-extends GEP
indices to pointer-size before computing the address. Therefore, gep base,
zext(a + b) != gep base, a + b
Improvements:
1. Add an optimization for splitting sext(a + b): if a + b is proven
non-negative (e.g., used as an index of an inbound GEP) and one of a, b is
non-negative, sext(a + b) = sext(a) + sext(b)
2. Function Distributable checks whether both sext and zext can be distributed
to operands of a binary operator. This helps us split zext(sext(a + b)) to
zext(sext(a) + zext(sext(b)) when a + b does not signed or unsigned overflow.
Refactoring:
Merge some common logic of handling add/sub/or in find.
Testing:
Add many tests in split-gep.ll and split-gep-and-gvn.ll to verify the changes
we made.
llvm-svn: 210291
This is a first step in seeing if it is possible to make llvm-nm produce
the same output as darwin's nm(1). Darwin's default format is bsd but its
-m output prints the longer Mach-O specific details. For now I added the
"-format darwin" to do this (whos name may need to change in the future).
As there are other Mach-O specific flags to nm(1) which I'm hoping to add some
how in the future. But I wanted to see if I could get the correct output for
-m flag using llvm-nm and the libObject interfaces.
I got this working but would love to hear what others think about this approach
to getting object/format specific details printed with llvm-nm.
llvm-svn: 210285
As discussed in cfe commit r210279, the correct little-endian
semantics for the vec_perm Altivec interfaces are implemented by
reversing the order of the input vectors and complementing the permute
control vector. This converts the desired permute from little endian
element order into the big endian element order that the underlying
PowerPC vperm instruction uses. This is represented with a
ppc_altivec_vperm intrinsic function.
The instruction combining pass contains code to convert a
ppc_altivec_vperm intrinsic into a vector shuffle operation when the
intrinsic has a permute control vector (mask) that is a constant.
However, the vector shuffle operation assumes that vector elements are
in natural order for their endianness, so for little endian code we
will get the wrong result with the existing transformation.
This patch reverses the semantic change to vec_perm that was performed
in altivec.h by once again swapping the input operands and
complementing the permute control vector, returning the element
ordering to little endian.
The correctness of this code is tested by the new perm.c test added in
a previous patch, and by other tests in the test suite that fail
without this patch.
llvm-svn: 210282
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.
This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.
llvm-svn: 210280
This is a preliminary patch for the PowerPC64LE support. In stage 1
of the vector support, we will support the VMX (Altivec) instruction
set, but will not yet support the VSX instructions. This is merely a
staging issue to provide functional vector support as soon as
possible.
llvm-svn: 210271
When not optimizing, do not run the IfConverter pass, this makes
debugging more difficult (and causes a testsuite failure in
DebugInfo/unconditional-branch.ll).
llvm-svn: 210263
* Move the instruction that changes sp outside of the branch delay slot.
* Bundle-align the target of indirect branch.
Differential Revision: http://llvm-reviews.chandlerc.com/D3928
llvm-svn: 210262
Unused arguments were not being added to the argument list, but instead
treated as arbitrary scope variables. This meant they weren't carefully
added in the original argument order.
In this particular example, though, it turns out the argument is only
/mostly/ unused (well, actually it's entirely used, but in a specific
way). It's a struct that, due to ABI reasons, is decomposed into chunks
(exactly one chunk, since it has one member) and then passed. Since only
one of those chunks is used (SROA, etc, kill the original reconstitution
code) we don't have a location to describe the whole variable.
In this particular case, since the struct consists of just the one int,
once we have partial location information, this should have a location
that describes the entire variable (since the piece is the entirety of
the object).
And at some point we'll need to describe the location of even /entirely/
unused arguments so that they can at least be printed on function entry.
llvm-svn: 210231
Abstract variables within abstract scopes that are entirely optimized
away in their first inlining are omitted because their scope is not
present so the variable is never created. Instead, we should ensure the
scope is created so the variable can be added, even if it's been
optimized away in its first inlining.
This fixes the incorrect debug info in missing-abstract-variable.ll
(added in r210143) and passes an asserts self-hosting build, so
hopefully there's not more of these issues left behind... *fingers
crossed*.
llvm-svn: 210221
We would previously assert here when trying to figure out the section
for the global.
This makes us handle the situation more gracefully since the IR isn't
malformed.
Differential Revision: http://reviews.llvm.org/D4022
llvm-svn: 210215
When JITting a large project such as Boost it's quite hard to figure out the problematic inline asm without debug location. This patch provides debug location printout before the JIT aborts due to inline asm. printDebugLoc() was exposed from MachineInstr.cpp and reused here.
If the JIT run with debug info, don't bomb on DBG_VALUE but ignore them.
http://reviews.llvm.org/D3416
llvm-svn: 210201
Add support to llvm-readobj to decode Windows ARM Exception Handling data. This
uses the previously added datastructures to decode the information into a format
that can be used by tests. This is a necessary step to add support for emitting
Windows on ARM exception handling information.
A fair amount of formatting inspiration is drawn from the Win64 EH printer as
well as the ARM EHABI printer. This allows for a reasonably thorough look into
the encoded data.
llvm-svn: 210192
This is purely a documentation/whitespace cleanup for the format support
functions.
The current style does not duplicate the function/class names in the
documentation; conform to this style.
Additionally, there was a large amount of duplication of comments that added no
real value. Use block comments for the related sets of functions which are used
for type deduction and parameter container classes.
No functional change.
llvm-svn: 210190
Replicate the fact that ARM::WinEH::RuntimeFunction purposefully does not merge
functions to accommodate raw data access use cases in tools such as readobj.
Pointed out by Renato during post-commit review.
No functional change.
llvm-svn: 210189
This patch implements two things:
1. If we know one number is positive and another is negative, we return true as
signed addition of two opposite signed numbers will never overflow.
2. Implemented TODO : If one of the operands only has one non-zero bit, and if
the other operand has a known-zero bit in a more significant place than it
(not including the sign bit) the ripple may go up to and fill the zero, but
won't change the sign. e.x - (x & ~4) + 1
We make sure that we are ignoring 0 at MSB.
Patch by Suyog Sarda.
llvm-svn: 210186
As requested by AArch64 subtargets.
Note that this will have no effect until the
AArch64 target actually enables the pass like this:
substitutePass(&PostRASchedulerID, &PostMachineSchedulerID);
As soon as armv7 switches over, PostMachineScheduler will become the
default postRA scheduler, so this won't be necessary any more.
Targets using the old postRA schedule would then do:
substitutePass(&PostMachineSchedulerID, &PostRASchedulerID);
llvm-svn: 210167
These were not exposed previously because I didn't want out-of-tree
targets to be too dependent on their internals. They can be reused for
a very wide variety of processors with casual scheduling needs without
exposing the classes by instead using hooks defined in
MachineSchedPolicy (we can add more if needed). When targets are more
aggressively tuned or want to provide custom heuristics, they can
define their own MachineSchedStrategy. I tend to think this is better
once you start customizing heuristics because you can copy over only
what you need. I don't think that layering heuristics generally works
well.
However, Arch64 targets now want to reuse the Generic scheduling logic
but also provide extensions. I don't see much harm in exposing the
Generic scheduling classes with a major caveat: these scheduling
strategies may change in the future without validating performance on
less mainstream processors. If you want to be immune from changes,
just define your own MachineSchedStrategy.
llvm-svn: 210166
Late last year r191835 removed a largely unmaintained legacy PGO
infrastructure, but some of the docs were missed. Since these docs are
for things that don't actually exist anymore, they should be removed.
llvm-svn: 210165
Avoid changing behaviour for everyone who's used to the traditional ghostview
UI, especially since it knows how to stay in the foreground unlike xdg-open.
Amendment to r210147.
llvm-svn: 210148
Also correct the llvm-config.h header guard so it doesn't depend on 'CONFIG_H'
which is commonly defined in external projects and caused trouble for
embedders.
In future llvm/Config/llvm-config.h will be installed, but not
the private llvm/Config/config.h header.
llvm-svn: 210144
Along with a test case to demonstrate that due to inlining order there
are cases where abstract variable DIEs are not constructed since the
abstract subprogram was built due to a previous inlining that optimized
away those variables. This produces incorrect debug info (the 'missing'
abstract variable causes the inlined instance of that variable to be
emitted with a full description (name, line, file) rather than
referencing the abstract origin), but this commit at least ensures that
it doesn't crash...
llvm-svn: 210143
This gets us closer to being able to remove LiveVariables entirely which is where dead instructions are currently tagged as such.
Reviewed by Jakob Olesen
llvm-svn: 210132
The tests check that the following transform happens:
(ldr|str) X, [x20]
...
sub x20, x20, #16
->
(ldr|str) X, [x20], #-16
with X being either w0, x0, s0, d0 or q0.
llvm-svn: 210113
This means the output of LowerFormalArguments returns a lowered
SDValue with the correct type (expected in SelectionDAGBuilder).
Without this, an assertion under a DEBUG macro triggers when those
types are passed on the stack.
llvm-svn: 210102
This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.
This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like
@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
i32 ptrtoint (i32* @bar to i32)) to i32*)
An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).
Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.
llvm-svn: 210062
The code was actually correct. Sorry for the confusion. I have expanded the
comment saying why the analysis is valid to avoid me misunderstaning it
again in the future.
llvm-svn: 210052
Instrumentation passes now use attributes
address_safety/thread_safety/memory_safety which are added by Clang frontend.
Clang parses the blacklist file and adds the attributes accordingly.
Currently blacklist is still used in ASan module pass to disable instrumentation
for certain global variables. We should fix this as well by collecting the
set of globals we're going to instrument in Clang and passing it to ASan
in metadata (as we already do for dynamically-initialized globals and init-order
checking).
This change also removes -tsan-blacklist and -msan-blacklist LLVM commandline
flags in favor of -fsanitize-blacklist= Clang flag.
llvm-svn: 210038
When lowering a ISD::BRCOND into a test+branch, make sure that we
always use the correct condition code to emit the test operation.
This fixes PR19858: "i8 checked mul is wrong on x86".
Patch by Keno Fisher!
llvm-svn: 210032
if ((x & C) == 0) x |= C becomes x |= C
if ((x & C) != 0) x ^= C becomes x &= ~C
if ((x & C) == 0) x ^= C becomes x |= C
if ((x & C) != 0) x &= ~C becomes x &= ~C
if ((x & C) == 0) x &= ~C becomes nothing
Differential Revision: http://reviews.llvm.org/D3777
llvm-svn: 210006
Replace the crufty build-time configure checks for program paths with
equivalent runtime logic.
This lets users install graphing tools as needed without having to reconfigure
and rebuild LLVM, while eliminating a long chain of inappropriate compile
dependencies that included GUI programs and the windowing system.
Additional features:
* Support the OS X 'open' command to view graphs generated by any of the
Graphviz utilities. This is an alternative to the Graphviz OS X UI which is
no longer available on Mountain Lion.
* Produce informative log output upon failure to indicate which programs can
be installed to view graphs.
Ping me if this doesn't work for your particular environment.
llvm-svn: 210001
Since we cannot yet use variadic templates, add a specialisation for
6-parameters to format. This is motivated by a need for the additional
parameter for formatting information for an unwind decoder for Windows on ARM.
llvm-svn: 209999
Introduce the support structures necessary to deal with the Windows ARM EH data.
These definitions are extremely aggressive about assertions to aid future use
for generation of the entries and subsequent decoding.
The names for the various fields are meant to reflect the names used by the
Visual Studio toolchain to aid communication.
Due to the complexity in reading a few of the values, there are a couple of
additional utility functions to decode the information.
In general, there are two ways to encode the unwinding information:
- packed, which places the data inline into the
_IMAGE_ARM_RUNTIME_FUNCTION_ENTRY structure.
- unpacked, which places the data into auxiliary structures placed into the
.xdata section.
The set of structures allow reading of data in either encoding, with the minor
caveat that epilogue scopes need to be decoded manually by constructing the
structure from the data returned by the RuntimeFunction structure.
These definitions are meant for read-only access at the current point as the
first use of them will be to decode the exception information.
llvm-svn: 209998
This was previously committed in r209680 and reverted in r209683 after
it caused sanitizer builds to crash.
The issue seems to be that the DebugLoc associated with dbg.value IR
intrinsics isn't necessarily accurate. Instead, we duplicate the
DIVariables and add an InlinedAt field to them to record their
location.
We were using this InlinedAt field to compute the LexicalScope for the
variable, but not using it in the abstract DbgVariable construction and
mapping. This resulted in a formal parameter to the current concrete
function, correctly having no InlinedAt information, but incorrectly
having a DebugLoc that described an inlined location within the
function... thus an abstract DbgVariable was created for the variable,
but its DIE was never constructed (since the LexicalScope had no such
variable). This DbgVariable was silently ignored (by testing for a
non-null DIE on the abstract DbgVariable).
So, fix this by using the right scoping information when constructing
abstract DbgVariables.
In the long run, I suspect we want to undo the work that added this
second kind of location tracking and fix the places where the DebugLoc
propagation on the dbg.value intrinsic fails. This will shrink debug
info (by not duplicating DIVariables), make it more efficient (by not
having to construct new DIVariable metadata nodes to try to map back to
a single variable), and benefit all instructions.
But perhaps there are insurmountable issues with DebugLoc quality that
I'm unaware of... I just don't know how we can't /just keep the DebugLoc
from the dbg.declare to the dbg.values and never get this wrong/.
Some history context:
http://llvm.org/viewvc/llvm-project?view=revision&revision=135629http://llvm.org/viewvc/llvm-project?view=revision&revision=137253
llvm-svn: 209984
DAG cycle detection is only enabled with ENABLE_EXPENSIVE_CHECKS. However we
can run it just before we would crash in order to provide more informative
diagnostics.
Now in addition to the "Overran sorted position" message we also get the Node
printed if a cycle was detected.
Tested by building several configs: Debug+Assert, Debug+Assert+Check (this is
ENABLE_EXPENSIVE_CHECKS), Release+Assert and Release. Also tried that the
AssignTopologicalOrder assert produces the expected results.
llvm-svn: 209977
Pass the DAG down to checkForCycles from all callers where we have it. This
allows target-specific nodes to be printed properly.
Also print some missing newlines.
llvm-svn: 209976
Handle "X + ~X" -> "-1" in the function Value *Reassociate::OptimizeAdd(Instruction *I, SmallVectorImpl<ValueEntry> &Ops);
This patch implements:
TODO: We could handle "X + ~X" -> "-1" if we wanted, since "-X = ~X+1".
Patch by Rahul Jain!
Differential Revision: http://reviews.llvm.org/D3835
llvm-svn: 209973
Input YAML file might contain multiple object file definitions.
New option `-docnum` allows to specify an ordinal number (starting from 1)
of definition used for an object file generation.
Patch reviewed by Sean Silva.
llvm-svn: 209967
Following the lead set by r209324, I'm making these tests match the whole
instruction, so we can be sure we're lowering them correctly.
llvm-svn: 209947
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.
This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
(shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
(shuffle (BINOP A, B), Undef, <Mask>).
Both rules are only triggered on the type-legalized DAG.
In particular, rule 1. is a target specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2. is a target independet rule that attempts to move a shuffle
immediately after a binary operation.
llvm-svn: 209930
Summary:
If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.
Added a test.
The ConvertSelectToConcatVector is assuming it doesn't get vselects with
arguments of, for example, <undef, undef, true, true>. Those get taken
care of in the checks above its call.
Reviewers: nadav, delena, grosbach, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3916
llvm-svn: 209929
Summary:
Separate the check for blend shuffle_vector masks into isBlendMask.
This function will also be used to check if a vector shuffle is legal. No
change in functionality was intended, but we ended up improving codegen on
two tests, which were being (more) optimized only if the resulting shuffle
was legal.
Reviewers: nadav, delena, andreadb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3964
llvm-svn: 209923
For MIPS, we have to encode the personality routine with
an indirect pointer to absptr; otherwise, some link warning
warning will be raised, and the program might crash in some
early MIPS Android device.
llvm-svn: 209907
Unordered is strictly weaker than monotonic, so if the latter doesn't have any
barriers then the former certainly shouldn't.
rdar://problem/16548260
llvm-svn: 209901
Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr,
followed by a wide push of the remaining registers if there are any. AAPCS uses
a single push.w instruction.
It turns out that, on average, enough registers get pushed that code is smaller
in the AAPCS prologue, which is a nice property for M-class programmers. They
also have other options available for back-traces, so can hopefully deal with
the fact that FP & LR aren't adjacent in memory.
rdar://problem/15909583
llvm-svn: 209895
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.
When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.
This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.
I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.
rdar://problem/16227836
llvm-svn: 209883
This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in Vectorizer. These intrinsics are different from other
intrinsics as second argument to these function must be same in order to vectorize them and it should be represented as a scalar.
Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857
llvm-svn: 209873
The corresponding CFE patch replaces these intrinsics with vector initializers
in avxintrin.h. This patch removes the LLVM intrinsics from the backend.
We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering
that further to the intrinsics.
The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue
to use intrinsics. As explained in the CFE patch, the reason is that we
currently don't generate as good code for them without the intrinsics.
CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It
checks that for a series of insertelements we generate the appropriate
vbroadcast instruction.
Also verified that there was no assembly change in the test-suite before and
after this patch.
llvm-svn: 209864
They are replaced with the same IR that is generated for the
vector-initializers in avxintrin.h.
The test verifies that we get back the original instruction. I haven't seen
this approach to be used in other auto-upgrade tests (i.e. llc + FileCheck)
but I think it's the most direct way to test this case. I believe this should
work because llc upgrades calls during parsing. (Other tests mostly check
that assembling and disassembling yields the upgraded IR.)
llvm-svn: 209863
original fix would actually trigger the *exact* same crasher as the
original bug for a different reason. Awesomesauce.
Working on test cases now, but wanted to get bots healthier.
llvm-svn: 209860
across PHI nodes. The code was computing the Idxs from the 'GEP'
variable's indices when what it wanted was Op1's indices. This caused an
ASan heap-overflow for me that pin pointed the issue when Op1 had more
indices than GEP did. =] I'll let Louis add a specific test case for
this if he wants.
llvm-svn: 209857
The loop vectorizer instantiates be-taken-count + 1 as the loop iteration count.
If this expression overflows the generated code was invalid.
In case of overflow the code now jumps to the scalar loop.
Fixes PR17288.
llvm-svn: 209854
These tests ensure that a change I will propose in clang works as
expected.
Summary:
Added tests for the generation of blend+immediate instructions from a
shufflevector.
These tests were proposed along with a patch that was dropped. I'm
committing the tests anyway to protect against possible regressions in
codegen.
Reviewers: nadav, bkramer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3600
llvm-svn: 209853
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
llvm-svn: 209843
without this case we would end on an infinite recursion: the remainder is zero,
so Numerator - Remainder is equal to Numerator and so we would recursively ask
for the division of Numerator by Denominator.
llvm-svn: 209838
when ScalarEvolution::getElementSize returns nullptr it is safe to early return
in ScalarEvolution::findArrayDimensions such that we avoid later problems when
we try to divide the terms by ElementSize.
llvm-svn: 209837
This makes it slightly harder to misuse Twines. It is still possible to
refer to destroyed temporaries with the regular constructors, though.
Patch by Marco Alesiani!
llvm-svn: 209832
This seems to match what gcc does for ppc and what every other llvm
backend does.
This is a fixed version of r209638. The difference is to avoid any change
in behavior for functions. The logic for using constant pools for function
addresseses is spread over a few places and we have to keep them in sync.
llvm-svn: 209821
field represents ELF section header sh_info field and does not have any
sense for regular sections. Its interpretation depends on section type.
llvm-svn: 209801
During loop-unroll, loop exits from the current loop may end up in in different
outer loop. This requires to re-form LCSSA recursively for one level down from
the outer most loop where loop exits are landed during unroll. This fixes PR18861.
Differential Revision: http://reviews.llvm.org/D2976
llvm-svn: 209796
Clang knows about the sanitizer blacklist and it makes no sense to
add global to the list of llvm.asan.dynamically_initialized_globals if it
will be blacklisted in the instrumentation pass anyway. Instead, we should
do as much blacklisting as possible (if not all) in the frontend.
llvm-svn: 209790
An address only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.
llvm-svn: 209788
Don't assume that dynamically initialized globals are all initialized from
_GLOBAL__<module_name>I_ function. Instead, scan the llvm.global_ctors and
insert poison/unpoison calls to each function there.
Patch by Nico Weber!
llvm-svn: 209780
Add a function to combine two 32-bit integers into a 64-bit integer.
There are no calls to this function yet, although a subsequent change
will add some in LLDB.
Reviewers: rnk
Differential Revision: http://reviews.llvm.org/D3941
llvm-svn: 209777
No test because no in-tree targets change the bitwidth of the
setcc type depending on the bitwidth of the compared type.
Patch by Ke Bai
llvm-svn: 209771
Previously, DataTypes.h would #define a variety of symbols any time
they weren't already defined. However, some versions of Visual
Studio do provide the appropriate headers, so if those headers are
included after DataTypes.h, it can lead to macro redefinition
warnings.
The fix is to include the appropriate headers if they exist, and
only #define the symbols if the required header does not exist.
Patch by Zachary Turner!
---
The big change here is that we no longer have our own stdint.h
typedefs because now all supported toolchains have stdint.h.
Hooray!
llvm-svn: 209760
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).
Clang still has to be updated to expose this feature to C.
llvm-svn: 209759
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
llvm-svn: 209755
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.
llvm-svn: 209747
This patch implements two things:
1. If we know one number is positive and another is negative, we return true as
signed addition of two opposite signed numbers will never overflow.
2. Implemented TODO : If one of the operands only has one non-zero bit, and if
the other operand has a known-zero bit in a more significant place than it
(not including the sign bit) the ripple may go up to and fill the zero, but
won't change the sign. e.x - (x & ~4) + 1
We make sure that we are ignoring 0 at MSB.
Patch by Suyog Sarda.
llvm-svn: 209746
This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the
Clang-compiled TableGen would segfault because it jumped to an invalid address
from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E
(which is within the command-line parameter registration process)).
llvm-svn: 209745
Add regression tests for the following transformation:
str X, [x20]
...
add x20, x20, #32
->
str X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
llvm-svn: 209715
Add regression tests for the following transformation:
ldr X, [x20]
...
add x20, x20, #32
->
ldr X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
llvm-svn: 209711
Use more straightforward way to represent the set of instruction
ranges where the location of a user variable is defined - vector of pairs
of instructions (defining start/end of each range),
instead of a flattened vector of instructions where some instructions
are supposed to start the range, and the rest are supposed to "clobber" it.
Simplify the code which generates actual .debug_loc entries.
No functionality change.
llvm-svn: 209698
This is a corner case I have stumbled upon when dealing with ARM64 type
conversions. I was not able to extract a testcase for the community codebase to
fail on. The patch conservatively discards a division that would have ended up
in an ICE due to a type mismatch when building a multiply expression. I have
also added code to a place that builds add expressions and in which we should be
careful not to pass in operands of different types.
llvm-svn: 209694
We do not need to compute the GCD anymore after we removed the constant
coefficients from the terms: the terms are now all parametric expressions and
there is no need to recognize constant terms that divide only a subset of the
terms. We only rely on the size of the terms, i.e., the number of operands in
the multiply expressions, to sort the terms and recognize the parametric
dimensions.
llvm-svn: 209693
No functional change is intended: instead of relying on the delinearization to
come up with the base pointer as a remainder of the divisions in the
delinearization, we just compute it from the array access and use that value.
We substract the base pointer from the SCEV to be delinearized and that
simplifies the work of the delinearizer.
llvm-svn: 209692
The delinearization is needed only to remove the non linearity induced by
expressions involving multiplications of parameters and induction variables.
There is no problem in dealing with constant times parameters, or constant times
an induction variable.
For this reason, the current patch discards all constant terms and multipliers
before running the delinearization algorithm on the terms. The only thing
remaining in the term expressions are parameters and multiply expressions of
parameters: these simplified term expressions are passed to the array shape
recognizer that will not recognize constant dimensions anymore: these will be
recognized as different strides in parametric subscripts.
The only important special case of a constant dimension is the size of elements.
Instead of relying on the delinearization to infer the size of an element,
compute the element size from the base address type. This is a much more precise
way of computing the element size than before, as we would have mixed together
the size of an element with the strides of the innermost dimension.
llvm-svn: 209691
Current implementation of calculateDbgValueHistory already creates the
keys in the expected order (user variables are listed in order of appearance),
and should do so later by contract.
No functionality change.
llvm-svn: 209690
I'm not sure exactly where/how we end up with an abstract DbgVariable
with a null DIE, but we do... looking into it & will add a test and/or
fix when I figure it out.
Currently shows up in selfhost or compiler-rt builds.
llvm-svn: 209683
%higher and %highest can have non-zero values only for offsets greater
than 2GB, which is highly unlikely, if not impossible when compiling a
single function. This makes long branch for MIPS64 3 instructions smaller.
Differential Revision: http://llvm-reviews.chandlerc.com/D3281.diff
llvm-svn: 209678
After much puppetry, here's the major piece of the work to ensure that
even when a concrete definition preceeds all inline definitions, an
abstract definition is still created and referenced from both concrete
and inline definitions.
Variables are still broken in this case (see comment in
dbg-value-inlined-parameter.ll test case) and will be addressed in
follow up work.
llvm-svn: 209677
A further step to correctly emitting concrete out of line definitions
preceeding inlined instances of the same program.
To do this, emission of subprograms must be delayed until required since
we don't know which (abstract only (if there's no out of line
definition), concrete only (if there are no inlined instances), or both)
DIEs are required at the start of the module.
To reduce the test churn in the following commit that actually fixes the
bug, this commit introduces the lazy DIE construction and cleans up test
cases that are impacted by the changes in the resulting DIE ordering.
llvm-svn: 209675
This is a precursor to fixing inlined debug info where the concrete,
out-of-line definition may preceed any inlined usage. To cope with this,
the attributes that may appear on the concrete definition or the
abstract definition are delayed until the end of the module. Then, if an
abstract definition was created, it is referenced (and no other
attributes are added to the out-of-line definition), otherwise the
attributes are added directly to the out-of-line definition.
In a couple of cases this causes not just reordering of attributes, but
reordering of types. When the creation of the attribute is delayed, if
that creation would create a type (such as for a DW_AT_type attribute)
then other top level DIEs may've been constructed during the delay,
causing the referenced type to be created and added after those
intervening DIEs. In the extreme case, in cross-cu-inlining.ll, this
actually causes the DW_TAG_basic_type for "int" to move from one CU to
another.
llvm-svn: 209674
This is an enhancement to SeparateConstOffsetFromGEP. With this patch, we can
extract a constant offset from "s/zext and/or/xor A, B".
Added a new test @ext_or to verify this enhancement.
Refactoring the code, I also extracted some common logic to function
Distributable.
llvm-svn: 209670
This old test didn't have the argument numbering that's now squirelled
away in the high bits of the line number in the DW_TAG_arg_variable
metadata.
Add the numbering and update the test to ensure arguments are in-order.
llvm-svn: 209669
Detected by Daniel Jasper, Ilia Filippov, and Andrea Di Biagio
Fixed the argument order to select (the mask semantics to blendv* are the
inverse of select) and fixed the tests
Added parenthesis to the assert condition
Ran clang-format
llvm-svn: 209667
In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there
is an optimization for certain patterns to generate one or two vector
splats followed by a vector add or subtract. This operation is
represented by a VADD_SPLAT in the selection DAG. Prior to this
patch, it was possible for the VADD_SPLAT to be assigned the wrong
data type, causing incorrect code generation. This patch corrects the
problem.
Specifically, the code previously assigned the value type of the
BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is
correct much of the time, but not always. The problem is that the
call to isConstantSplat() may return a SplatBitSize that is not the
same as the number of bits in the original element vector type. The
correct type to assign is a vector type with the same element bit size
as SplatBitSize.
The included test case shows an example of this, where the
BUILD_VECTOR node has a type of v16i8. The vector to be built is {0,
16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat
detects that we can generate a splat of 16 for type v8i16, which is
the type we must assign to the VADD_SPLAT node. If we do not, we
generate a vspltisb of 8 and a vaddubm, which generates the incorrect
result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16}. The correct code generation is a vspltish of 8 and a vadduhm.
This patch also corrected code generation for
CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked
as an XFAIL, so we can remove the XFAIL from the test case.
llvm-svn: 209662
A test in test/Generic creates a DAG where the NZCV output of an ADCS is used
by multiple nodes. This makes LLVM want to save a copy of NZCV for later, which
it couldn't do before.
This should be the last fix required for the aarch64 buildbot.
llvm-svn: 209651
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!
rdar://problem/17012966
llvm-svn: 209650
These are tested by test/CodeGen/Generic, so we should probably know
how to deal with them. Fortunately generic code does it if asked.
llvm-svn: 209646
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.
This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up on
sight of the intrinsic.
Both the transformation and the lowering of its result to asm got shiny
new tests.
The transformation is a bit convoluted because of blendvp[sd]'s
definition:
Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
I will send an email to llvm-dev to discuss if we want to change this or
not.
Reviewers: grosbach, delena, nadav
Differential Revision: http://reviews.llvm.org/D3859
llvm-svn: 209643
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:
1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
to 8 bits. This goes against the spirit of "zeroext" I think, but
it's a bit of a vague construct anyway (by definition you're going
to extend to the amount required by the ABI, that's why it's the
ABI!).
This implements option 2. The DAG machinery really isn't setup for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it was the resulting
DAG looks like it would be inferior in many cases.
Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.
Should fix PR19850
llvm-svn: 209637
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant webkit
failed if any small argument needed to be passed on the stack.
llvm-svn: 209636
Add tests for the following transform:
str X, [x0, #32]
...
add x0, x0, #32
->
str X, [x0, #32]!
with X being either w1, x1, s0, d0 or q0.
llvm-svn: 209627
We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding.
As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation.
llvm-svn: 209618
and via the command line, mirroring similar functionality in LoopUnroll. In
situations where clients used custom unrolling thresholds, their intent could
previously be foiled by LoopRotate having a hardcoded threshold.
llvm-svn: 209617
This was previously regressed/broken by r192749 (reverted due to this
issue in r192938) and I was about to break it again by accident with
some more invasive changes that deal with the subprogram lists. So to
avoid that and further issues - here's a test.
It's a pretty basic test - in both r192749 and my impending case, this
test would crash, but checking the basics (that we put a subprogram in
just one of the two CUs) seems like a good start.
We still get this wrong in weird ways if the linkonce-odr function
happens to not be identical in the metadata (because it's defined in two
different files (hence the # line directives in this test), etc) even
though it meets the language requirements (identical token stream) for
such a thing. That results in two subprogram DIEs, but only one of them
gets the parameter and high/low pc information, etc. We probably need to
use the DIRef infrastructure to deduplicate functions as we do types to
address this issue - or perhaps teach the BC linker to remove the
duplicate entries in subprogram lists?
llvm-svn: 209614
Remove the use of the std::function and replace the capturing lambda with a
non-capturing one, opting to pass the user data down to the context. This is
needed as std::function is not yet available on all hosted platforms (it
requires RTTI, which breaks on Windows).
Thanks to Nico Rieck for pointing this out!
llvm-svn: 209607
Move the implementation of the Win64 EH printer from the COFFDumper into its own
class. This is in preparation for adding support to print ARM EH information.
The only real change here is in printUnwindInfo where we now lambda lift the
implicit this parameter for the resolveFunction. Also setup the printing to
handle ARM. This now has set the stage to introduce ARM EH printing.
llvm-svn: 209606
Make the use of the cache more transparent to the users. There is no reason
that the cached entries really need to be passed along. The overhead for doing
so is minimal: a single extra parameter. This requires that some standalone
functions be brought into the COFFDumper class so that they may access the
cache.
llvm-svn: 209604
Switch to use references for parameters that are guaranteed to be non-null.
Simplifies the code a slight bit in preparation for another change.
llvm-svn: 209603
Seems my previous fix was insufficient - we were still not adding the
inlined function to the abstract scope list. Which meant it wasn't
flagged as inline, didn't have nested lexical scopes in the abstract
definition, and didn't have abstract variables - so the inlined variable
didn't reference an abstract variable, instead being described
completely inline.
llvm-svn: 209602
We still do temporary files in many cases, just updating this particular
one because I was debugging it and made this change while doing so.
llvm-svn: 209601