Commit Graph

26820 Commits

Author SHA1 Message Date
Rafael Espindola c9b33ff9ba Add support for addmod to mri scripts.
llvm-svn: 220294
2014-10-21 14:46:17 +00:00
Bill Schmidt 5c6cb813b6 [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg
With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in
the FMA mutation pass.  If we have a situation where a killed product
register is the same register as the FMA target, such as:

   %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5,
                       %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 

then the substitution makes no sense.  We end up getting a crash when
we try to extend the interval associated with the killed product
register, as there is already a live range for %vreg5 there.  This
patch just disables the mutation under those circumstances.

Since recipest.ll generates different code with VMX enabled, I've
modified that test to use -mattr=-vsx.  I've borrowed the code from
that test that exposed the bug and placed it in fma-mutate.ll, where
it tests several mutation opportunities including the "bad" one.

llvm-svn: 220290
2014-10-21 13:02:37 +00:00
Oliver Stannard cdb8db8d3c [ARM] NEON 32-bit scalar moves are also available in VFPv2
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.

Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.

llvm-svn: 220288
2014-10-21 11:49:14 +00:00
Yuri Gorshenin 171eb8dbeb [asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an index register.
Summary: Fixed memory accesses with rbp as a base or an index register.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5819

llvm-svn: 220283
2014-10-21 10:22:27 +00:00
Oliver Stannard 38e6d45a46 [Thumb2] LDRS?[BH] cannot load to the PC
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).

llvm-svn: 220278
2014-10-21 09:14:15 +00:00
Chandler Carruth aa72a6dd3b Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 220277
2014-10-21 09:00:40 +00:00
Zoran Jovanovic 592239d498 [mips][microMIPS] Implement ADDU16 and SUBU16 instructions
Differential Revision: http://reviews.llvm.org/D5118

llvm-svn: 220276
2014-10-21 08:44:58 +00:00
Zoran Jovanovic 81ceebc56e [mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructions
Differential Revision: http://reviews.llvm.org/D5117

llvm-svn: 220275
2014-10-21 08:32:40 +00:00
Rafael Espindola c606bfe660 Fix a bit of confusion about .set and produce more readable assembly.
Every target we support has support for assembly that looks like

a = b - c
.long a

What is special about MachO is that the above combination suppresses the
production of a relocation.

With this change we avoid producing the intermediary labels when they don't
add any value.

llvm-svn: 220256
2014-10-21 01:17:30 +00:00
Paul Robinson f60e0a160f Do not attribute static allocas to the call site's DebugLoc.
When functions are inlined, instructions without debug information are
attributed to the call site's DebugLoc. After inlining, inlined static
allocas are moved to the caller's entry block, adjacent to the caller's
original static alloca instructions. By retaining the call site's
DebugLoc, these instructions could cause instructions that were
subsequently inserted at the entry block to pick up the same DebugLoc.

Patch by Wolfgang Pieb!

llvm-svn: 220255
2014-10-21 01:00:55 +00:00
Rafael Espindola f16a66973c Make this test a bit more strict.
llvm-svn: 220253
2014-10-21 00:47:49 +00:00
Chandler Carruth 97192421e1 Teach lit to filter the host LDFLAGS down from the build system and into
the CGO build environment. This lets things like -rpath propagate down
to the C++ code that is built along side the Go bindings when testing
them.

Patch by Peter Collingbourne, and verified that it works by me.

llvm-svn: 220252
2014-10-21 00:36:28 +00:00
Lang Hames 2d0d096bd1 [MCJIT] Temporarily revert r220245 - it broke several bots.
(See e.g. http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/17653)

llvm-svn: 220249
2014-10-21 00:24:02 +00:00
Philip Reames bf9676f7f0 Extend the verifier to validate range metadata on calls and invokes.
Range metadata applies to loads, call, and invokes.  We were validating that metadata applied to loads was correct according to the LangRef, but we were not validating metadata applied to calls or invokes.  This change extracts the checking functionality to a common location, reuses it for all valid locations, and adds a simple test to ensure a misused range on a call gets reported.

llvm-svn: 220246
2014-10-20 23:52:07 +00:00
Lang Hames 84801c217c [MCJIT] Make MCJIT honor symbol visibility settings when populating the global
symbol table.

Patch by Anthony Pesch. Thanks Anthony!

llvm-svn: 220245
2014-10-20 23:39:54 +00:00
Quentin Colombet 06355199f1 [X86] Fix a bug in the lowering of the mask of VSELECT.
X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT
when it knows it can be lowered into BLEND. Indeed, only the high bits need to be
set for those and it optimizes those accordingly.
However, when the mask is a compile time constant, the lowering will be handled
by the generic optimizer and those modifications will generate bad code in the
generic optimizer.

This patch fixes that by preventing the optimization if the VSELECT will be
handled by the generic optimizer.

<rdar://problem/18675020>

llvm-svn: 220242
2014-10-20 23:13:30 +00:00
Philip Reames cdb72f369f Introduce a 'nonnull' metadata on Load instructions.
The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns.  Long term, it would be nice to combine these into a single construct.   The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull.

Reviewed by: Hal Finkel
Differential Revision:  http://reviews.llvm.org/D5220

llvm-svn: 220240
2014-10-20 22:40:55 +00:00
Simon Pilgrim 2f9548a3ef [X86] Memory folding for commutative instructions (updated)
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.

Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt.

Added additional regression test case provided by Joerg Sonnenberger.

Differential Revision: http://reviews.llvm.org/D5818

llvm-svn: 220239
2014-10-20 22:14:22 +00:00
Tim Northover 23075ccee7 ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

llvm-svn: 220236
2014-10-20 21:28:41 +00:00
Gerolf Hoflehner c47baed0b5 [AArch64] test case for compfail fixed by r219748
llvm-svn: 220206
2014-10-20 16:08:33 +00:00
Oliver Stannard 6672f37ed2 [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.

llvm-svn: 220204
2014-10-20 15:37:35 +00:00
Oliver Stannard e8f63a54b4 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396

llvm-svn: 220196
2014-10-20 11:30:35 +00:00
Oliver Stannard fce039240a [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need ensure its result register is not
marked as dead before we use it in the second instruction.

llvm-svn: 220194
2014-10-20 11:00:18 +00:00
Chandler Carruth a32038b006 Fix a miscompile introduced in r220178.
The original code had an implicit assumption that if the test for
allocas or globals was reached, the two pointers were not equal. With my
changes to make the pointer analysis more powerful here, I also had to
guard against circumstances where the results weren't useful. That in
turn violated the assumption and gave rise to a circumstance in which we
could have a store with both the queried pointer and stored pointer
rooted at *the same* alloca. Clearly, we cannot ignore such a store.
There are other things we might do in this code to better handle the
case of both pointers ending up at the same alloca or global, but it
seems best to at least make the test explicit in what it intends to
check.

I've added tests for both the alloca and global case here.

llvm-svn: 220190
2014-10-20 10:03:01 +00:00
Chandler Carruth 6665d62117 Fix a somewhat subtle pair of issues with JumpThreading I introduced in
r220178. First, the creation routine doesn't insert prior to the
terminator of the basic block provided, but really at the end of the
basic block. Instead, get the terminator and insert before that. The
next issue was that we need to ensure multiple PHI node entries for
a single predecessor re-use the same cast instruction rather than
creating new ones.

All of the logic here was without tests previously. I've reduced and
added a test case from the test suite that crashed without both of these
fixes.

llvm-svn: 220186
2014-10-20 05:34:36 +00:00
Chandler Carruth eeec35ae1c Teach the load analysis driving core instcombine logic and other bits of
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.

I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.

This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.

llvm-svn: 220178
2014-10-20 00:24:14 +00:00
Chandler Carruth b5f4c32830 Add a datalayout string to this test so that it exercises the full gamut
of InstCombine rather than just the bits enabled when datalayout is
optional.

The primary fixes here are because now things are little endian.

In good news, silliness like this seems like it will be going away as
we've got pretty stong consensus on dropping optional datalayout
entirely.

llvm-svn: 220176
2014-10-20 00:11:31 +00:00
Bill Schmidt 941a3244ff [PowerPC] Clean up -mattr=+vsx tests to always specify -mcpu
We recently discovered an issue that reinforces what a good idea it is
to always specify -mcpu in our code generation tests, particularly for
-mattr=+vsx.  This patch ensures that all tests that specify
-mattr=+vsx also specify -mcpu=pwr7 or -mcpu=pwr8, as appropriate.

Some of the uses of -mattr=+vsx added recently don't make much sense
(when specified for -mtriple=powerpc-apple-darwin8 or -march=ppc32,
for example).  For cases like this I've just removed the extra VSX
test commands; there's enough coverage without them.

llvm-svn: 220173
2014-10-19 21:29:21 +00:00
Bill Schmidt 87982a1e9b [PowerPC] Temporarily disable VSX for PowerPC fast-isel tests
Patch by Bill Seurer; some comment formatting changes by me.

There are a few PowerPC test cases for FastISel support that currently
fail with VSX support enabled.  The temporary workaround under
discussion in http://reviews.llvm.org/D5362 helps, but the tests still
fail because they specify -fast-isel-abort, and the VSX workaround
punts back to SelectionDAG.  We have plans to fix FastISel permanently
for VSX, but until that's in place these tests are preventing us from
enabling VSX by default.  Therefore we are adding -mattr=-vsx to these
tests until the full support is ready.

llvm-svn: 220172
2014-10-19 20:48:47 +00:00
Bill Schmidt fff860979d [PowerPC] Re-enable VSX test line for fma.ll with -mcpu=pwr7
The VSX testing variant in test/CodeGen/PowerPC/fma.ll had to be
disabled because of unexpected behavior on many of the builders.  I
tracked this down to a situation that occurs when the VSX attribute is
enabled for a target that disables the MI early scheduling pass.  This
patch adds -mcpu=pwr7 to make this predictable.  The other issue will
be addressed separately.

llvm-svn: 220171
2014-10-19 20:27:56 +00:00
Chandler Carruth bc6378defb Do a better and more complete job of preserving metadata when combining
loads.

This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.

I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[

llvm-svn: 220165
2014-10-19 10:46:46 +00:00
Chandler Carruth 5b8cd2f73c Move previously dead code to handle computing the known bits of an alias
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.

This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.

llvm-svn: 220164
2014-10-19 09:06:56 +00:00
David Majnemer 312c3e5f39 InstCombine: (sub (or A B) (xor A B)) --> (and A B)
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).

Patch by Ankur Garg!

Differential Revision: http://reviews.llvm.org/D5719

llvm-svn: 220163
2014-10-19 08:32:32 +00:00
David Majnemer 59939acd26 InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1

Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))

This handles only the equality operators for now. Other operators need
to be handled.

Patch by Ankur Garg!

llvm-svn: 220162
2014-10-19 08:23:08 +00:00
Chandler Carruth a801dd5799 Fix a long-standing miscompile in the load analysis that was uncovered
by my refactoring of this code.

The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.

This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.

llvm-svn: 220161
2014-10-19 08:17:50 +00:00
Chandler Carruth be9dccd64d Preserve AA metadata when combining (cast (load (...))) -> (load (cast
(...))).

llvm-svn: 220141
2014-10-18 11:00:12 +00:00
Chandler Carruth 2f75fcfef3 [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load
...)) and (load (cast ...)): canonicalize toward the former.

Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.

Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.

There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.

I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.

The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.

llvm-svn: 220138
2014-10-18 06:36:22 +00:00
Chandler Carruth 71009cad95 Remove a test that was ported from the old llvm-gcc frontend test suite.
This test is pretty awesome. It is claiming to test devirtualization.
However, the code in question is not in fact devirtualized by LLVM. If
you take the original C++ test case and run it through Clang at -O3 we
fail to devirtualize it completely. It also isn't a sufficiently focused
test case.

The *reason* we fail to devirtualize it isn't because of any missing
instcombine though. Instead, it is because we fail to emit an available
externally vtable and thus the vtable is just an external and completely
opaque. If I cause the vtable to be emitted, we successfully
devirtualize things.

Anyways, I'm just removing it because it is providing negative value at
this point: it isn't representative of the output of Clang really, LLVM
isn't doing the transform it claims to be testing, LLVM's failure to do
the transform isn't actually an LLVM bug at all and we shouldn't be
testing for it here, and finally the test is written in such a way that
it will trivially pass even when the point of the test is failing.

llvm-svn: 220137
2014-10-18 06:36:18 +00:00
Nick Kledzik 4e1ef9951d [llvm-objdump] don't test timestamp dump as that is time zone dependent
llvm-svn: 220123
2014-10-18 02:28:01 +00:00
Nick Kledzik 600f245b8a [llvm-objdump] enhance test case for mach-o -private-headers
llvm-svn: 220120
2014-10-18 01:50:55 +00:00
Nick Kledzik 3b2aa057e6 [llvm-objdump] Fix mach-o binding decompression error
llvm-svn: 220119
2014-10-18 01:21:02 +00:00
Chandler Carruth 2dc9682e59 [SROA] Change how SROA does vector-based promotion of allocas to handle
cases where the alloca type, the load types, and the store types used
all disagree.

Previously, the only way that vector-based promotion occured was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.

The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.

When looking and loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.

With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.

llvm-svn: 220116
2014-10-18 00:44:02 +00:00
Aaron Watry 8114437a8f R600/SI: Add global atomicrmw xchg
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220110
2014-10-17 23:33:03 +00:00
Aaron Watry d672ee2a47 R600/SI: Add global atomicrmw xor
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220109
2014-10-17 23:33:01 +00:00
Aaron Watry 8a911e6926 R600/SI: Add global atomicrmw or
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220108
2014-10-17 23:32:59 +00:00
Aaron Watry 58c9992f15 R600/SI: Add global atomicrmw min/umin
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220107
2014-10-17 23:32:57 +00:00
Aaron Watry 29f295d7a5 R600/SI: Add global atomicrmw max/umax
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220106
2014-10-17 23:32:56 +00:00
Aaron Watry 621278034c R600/SI: Add global atomicrmw and
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220105
2014-10-17 23:32:54 +00:00
Aaron Watry 328f1bae8e R600/SI: Add global atomicrmw sub
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220104
2014-10-17 23:32:52 +00:00
Aaron Watry 28682cf205 R600/SI: Fix/add tests for atomicrmw add
The previous tests claimed to test constant offsets in the function name,
but the tests weren't actually testing them.

Clone the tests, and do testing of all combinations of the following:
1) with/without constant pointer offset
2) 32/64-bit addressing modes
3) Usage and non-usage of the return value from the atomicrmw

Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220103
2014-10-17 23:32:50 +00:00
Aaron Watry 1d13d36520 R600: Rename atomic_load global tests to atomic_add
The function name now matches what it's actually testing.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220102
2014-10-17 23:32:49 +00:00
Evgeniy Stepanov e08633e900 [msan] Fix handling of byval arguments with large alignment.
MSan param-tls slots are 8-byte aligned. This change clips
alignment of memcpy into param-tls to 8.

llvm-svn: 220101
2014-10-17 23:29:44 +00:00
Pete Cooper 230332f4fe Check for dynamic alloca's when selecting lifetime intrinsics.
TL;DR: Indexing maps with [] creates missing entries.

The long version:

When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime.  Trouble is, we don't first check to see if this is a dynamic alloca.

On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime.  0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots.

PEI would later trigger an assert because it expects the stack protector to not be dead.

This fix ensures that we only get frame indices for static allocas, ie, those in the map.  Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.

rdar://problem/18672951

llvm-svn: 220099
2014-10-17 22:59:33 +00:00
Bill Schmidt 296bb31f64 [PowerPC] Disable +vsx RUN line for fma.ll due to inconsistency on other builders
llvm-svn: 220094
2014-10-17 21:32:22 +00:00
Rafael Espindola 7da1ea83a9 Revert "TRE: make TRE a bit more aggressive"
This reverts commit r219899.

This also updates byval-tail-call.ll to make it clear what was breaking.
Adding r219899 again will cause the load/store to disappear.

llvm-svn: 220093
2014-10-17 21:25:48 +00:00
Bill Schmidt a087d74250 [PowerPC] Change liveness testing in VSX FMA mutation pass
With VSX enabled, LLVM crashes when compiling
test/CodeGen/PowerPC/fma.ll.  I traced this to the liveness test
that's revised in this patch. The interval test is designed to only
work for virtual registers, but in this case the AddendSrcReg is
physical. Since there is already a walk of the MIs between the
AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg
in that loop.  At Hal Finkel's request, I converted the liveness test
to an assert restricted to virtual registers.

I've changed the fma.ll test to have VSX and non-VSX variants so we
can test both kinds of multiply-adds.

llvm-svn: 220090
2014-10-17 21:02:44 +00:00
Peter Collingbourne ce80084446 Disable ccache for go tests.
Should fix llvm-clang-lld-x86_64-debian-fast bot.

llvm-svn: 220071
2014-10-17 18:32:36 +00:00
Matt Arsenault d282ada508 R600/SI: Allow commuting with source modifiers
llvm-svn: 220066
2014-10-17 18:00:48 +00:00
Matt Arsenault 6d3cd544bb R600/SI: Allow comuting fp immediates
llvm-svn: 220062
2014-10-17 18:00:39 +00:00
Peter Collingbourne fde87a0cdf We also need to catch OSError here.
llvm-svn: 220058
2014-10-17 17:46:46 +00:00
Matt Arsenault 83a535ff6b R600/SI: Remove SI_BUFFER_RSRC pseudo
Just use REG_SEQUENCE directly, so there are fewer
instructions to need to deal with later.

llvm-svn: 220056
2014-10-17 17:42:56 +00:00
Juergen Ributzka ad2363f9ee [Stackmaps] Enable invoking the patchpoint intrinsic.
Patch by Kevin Modzelewski
Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits, reames

Differential Revision: http://reviews.llvm.org/D5634

llvm-svn: 220055
2014-10-17 17:39:00 +00:00
Andrea Di Biagio c48cb86f05 [X86] Fix missed selection of non-temporal store of zero vector.
When the input to a store instruction was a zero vector, the backend
always selected a normal vector store regardless of the non-temporal
hint. This is fixed by this patch.

This fixes PR19370.

llvm-svn: 220054
2014-10-17 17:27:06 +00:00
James Molloy f497d5511d [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.

While we're at it, avoid writing to std::vector::end()...

Bug found with random testing and a lot of coffee.

llvm-svn: 220051
2014-10-17 17:06:31 +00:00
Rafael Espindola 44c661b611 Don't crash if find_executable return None.
This was crashing when trying to run the tests on Windows.

llvm-svn: 220048
2014-10-17 16:07:43 +00:00
Bill Schmidt 2d1128acb2 [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types.  This
patch adds that support.

As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.

In addition, the logic for permitting misaligned memory accesses is
modified so that v4r32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled.  Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.

A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests.  I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature.  For now, that simply tests the unaligned load/store
behavior.

This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.

llvm-svn: 220047
2014-10-17 15:13:38 +00:00
Jan Vesely b535c902e6 R600: Add EG to FMA test
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220045
2014-10-17 14:45:27 +00:00
Jan Vesely af62cf4db0 SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Vasileios Kalintiris 238692beb9 [mips] Add support for COP1's Branch-On-Cond-Likely instructions
Summary: Depends on D5782

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5802

llvm-svn: 220042
2014-10-17 14:08:28 +00:00
Vasileios Kalintiris 6d1e64896d [mips] Add support for COP0's Branch-On-Cond-Likely instructions
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5782

llvm-svn: 220036
2014-10-17 12:38:35 +00:00
Hal Finkel dd38c0b876 [DSE] Remove no-data-layout-only type-based overlap checking
DSE's overlap checking contained special logic, used only when no DataLayout
was available, which inferred a complete overwrite when the pointee types were
equal. This logic seems fine for regular loads/stores, but does not work for
memcpy and friends. Instead of fixing this, I'm just removing it.
Philosophically, transformations should not contain enhanced behavior used only
when data layout is lacking (data layout should be strictly additive), and
maintaining these rarely-tested code paths seems not worthwhile at this stage.

Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case
(slightly reduced from that provided by Aliaksei) replaces the original
contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few
other tests have been updated to have a data layout.

llvm-svn: 220035
2014-10-17 11:56:00 +00:00
Rafael Espindola b66130209b Add back commits r219835 and a fixed version of r219829.
The only difference from r219829 is using

getOrCreateSectionSymbol(*ELFSec)

instead of

GetOrCreateSymbol(ELFSec->getSectionName())

in ELFObjectWriter which causes us to use the correct section symbol even if
we have multiple sections with the same name.

Original messages:

r219829:
Correctly handle references to section symbols.

When processing assembly like

.long .text

we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.

This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.

The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.

r219835:
Allow forward references to section symbols.

llvm-svn: 220021
2014-10-17 01:48:58 +00:00
Bill Schmidt 32e9c6465b [PPC] Adjust some PowerPC tests to account for presence/absence of VSX
Patch by Bill Seurer; committed on his behalf.

These test cases generate slightly different code sequences when VSX
is activated and thus fail. The update turns off VSX explicitly for
the existing checks and then adds a second set of checks for most of
them that test the VSX instruction output.

llvm-svn: 220019
2014-10-17 01:41:22 +00:00
Rafael Espindola 0eb2cec936 Add a test that would have found the bug in r219829.
llvm-svn: 220016
2014-10-17 01:34:23 +00:00
Akira Hatanaka 0d0c78180d ARM: Fix a bug which was causing convergence failure in constant-island pass.
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:

// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
  BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
  DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}

The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.

This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.

rdar://problem/18581150

llvm-svn: 220015
2014-10-17 01:31:47 +00:00
Rafael Espindola 4544a4062c Revert commit r219835 and r219829.
Revert "Correctly handle references to section symbols."
Revert "Allow forward references to section symbols."

Rui found a regression I am debugging.

llvm-svn: 220010
2014-10-17 01:06:02 +00:00
Peter Zotov 9e40430102 [OCaml] Add Llvm.instr_clone.
llvm-svn: 220008
2014-10-17 01:02:40 +00:00
Alexander Potapenko 7aaf514092 [llvm-symbolizer] Introduce the -dsym-hint option.
llvm-symbolizer will consult one of the .dSYM paths passed via -dsym-hint
if it fails to find the .dSYM bundle at the default location.

llvm-svn: 220004
2014-10-17 00:50:19 +00:00
Peter Collingbourne 659670d933 Add our own copy of the find_executable function to cope with installations
that do not have the distutils.spawn package. Should hopefully fix the
aarch64 buildbot.

llvm-svn: 219991
2014-10-16 23:43:20 +00:00
Peter Collingbourne 82e3e373b3 Initial version of Go bindings.
This code is based on the existing LLVM Go bindings project hosted at:
https://github.com/go-llvm/llvm

Note that all contributors to the gollvm project have agreed to relicense
their changes under the LLVM license and submit them to the LLVM project.

Differential Revision: http://reviews.llvm.org/D5684

llvm-svn: 219976
2014-10-16 22:48:02 +00:00
Robin Morisset e2de06bef6 Erase fence insertion from SelectionDAGBuilder.cpp (NFC)
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
montonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
  exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
  does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
  It happens to mostly work for the other targets because they are extremely
  conservative, but Power for example had to switch to AtomicExpand to be
  able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound ! This is noted
  in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the litterature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it..

This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.

Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more agressively on these
platforms, the long comment should make it clear why he should first override
emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957
2014-10-16 20:34:57 +00:00
Matt Arsenault a3fe7c62d1 R600: Fix nonsensical implementation of computeKnownBits for BFE
This was resulting in invalid simplifications of sdiv

llvm-svn: 219953
2014-10-16 20:07:40 +00:00
Rafael Espindola 11aaaeebe0 Delete -std-compile-opts.
These days -std-compile-opts was just a silly alias for -O3.

llvm-svn: 219951
2014-10-16 20:00:02 +00:00
Bjorn Steinbrink d20816fde9 Allow call-slop optzn for destinations with a suitable dereferenceable attribute
Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5832

llvm-svn: 219950
2014-10-16 19:43:08 +00:00
Sanjay Patel c699a6117b fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944
2014-10-16 18:48:17 +00:00
Juergen Ributzka 03a0611061 [AArch64] Fix miscompile of sdiv-by-power-of-2.
When the constant divisor was larger than 32bits, then the optimized code
generated for the AArch64 backend would emit the wrong code, because the shift
was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would
loose the upper 32bits.

This fixes rdar://problem/18678801.

llvm-svn: 219934
2014-10-16 16:41:15 +00:00
Vasileios Kalintiris 167c372118 [mips] Account for endianess when expanding BuildPairF64/ExtractElementF64 nodes.
Summary:
In order to support big endian targets for the BuildPairF64 nodes we
just need to swap the low/high pair registers. Additionally, for the
ExtractElementF64 nodes we have to calculate the correct stack offset
with respect to the node's register/operand that we want to extract.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5753

llvm-svn: 219931
2014-10-16 15:41:51 +00:00
Vasileios Kalintiris 711028f718 [mips] Marked the DI/EI instruction aliases as MIPS32r2
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5751

llvm-svn: 219927
2014-10-16 15:23:52 +00:00
Akira Hatanaka 5c221ef98f Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.

llvm-svn: 219902
2014-10-16 06:00:46 +00:00
Saleem Abdulrasool 7f52921976 TRE: make TRE a bit more aggressive
Make tail recursion elimination a bit more aggressive.  This allows us to get
tail recursion on functions that are just branches to a different function.  The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.

llvm-svn: 219899
2014-10-16 03:27:30 +00:00
Akira Hatanaka 40c2cf4afc Revert r219832.
llvm-svn: 219884
2014-10-16 01:17:02 +00:00
Sanjoy Das 360b1ed5f2 Revert "r219834 - Teach ScalarEvolution to sharpen range information"
This change breaks the asan buildbots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468

llvm-svn: 219878
2014-10-15 23:46:04 +00:00
Hal Finkel 68dc3c7ab2 Preserve non-byval pointer alignment attributes using @llvm.assume when inlining
For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.

llvm-svn: 219876
2014-10-15 23:44:41 +00:00
Adam Nemet 4285c1f8cc [AVX512] Add DQ subvector inserts
In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively.  These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).

Since DQ has native support for these intructions, I peeled off the non-"Alt"
part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass.  The "Alt" Pat<>'s are disabled with DQ.

Fixes <rdar://problem/18426089>

llvm-svn: 219874
2014-10-15 23:42:17 +00:00
Adam Nemet 2b71ca5d88 [AVX512] Add SKX testing to avx512-insert-extract.ll
This is in preparation to adding DQ subvector inserts to this testcase.

llvm-svn: 219873
2014-10-15 23:42:14 +00:00
Adam Nemet f3aba14dad [AVX512] Fix test to produce a defined value
We're inserting into a 8 wide vector, so the index should be < 8.

llvm-svn: 219872
2014-10-15 23:42:11 +00:00
Tom Stellard c8d7920ad9 R600/SI: Fix bug where immediates were being used in DS addr operands
The SelectDS1Addr1Offset complex pattern always tries to store constant
lds pointers in the offset operand and store a zero value in the addr operand.
Since the addr operand does not accept immediates, the zero value
needs to first be copied to a register.

This newly created zero value will not go through normal instruction
selection, so we need to manually insert a V_MOV_B32_e32 in the complex
pattern.

This bug was hidden by the fact that if there was another zero value
in the DAG that had not been selected yet, then the CSE done by the DAG
would use the unselected node for the addr operand rather than the one
that was just created.  This would lead to the zero value being selected
and the DAG automatically inserting a V_MOV_B32_e32 instruction.

llvm-svn: 219848
2014-10-15 21:08:59 +00:00
Rafael Espindola 78206c3576 Allow forward references to section symbols.
llvm-svn: 219835
2014-10-15 19:30:18 +00:00
Sanjoy Das 90c2f1455a Teach ScalarEvolution to sharpen range information.
If x is known to have the range [a, b) in a loop predicated by (icmp
ne x, a), its range can be sharpened to [a + 1, b).  Get
ScalarEvolution and hence IndVars to exploit this fact.
    
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.

phabricator: http://reviews.llvm.org/D5639
llvm-svn: 219834
2014-10-15 19:25:28 +00:00
Akira Hatanaka 5bb9346a45 InstCombine: Narrow switch instructions using known bits.
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.

rdar://problem/17720004

llvm-svn: 219832
2014-10-15 19:05:50 +00:00
Juergen Ributzka f82c987a5c Reapply "[FastISel][AArch64] Add custom lowering for GEPs."
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficientily.

The original commit had a bug in the add emit logic, which has been fixed.

llvm-svn: 219831
2014-10-15 18:58:07 +00:00
Rafael Espindola a74b5e6823 Correctly handle references to section symbols.
When processing assembly like

.long .text

we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.

This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.

The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.

llvm-svn: 219829
2014-10-15 18:55:30 +00:00
Matt Arsenault 1a74aff846 R600/SI: Also try to use 0 base for misaligned 8-byte DS loads.
llvm-svn: 219823
2014-10-15 18:06:43 +00:00
Matt Arsenault 7b68fdf3c0 R600: Fix miscompiles when BFE has multiple uses
SimplifyDemandedBits would break the other uses of the operand.

llvm-svn: 219819
2014-10-15 17:58:34 +00:00
Hal Finkel 3b7fc86677 [SLPVectorize] Basic ephemeral-value awareness
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).

Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).

llvm-svn: 219816
2014-10-15 17:35:01 +00:00
Derek Schuff 05fb735f3a [MC] Make bundle alignment mode setting idempotent and support nested bundles
Summary:
Currently an error is thrown if bundle alignment mode is set more than once
per module (either via the API or the .bundle_align_mode directive). This
change allows setting it multiple times as long as the alignment doesn't
change.

Also nested bundle_lock groups are currently not allowed. This change allows
them, with the effect that the group stays open until all nests are exited,
and if any of the bundle_lock directives has the align_to_end flag, the
group becomes align_to_end.

These changes make the bundle aligment simpler to use in the compiler, and
also better match the corresponding support in GNU as.

Reviewers: jvoung, eliben

Differential Revision: http://reviews.llvm.org/D5801

llvm-svn: 219811
2014-10-15 17:10:04 +00:00
Juergen Ributzka 42379d4cf7 Revert "[FastISel][AArch64] Add custom lowering for GEPs."
This breaks our internal build bots. Reverting it to get the bots green again.

llvm-svn: 219776
2014-10-15 04:55:48 +00:00
Jingyue Wu 2954280f6a [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.

This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.

Thanks Alexey Volkov for reporting PR21115! 

Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.

Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.

Updated an affected test in X86.

Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.

Reviewers: Jiangning, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5633

llvm-svn: 219773
2014-10-15 03:27:43 +00:00
Nick Kledzik 51d2c2bf85 [llvm-objdump] Update error message and add test case for mach-o file with bad library ordinals
llvm-svn: 219746
2014-10-14 23:29:38 +00:00
Gerolf Hoflehner a4c96d02a2 [AAarch64] Optimize CSINC-branch sequence
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>


rdar://problem/18506500

llvm-svn: 219742
2014-10-14 23:07:53 +00:00
Hal Finkel 1a600faba0 [LoopVectorize] Ignore @llvm.assume for cost estimates and legality
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).

Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).

llvm-svn: 219741
2014-10-14 22:59:49 +00:00
David Majnemer 3416b48980 MC, COFF: Make bigobj test compatible with python3
No functionality change intended.

llvm-svn: 219739
2014-10-14 22:35:11 +00:00
Simon Pilgrim a798e9ffdf [X86][SSE] pslldq/psrldq shuffle mask decodes
Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions.

Differential Revision: http://reviews.llvm.org/D5598

llvm-svn: 219738
2014-10-14 22:31:34 +00:00
David Majnemer 0b011b337f MC: Rewrite bigobj test in python
This makes the test easier to work with.  No functionality change
intended.

llvm-svn: 219737
2014-10-14 22:26:49 +00:00
Tim Northover cf6ce0c8f7 ARM: remove ARM/Thumb distinction for preferred alignment.
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off betweem code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.

So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.

llvm-svn: 219734
2014-10-14 22:12:17 +00:00
Tim Northover 9a4c043d67 ARM: allow misaligned local variables in Thumb1 mode.
There's no hard requirement on LLVM to align local variable to 32-bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.

llvm-svn: 219733
2014-10-14 22:12:14 +00:00
David Majnemer 5fdfa70a1f Add a test for writing COFF BigObj
llvm-svn: 219729
2014-10-14 21:47:53 +00:00
Juergen Ributzka 4dfd590eaa [FastISel][AArch64] Add custom lowering for GEPs.
This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail
out even for simple cases, because the standard fastEmit functions don't cover
MUL and ADD is lowered inefficientily.

llvm-svn: 219726
2014-10-14 21:41:23 +00:00
Hans Wennborg f6aafeee60 [x86 asm] allow fwait alias in both At&t and Intel modes (PR21208)
Differential Revision: http://reviews.llvm.org/D5741

llvm-svn: 219725
2014-10-14 21:41:17 +00:00
Tim Northover aa09ac6e83 ARM: set preferred aggregate alignment to 32 universally.
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64-bits (wasted space), but I don't think there is for going
below 32-bits.

There's no actual ABI change here, just to reassure people.

llvm-svn: 219719
2014-10-14 20:57:26 +00:00
Hal Finkel db5f86a9bf [CFL-AA] CFL-AA should not assert on an va_arg instruction
The CFL-AA implementation was missing a visit* routine for va_arg instructions,
causing it to assert when run on a function that had one. For now, handle these
in a conservative way.

Fixes PR20954.

llvm-svn: 219718
2014-10-14 20:51:26 +00:00
Sanjay Patel 0ca42bb5a8 Optimize away fabs() calls when input is squared (known positive).
Eliminate library calls and intrinsic calls to fabs when the input 
is a squared value.

Note that no unsafe-math / fast-math assumptions are needed for
this optimization.

Differential Revision: http://reviews.llvm.org/D5777

llvm-svn: 219717
2014-10-14 20:43:11 +00:00
Juergen Ributzka cd11a2806b [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved.
Sign-/zero-extend folding depended on the load and the integer extend to be
both selected by FastISel. This cannot always be garantueed and SelectionDAG
might interfer. This commit adds additonal checks to load and integer extend
lowering to catch this.

Related to rdar://problem/18495928.

llvm-svn: 219716
2014-10-14 20:36:02 +00:00
David Majnemer dad2103801 InstCombine: Don't miscompile X % ((Pow2 << A) >>u B)
We assumed that A must be greater than B because the right hand side of
a remainder operator must be nonzero.

However, it is possible for A to be less than B if Pow2 is a power of
two greater than 1.

Take for example:
i32 %A = 0
i32 %B = 31
i32 Pow2 = 2147483648

((Pow2 << 0) >>u 31) is non-zero but A is less than B.

This fixes PR21274.

llvm-svn: 219713
2014-10-14 20:28:40 +00:00
Jan Vesely e5121f3c10 Reapply "R600: Add new intrinsic to read work dimensions"
This effectively reverts revert 219707. After fixing the test to work with
new function name format and renamed intrinsic.

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219710
2014-10-14 20:05:26 +00:00
Hal Finkel 171c2ec008 Revert "r216914 - Revert: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'"
Reapply r216913, a fix for PR20832 by Andrea Di Biagio. The commit was reverted
because of buildbot failures, and credit goes to Ulrich Weigand for isolating
the underlying issue (which can be confirmed by Valgrind, which does helpfully
light up like the fourth of July). Uli explained the problem with the original
patch as:

  It seems the problem is calling multiplySignificand with an addend of category
  fcZero; that is not expected by this routine.  Note that for fcZero, the
  significand parts are simply uninitialized, but the code in (or rather, called
  from) multiplySignificand will unconditionally access them -- in effect using
  uninitialized contents.

This version avoids using a category == fcZero addend within
multiplySignificand, which avoids this problem (the Valgrind output is also now
clean).

Original commit message:

[APFloat] Fixed a bug in method 'fusedMultiplyAdd'.

When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.

Example:
  define double @test() {
    %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
    ret double %1
  }

Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).

Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.

This fixes PR20832.

llvm-svn: 219708
2014-10-14 19:23:07 +00:00
Rafael Espindola db3f0a24ec Revert "R600: Add new intrinsic to read work dimensions"
This reverts commit r219705.

CodeGen/R600/work-item-intrinsics.ll was failing on linux.

llvm-svn: 219707
2014-10-14 18:58:04 +00:00
Jan Vesely 86187d231a R600: Add new intrinsic to read work dimensions
v2: Add SI lowering
    Add test

v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219705
2014-10-14 18:52:07 +00:00
Matt Arsenault e775f5fe76 R600/SI: Use DS offsets for constant addresses
Use 0 as the base address for a constant address, so if
we have a constant address we can save moves and form
read2/write2s.

llvm-svn: 219698
2014-10-14 17:21:19 +00:00
Hal Finkel a3f23e3725 [LVI] Check for @llvm.assume dominating the edge branch
When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value
constraints, we should check for intrinsics that dominate the edge's branch,
not just any potential context instructions. An assumption that dominates the
edge's branch represents a truth on that edge. This is specifically useful, for
example, if multiple predecessors assume a pointer to be nonnull, allowing us
to simplify a later null comparison.

The test case, and an initial patch, were provided by Philip Reames. Thanks!

llvm-svn: 219688
2014-10-14 16:04:49 +00:00
Robert Khasanov 1a77f6664e [AVX512] Extended avx512_binop_rm to DQ/VL subsets.
Added encoding tests.

llvm-svn: 219686
2014-10-14 15:13:56 +00:00
Robert Khasanov 545d1b7726 [AVX512] Extended avx512_binop_rm to BW/VL subsets.
Added encoding tests.

llvm-svn: 219685
2014-10-14 14:36:19 +00:00
Bradley Smith 698e08f4cf [AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround
llvm-svn: 219684
2014-10-14 14:02:41 +00:00
Hao Liu 3cb826ca10 [AArch64]Select wide immediate offset into [Base+XReg] addressing mode
e.g Currently we'll generate following instructions if the immediate is too wide:
    MOV  X0, WideImmediate
    ADD  X1, BaseReg, X0
    LDR  X2, [X1, 0]

    Using [Base+XReg] addressing mode can save one ADD as following:
    MOV  X0, WideImmediate
    LDR  X2, [BaseReg, X0]

    Differential Revision: http://reviews.llvm.org/D5477

llvm-svn: 219665
2014-10-14 06:50:36 +00:00
Marcello Maggioni 5bbe3df63f Switch to select optimization for two-case switches
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.

llvm-svn: 219656
2014-10-14 01:58:26 +00:00
David Majnemer db0773089f InstCombine: Fix miscompile in X % -Y -> X % Y transform
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number.  This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.

This fixes PR21256.

llvm-svn: 219639
2014-10-13 22:37:51 +00:00
David Majnemer a252138942 InstCombine: Don't miscompile (x lshr C1) udiv C2
We have a transform that changes:
  (x lshr C1) udiv C2
into:
  x udiv (C2 << C1)

However, it is unsafe to do so if C2 << C1 discards any of C2's bits.

This fixes PR21255.

llvm-svn: 219634
2014-10-13 21:48:30 +00:00
Timur Iskhodzhanov 1ee5ac87e2 Add VS2012-generated test inputs for test/tools/llvm-readobj/codeview-linetables.test
llvm-svn: 219621
2014-10-13 17:03:13 +00:00
Filipe Cabecinhas 9d7bd78ffa Fix a broadcast related regression on the vector shuffle lowering.
Summary: Test by Robert Lougher!

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5745

llvm-svn: 219617
2014-10-13 16:16:16 +00:00
Renato Golin 16ea8ba3bc Adds support for the Cortex-A17 to the ARM backend
Patch by Matthew Wahab.

llvm-svn: 219606
2014-10-13 10:22:19 +00:00
Daniel Sanders 642daf0c0c [mips] Mark redundant instructions with a comment in test/CodeGen/Mips/Fast-ISel/icmpa.ll.
llvm-svn: 219605
2014-10-13 10:18:02 +00:00
Bradley Smith f2a801d8ac [AArch64] Add workaround for Cortex-A53 erratum (835769)
Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result.  The details are quite complex and hard to
determine statically, since branches in the code may exist in some
 circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.

This patch implements such work-around in the backend, enabled via the option
-aarch64-fix-cortex-a53-835769.

The work-around code generation is not enabled by default.

llvm-svn: 219603
2014-10-13 10:12:35 +00:00
Yuri Gorshenin 46853b55fa [asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register.
Summary: [asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5599

llvm-svn: 219602
2014-10-13 09:37:47 +00:00
NAKAMURA Takumi 75a0240056 Revert r219584, "[X86] Memory folding for commutative instructions."
It broke i686 selfhosting.

llvm-svn: 219595
2014-10-13 04:17:34 +00:00
Joerg Sonnenberger 5ca10d0edb Revert r219223, it creates invalid PHI nodes.
llvm-svn: 219587
2014-10-12 17:16:04 +00:00
Benjamin Kramer 240b85eec5 InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1)
llvm-svn: 219585
2014-10-12 14:02:34 +00:00
Simon Pilgrim 77ac26d279 [X86] Memory folding for commutative instructions.
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.

This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.

Differential Revision: http://reviews.llvm.org/D5701

llvm-svn: 219584
2014-10-12 10:52:55 +00:00
NAKAMURA Takumi 6fd86f1678 llvm/test/CodeGen: Some tests don't REQUIRE asserts any more. Remove them.
llvm-svn: 219581
2014-10-12 06:47:47 +00:00
NAKAMURA Takumi 7198a199aa Suppress llvm-ar's MRI tests for now on win32, since line_iterator is incompatible to CRLF.
llvm-svn: 219579
2014-10-11 22:21:27 +00:00
David Majnemer fe7fccff11 InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X
Consider the case where X is 2.  (2 <<s 31)/s-2147483648 is zero but we
would fold to X.  Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.

This fixes PR21245.

llvm-svn: 219568
2014-10-11 10:20:04 +00:00
David Majnemer cb9d596655 InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow
consider:
C1 = INT_MIN
C2 = -1

C1 * C2 overflows without a doubt but consider the following:
%x = i32 INT_MIN

This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.

N. B.  Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.

This fixes PR21243.

llvm-svn: 219567
2014-10-11 10:20:01 +00:00
David Majnemer 3cac85e071 InstCombine: mul to shl shouldn't preserve nsw
consider:
mul i32 nsw %x, -2147483648

this instruction will not result in poison if %x is 1

however, if we transform this into:
shl i32 nsw %x, 31

then we will be generating poison because we just shifted into the sign
bit.

This fixes PR21242.

llvm-svn: 219566
2014-10-11 10:19:52 +00:00
Reed Kotler 62de6b96b5 Add basic conditional branches in mips fast-isel
Summary: Implement the most basic form of conditional branches in Mips fast-isel.

Test Plan:
br1.ll
run 4 flavors of test-suite. mips32 r1/r2 and at -O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D5583

llvm-svn: 219556
2014-10-11 00:55:18 +00:00
Sanjay Patel ad8b666624 Return undef on FP <-> Int conversions that overflow (PR21330).
The LLVM Lang Ref states for signed/unsigned int to float conversions:
"If the value cannot fit in the floating point value, the results are undefined."

And for FP to signed/unsigned int:
"If the value cannot fit in ty2, the results are undefined."

This matches the C definitions.

The existing behavior pins to infinity or a max int value, but that may just
lead to more confusion as seen in:
http://llvm.org/bugs/show_bug.cgi?id=21130

Returning undef will hopefully lead to a less silent failure.

Differential Revision: http://reviews.llvm.org/D5603

llvm-svn: 219542
2014-10-10 23:00:21 +00:00
Matt Arsenault 61cc9083d0 R600/SI: Change how DS offsets are printed
Match SC by using offset/offset0/offset1 and printing
in decimal.

llvm-svn: 219537
2014-10-10 22:16:07 +00:00
Matt Arsenault fe0a2e677b R600/SI: Match read2/write2 stride 64 versions
llvm-svn: 219536
2014-10-10 22:12:32 +00:00
Matt Arsenault 410332860d R600/SI: Add load / store machine optimizer pass.
Currently this only functions to match simple cases
where ds_read2_* / ds_write2_* instructions can be used.

In the future it might match some of the other weird
load patterns, such as direct to LDS loads.

Currently enabled only with a subtarget feature to enable
easier testing.

llvm-svn: 219533
2014-10-10 22:01:59 +00:00
Sanjoy Das 1f05c51e5e This patch teaches ScalarEvolution to pick and use !range metadata.
It also makes it more aggressive in querying range information by
adding a call to isKnownPredicateWithRanges to
isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond.

phabricator: http://reviews.llvm.org/D5638

Reviewed by: atrick, hfinkel

llvm-svn: 219532
2014-10-10 21:22:34 +00:00
Reed Kotler 1f64ecab79 Implement floating point compare for mips fast-isel
Summary: Expand SelectCmp to handle floating point compare

Test Plan:
fpcmpa.ll
run 4 flavors of test-suite, mips32 r1/r2 O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D5567

llvm-svn: 219530
2014-10-10 20:46:28 +00:00
Rafael Espindola b275797e31 llvm-ar: Start adding support for mri scripts.
I was quiet surprised to find this feature being used. Fortunately the uses
I found look fairly simple. In fact, they are just a very verbose version
of the regular ar commands.

Start implementing it then by parsing the script and setting the command
variables as if we had a regular command line.

This patch adds just enough support to create an empty archive and do a bit
of error checking. In followup patches I will implement at least addmod
and addlib.

From the description in the manual, even the more general case should not
be too hard to implement if needed. The features that don't map 1:1 to
the simple command line are

* Reading from multiple archives.
* Creating multiple archives.

llvm-svn: 219521
2014-10-10 18:33:51 +00:00
Reed Kotler 497311ab99 implement integer compare in mips fast-isel
Summary: implement SelectCmp (integer compare ) in mips fast-isel

Test Plan:
icmpa.ll
also ran 4 test-suite flavors mips32 r1/r2 O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler, mcrosier

Differential Revision: http://reviews.llvm.org/D5566

llvm-svn: 219518
2014-10-10 17:39:51 +00:00
Mark Heffernan 2beab5f0b4 This patch de-pessimizes the calculation of loop trip counts in
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.

ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.

llvm-svn: 219517
2014-10-10 17:39:11 +00:00
Hal Finkel 7a87f8a670 [MiSched] Fix a logic error in tryPressure()
Fixes a logic error in the MachineScheduler found by Steve Montgomery (and
confirmed by Andy). This has gone unfixed for months because the fix has been
found to introduce some small performance regressions. However, Andy has
recommended that, at this point, we fix this to avoid further dependence on the
incorrect behavior (and then follow-up separately on any regressions), and I
agree.

Fixes PR18883.

llvm-svn: 219512
2014-10-10 17:06:20 +00:00
Reed Kotler 12f9488e33 Implement floating point to integer conversion in mips fast-isel
Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel

Test Plan:
fpintconv.ll
ran 4 flavors of test-suite with no errors, misp32 r1/r2 O0/O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, rfuhler, mcrosier

Differential Revision: http://reviews.llvm.org/D5562

llvm-svn: 219511
2014-10-10 17:00:46 +00:00
Frederic Riss b3c9912a45 [dwarfdump] Prettyprint DW_AT_APPLE_property_attribute bitfield values.
This change depends on the ApplePropertyString helper that I sent spearately.
Not sure how you want this tested: as a tool test by adding a binary to dump, or as an llvm test starting from an IR file?

Reviewers: dblaikie, samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5689

llvm-svn: 219507
2014-10-10 15:51:10 +00:00
Frederic Riss d4de180e19 [dwarfdump] Resolve also variable specifications/abstract_origins.
DW_AT_specification and DW_AT_abstract_origin resolving was only performed
on subroutine DIEs because it used the getSubroutineName method. Introduce
a more generic getName() and use it to dump the reference attributes.

Testcases have been updated to check the printed names instead of the offsets
except when the name could be ambiguous.

Reviewers: dblaikie, samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5625

llvm-svn: 219506
2014-10-10 15:51:02 +00:00
Zoran Jovanovic 98bd58ca33 [mips][microMIPS] Implement ADDIUSP instruction
Differential Revision: http://reviews.llvm.org/D5084

llvm-svn: 219500
2014-10-10 14:37:30 +00:00
Zoran Jovanovic 95e14e711d [mips][microMIPS] Implement JR16 instruction
Differential Revision: http://reviews.llvm.org/D5062

llvm-svn: 219498
2014-10-10 14:02:44 +00:00
Zoran Jovanovic b26f889afa [mips][microMIPS] Implement ADDIUS5 instruction
Differential Revision: http://reviews.llvm.org/D5049

llvm-svn: 219495
2014-10-10 13:45:34 +00:00
Zoran Jovanovic b39a174f11 ps][microMIPS] Implement JRC instruction
Differential Revision: http://reviews.llvm.org/D5045

llvm-svn: 219494
2014-10-10 13:31:18 +00:00
Zoran Jovanovic 6097bad3f8 [mips][microMIPS] Implement JALRS16 instruction
Differential Revision: http://reviews.llvm.org/D5027

llvm-svn: 219493
2014-10-10 13:22:28 +00:00
David Majnemer b30753d731 Add tests for r219479.
llvm-svn: 219480
2014-10-10 06:59:05 +00:00
Arnold Schwaighofer d7d010eb2a SimplifyCFG: Don't convert phis into selects if we could remove undef behavior
instead

We used to transform this:

  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2

  bb1:
    br label %bb2

  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

into this:

  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).

The existing transformation that removes unreachable control flow in simplifycfg
is:

  /// If BB has an incoming value that will always trigger undefined behavior
  /// (eg. null pointer dereference), remove the branch leading here.
  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462
2014-10-10 01:27:02 +00:00
David Majnemer efbe94890c obj2yaml, COFF: Handle long section names
Long section names are represented as a slash followed by a numeric
ASCII string.  This number is an offset into a string table.

Print the appropriate entry in the string table instead of the less
enlightening /4.

N.B.  yaml2obj already does the right thing, this test exercises both
sides of the (de-)serialization.

llvm-svn: 219458
2014-10-10 00:17:57 +00:00
Sanjay Patel 3d497cd778 Improve sqrt estimate algorithm (fast-math)
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))

This has 2 benefits: less code / faster code and one less estimate instruction 
that may lose precision.

The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt. 
We also eliminate a constant load and extra register usage.

Differential Revision: http://reviews.llvm.org/D5682

llvm-svn: 219445
2014-10-09 21:26:35 +00:00
Samuel Antao 1194b8fd40 Fix bug in GPR to FPR moves in PPC64LE.
The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem.

llvm-svn: 219441
2014-10-09 20:42:56 +00:00
Chad Rosier bd64d46188 [Reassociate] Don't canonicalize X - undef to X + (-undef).
Phabricator Revision: http://reviews.llvm.org/D5674
PR21205

llvm-svn: 219434
2014-10-09 20:06:29 +00:00
Hal Finkel cbbd3df836 Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information.""
This reverts commit r219135 -- still causing miscompiles in SPEC it seems...

llvm-svn: 219432
2014-10-09 19:48:12 +00:00
Tom Stellard 3457a8495a R600/SI: Legalize CopyToReg during instruction selection
The instruction emitter will crash if it encounters a CopyToReg
node with a non-register operand like FrameIndex.

llvm-svn: 219428
2014-10-09 19:06:00 +00:00
Tom Stellard 8dd392e135 R600/SI: Legalize INSERT_SUBREG instructions during PostISelFolding
LLVM assumes INSERT_SUBREG will always have register operands, so
we need to legalize non-register operands, like FrameIndexes, to
avoid random assertion failures.

llvm-svn: 219420
2014-10-09 18:09:15 +00:00
Bill Schmidt cb34fd09cd [PPC64] VSX indexed-form loads use wrong instruction format
The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
incorrectly use the XForm_1 instruction format, rather than the
XX1Form instruction format.  This is likely a pasto when creating
these instructions, which were based on lvx and so forth.  This patch
uses the correct format.

The existing reformatting test (test/MC/PowerPC/vsx.s) missed this
because the two formats differ only in that XX1Form has an extension
to the target register field in bit 31.  The tests for these
instructions used a target register of 7, so the default of 0 in bit
31 for XForm_1 didn't expose a problem.  For register numbers 32-63
this would be noticeable.  I've changed the test to use higher
register numbers to verify my change is effective.

llvm-svn: 219416
2014-10-09 17:51:35 +00:00
Andrea Di Biagio 458a669f49 [InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values.
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.

This fixes PR21222.

Differential Revision: http://reviews.llvm.org/D5700

llvm-svn: 219406
2014-10-09 12:41:49 +00:00
Robert Khasanov d5b14f7994 [AVX512] Extended avx512_binop_rm for AVX512VL subsets.
Added avx512_binop_rm_vl multiclass for VL subset
Added encoding tests

llvm-svn: 219390
2014-10-09 08:38:48 +00:00
Adam Nemet 47b2d5f1e0 [AVX512] Intrinsics for vextract*x4
This adds the Pat<>'s for the intrinsics.  These are necessary because we
don't lower these intrinsics to SDNodes but match them directly.  See the
rational in the previous commit.

llvm-svn: 219362
2014-10-08 23:25:37 +00:00
Adam Nemet 2b5cdbb3de [AVX512] Add asm-only support for vextract*x4 masking variants
These derive from the new asm-only masking definitions.

Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants.  The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking.  These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly.  We can revisit this question later
if we have a more pressing need to express something like this.

So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.

llvm-svn: 219361
2014-10-08 23:25:33 +00:00
Robin Morisset 6f3d04e4b6 [X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slow
llvm-svn: 219357
2014-10-08 23:16:23 +00:00
Robin Morisset f9e8721564 [X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load())
Summary:
I had forgotten to check for NotSlowIncDec in the patterns that can generate
inc/dec for the above pattern (added in D4796).
This currently applies to Atom Silvermont, KNL and SKX.

Test Plan: New checks on atomic_mi.ll

Reviewers: jfb, nadav

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5677

llvm-svn: 219336
2014-10-08 19:38:18 +00:00
David Majnemer ac07703842 Inliner: Non-local functions in COMDATs shouldn't be dropped
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well.  This sort of thing is already handled by GlobalOpt/GlobalDCE.

This fixes PR21206.

llvm-svn: 219335
2014-10-08 19:32:32 +00:00
Timur Iskhodzhanov 5fcaeebb72 Fix COFF section index relocation should be 16 bits, not 32
Original patch by Andrey Guskov!
http://reviews.llvm.org/D5651

llvm-svn: 219327
2014-10-08 18:01:49 +00:00
Rafael Espindola 8280fbbf9f Correctly compute the size of common symbols in COFF.
llvm-svn: 219324
2014-10-08 17:37:19 +00:00
Rafael Espindola 589b36aae7 Print symbol sizes in this test in preparation for fixing COFF common sizes.
llvm-svn: 219320
2014-10-08 17:19:42 +00:00
Justin Bogner 894eff7a9f Revert "[InstCombine] re-commit r218721 with fix for pr21199"
This seems to cause a miscompile when building clang, which causes a
bootstrapped clang to fail or crash in several of its tests.

See:
  http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184
  http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813

This reverts commit r219282.

llvm-svn: 219317
2014-10-08 16:30:22 +00:00
Robert Khasanov b51bb22611 [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VPCMP/VPCMPU{BWDQ}
Added CMP_MASK_CC intrinsic type.
Added tests for intrinsics.

Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>

llvm-svn: 219316
2014-10-08 15:49:26 +00:00
Renato Golin 0595a26c25 Emit unaligned access build attribute for ARM
Patch by Charlie Turner.

llvm-svn: 219301
2014-10-08 12:26:22 +00:00
David Majnemer 1b3b70e371 GlobalOpt: Don't drop unused memberes of a Comdat
A linkonce_odr member of a COMDAT shouldn't be dropped if we need to
keep the entire COMDAT group.

This fixes PR21191.

llvm-svn: 219283
2014-10-08 07:23:31 +00:00
Gerolf Hoflehner e2ff5b9223 [InstCombine] re-commit r218721 with fix for pr21199
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explictly. This commit also includes the test case for pr21199.

llvm-svn: 219282
2014-10-08 06:42:19 +00:00
David Majnemer d7586046ee COFF: Don't oversize COMMON symbols when targeting BFD ld
COFF normally doesn't allow us to describe the alignment of COMMON
symbols.

It turns out that most linkers use the symbol size as a hint as to how
aligned the symbol should be.

However the BFD folks have added a .drectve command, which we
now support as of r219229, that allows us to specify the alignment
precisely.  With this in mind, stop rounding sizes up.

llvm-svn: 219281
2014-10-08 06:38:53 +00:00
David Majnemer 5fbe324ff9 llvm-dwarfdump: Add support for some COFF relocations
DWARF in COFF utilizes several relocations.  Implement support for them
in RelocVisitor to support llvm-dwarfdump.

llvm-svn: 219280
2014-10-08 06:38:50 +00:00
Chad Rosier d9d0f86a79 [AArch64] Generate vector signed/unsigned mul and mla/mls long.
Phabricator Revision: http://reviews.llvm.org/D5589
Patch by Balaram Makam <bmakam@codeaurora.org>!!

llvm-svn: 219276
2014-10-08 02:31:24 +00:00
Rui Ueyama b67154d01b llvm-readobj: add test for r219228
llvm-svn: 219274
2014-10-08 02:06:11 +00:00