Commit Graph

35585 Commits

Author SHA1 Message Date
Teresa Johnson 832a6790f6 [ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly
Summary:
This change serializes out and in the SourceFileName to LLVM assembly
so that it is preserved through "llvm-dis | llvm-as". This is
necessary to ensure that the global identifiers created for local values
in the module summary index are the same even if the bitcode is
streamed out and read back from LLVM assembly.

Serializing the summary itself to LLVM assembly is in progress.

Reviewers: joker.eph

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D18588

llvm-svn: 264869
2016-03-30 14:00:02 +00:00
Simon Pilgrim 9490b56a89 [X86][SSE] Test the legalization of vector comparison results
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.

llvm-svn: 264867
2016-03-30 13:55:00 +00:00
Simon Pilgrim ab305a9d4c [X86][SSE] Added tests for clearing upper bits of vector elements
Patterns based on PR6455

llvm-svn: 264857
2016-03-30 11:43:26 +00:00
James Molloy 8e46cd05a1 [VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.

However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018.

This fixes PR17018.

llvm-svn: 264852
2016-03-30 10:11:43 +00:00
Chandler Carruth 8e06a10d1f [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64bit quantity to an
unintended offset (the first element of the array intead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occured widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse codegeneration. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
George Burgess IV 49cad7d70b [MemorySSA] Make the visitor more careful with calls.
Prior to this patch, the MemorySSA caching visitor would cache all
calls that it visited. When paired with phi optimization, this can be
problematic. Consider:

define void @foo() {
  ; 1 = MemoryDef(liveOnEntry)
  call void @clobberFunction()
  br i1 undef, label %if.end, label %if.then

if.then:
  ; MemoryUse(??)
  call void @readOnlyFunction()
  ; 2 = MemoryDef(1)
  call void @clobberFunction()
  br label %if.end

if.end:
  ; 3 = MemoryPhi(...)
  ; MemoryUse(?)
  call void @readOnlyFunction()
  ret void
}

When optimizing MemoryUse(?), we visit defs 1 and 2, so we note to
cache them later. We ultimately end up not being able to optimize
passed the Phi, so we set MemoryUse(?) to point to the Phi. We then
cache the clobbering call for def 1 to be the Phi.

This commit changes this behavior so that we wipe out any calls
added to VisistedCalls while visiting the defs of a phi we couldn't
optimize.

Aside: With this patch, we now can bootstrap clang/LLVM without a
single MemorySSA verifier failure. Woohoo. :)

llvm-svn: 264820
2016-03-30 03:12:08 +00:00
Xinliang David Li a55fd1a9dc [PGO] Handle invoke inst in IR based icall instrumentation
Differential Revision: http://reviews.llvm.org/D18580

llvm-svn: 264818
2016-03-30 02:16:07 +00:00
George Burgess IV 82ee942a8c [MemorySSA] Change how the walker views/walks visited phis.
This patch teaches the caching MemorySSA walker a few things:

1. Not to walk Phis we've walked before. It seems that we tried to do
   this before, but it didn't work so well in cases like:

define void @foo() {
  %1 = alloca i8
  %2 = alloca i8
  br label %begin

begin:
  ; 3 = MemoryPhi({%0,liveOnEntry},{%end,2})
  ; 1 = MemoryDef(3)
  store i8 0, i8* %2
  br label %end

end:
  ; MemoryUse(?)
  load i8, i8* %1
  ; 2 = MemoryDef(1)
  store i8 0, i8* %2
  br label %begin
}

Because we wouldn't put Phis in Q.Visited until we tried to visit them.
So, when trying to optimize MemoryUse(?):
  - We would visit 3 above
    - ...Which would make us put {%0,liveOnEntry} in Q.Visited
    - ...Which would make us visit {%0,liveOnEntry}
    - ...Which would make us put {%end,2} in Q.Visited
    - ...Which would make us visit {%end,2}
      - ...Which would make us visit 3
        - ...Which would realize we've already visited everything in 3
        - ...Which would make us conservatively return 3.

In the added test-case, (@looped_visitedonlyonce) this behavior would
cause us to give incorrect results. Specifically, we'd visit 4 twice
in the same query, but on the second visit, we'd skip while.cond because
it had been visited, visit if.then/if.then2, and cache "1" as the
clobbering def on the way back.

2. If we try to walk the defs of a {Phi,MemLoc} and see it has been
   visited before, just hand back the Phi we're trying to optimize.

I promise this isn't as terrible as it seems. :)

We now insert {Phi,MemLoc} pairs just before walking the Phi's upward
defs. So, we check the cache for the {Phi,MemLoc} pair before checking
if we've already walked the Phi.

The {Phi,MemLoc} pair is (almost?) always guaranteed to have a cache
entry if we've already fully walked it, because we cache as we go.

So, if the {Phi,MemLoc} pair isn't in cache, either:
 (a) we must be in the process of visiting it (in which case, we can't
     give a better answer in a cache-as-we-go DFS walker)

 (b) we visited it, but didn't cache it on the way back (...which seems
     to require `ModifyingAccess` to not dominate `StartingAccess`,
     so I'm 99% sure that would be an error. If it's not an error, I
     haven't been able to get it to happen locally, so I suspect it's
     rare.)

- - - - -

As a consequence of this change, we no longer skip upward defs of phis,
so we can kill the `VisitedOnlyOne` check. This gives us better accuracy
than we had before, at the cost of potentially doing a bit more work
when we have a loop.

llvm-svn: 264814
2016-03-30 00:26:26 +00:00
Adam Nemet 1428d41f9a [LoopDataPrefetch] Centralize the tuning cl::opts under the pass
This is effectively NFC, minus the renaming of the options
(-cyclone-prefetch-distance -> -prefetch-distance).

The change was requested by Tim in D17943.

llvm-svn: 264806
2016-03-29 23:45:52 +00:00
Anna Zaks 1a470b6f7c [tsan] Do not instrument reads/writes to instruction profile counters.
We have known races on profile counters, which can be reproduced by enabling
-fsanitize=thread and -fprofile-instr-generate simultaneously on a
multi-threaded program. This patch avoids reporting those races by not
instrumenting the reads and writes coming from the instruction profiler.

llvm-svn: 264805
2016-03-29 23:19:40 +00:00
Duncan P. N. Exon Smith e8eb94a9a5 ADCE: Remove debug info intrinsics in dead scopes
During ADCE, track which debug info scopes still have live references
from the code, and delete debug info intrinsics for the dead ones.

These intrinsics describe the locations of variables (in registers or
stack slots).  If there's no code left corresponding to a variable's
scope, then there's no way to reference the variable in the debugger and
it doesn't matter what its value is.

I add a DEBUG printout when the described location in an SSA register,
in case it helps some trying to track down why locations get lost.
However, we still delete these; the scope itself isn't attached to any
real code, so the ship has already sailed.

llvm-svn: 264800
2016-03-29 22:57:12 +00:00
Adrian Prantl 4a09777b37 Upgrade some wildly anachronistic debug info in testcases.
llvm-svn: 264797
2016-03-29 22:34:30 +00:00
Sanjay Patel f48f7d74e2 use FileCheck and auto-check-generation script for exact checking
1. Removed the run line for mingw32 and made the Darwin triples unknown.
   This is a test of 32-bit vs. 64-bit platform and the underlying hardware.
   We have other tests for checking behavioral differences of the OS platform.

2. Changed the CPU specifiers to the attributes they were meant to represent.
   Any CPU that doesn't have SSE4.2 is assumed to have slow unaligned 16-byte accesses,
   so it won't use those here.
 
3. Although the stores really could all be CHECK-DAG, I left them as CHECK-NEXT to
   show the strange behavior of the instruction scheduler in the SLOW_32 case.

4. The odd-looking instructions are due to the use of a null pointer in the IR, so
   we have integer immediate store addresses. Cute.

llvm-svn: 264796
2016-03-29 22:27:39 +00:00
Kevin Enderby 2d431c3dba Fix some bugs in the posix output of llvm-nm. Which is documented on
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/nm.html .

1) For Mach-O files the code was not printing the values in hex as is the default.
2) The values printed had leading zeros which they should not have.
3) The address for undefined symbols was printed as spaces instead of 0.
4) With the -A option with posix output for an archive did not use square
brackets around the archive member name.

rdar://25311883 and rdar://25299678

llvm-svn: 264778
2016-03-29 20:18:07 +00:00
Ryan Govostes 23851940e5 Revert "[asan] Make the global_metadata_darwin.ll test require El Capitan or newer"
llvm-svn: 264764
2016-03-29 18:27:24 +00:00
Ryan Govostes 4fdc1f0a94 [asan] Make the global_metadata_darwin.ll test require El Capitan or newer
llvm-svn: 264758
2016-03-29 17:58:49 +00:00
Nirav Dave 2aab7f4358 Add support for no-jump-tables
Add function soft attribute to the generation of Jump Tables in CodeGen
as initial step towards clang support of gcc's no-jump-table support

Reviewers: hans, echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18321

llvm-svn: 264756
2016-03-29 17:46:23 +00:00
Manman Ren f46262e0b7 Swift Calling Convention: add swiftself attribute.
Differential Revision: http://reviews.llvm.org/D17866

llvm-svn: 264754
2016-03-29 17:37:21 +00:00
Sanjay Patel 7c61507991 [x86] add tests to show current memset codegen
llvm-svn: 264748
2016-03-29 17:09:27 +00:00
Sanjay Patel 32de9628e3 regenerate checks
llvm-svn: 264738
2016-03-29 16:11:29 +00:00
Junmo Park 32ddc544c3 fix CHECK_NOT -> CHECK-NOT
llvm-svn: 264706
2016-03-29 07:53:07 +00:00
Junmo Park e389f8d961 fixed typo - CHECK-LABEL
llvm-svn: 264704
2016-03-29 07:03:27 +00:00
Elena Demikhovsky 0053eb9479 fixed typo - CHECK-LABEL
llvm-svn: 264702
2016-03-29 06:49:38 +00:00
Elena Demikhovsky 95629caaa9 AVX-512: fixed a bug in fp_to_uint pattern on KNL
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for <4 x double> to <4 x i32>

Differential Revision: http://reviews.llvm.org/D18512

llvm-svn: 264701
2016-03-29 06:33:41 +00:00
Duncan P. N. Exon Smith bb7ce3b850 BitcodeReader: Allow METADATA_STRINGS to only have !""
Support parsing a METADATA_STRINGS record that only has a single piece
of metadata, !"".  Fixes a corner case in r264551.

llvm-svn: 264699
2016-03-29 05:25:17 +00:00
Hyojin Sung 4673f10568 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being
applied (e.g., transforming nested loops into simplified forms and loop vectorization).
    
The patch augments the existing PHI node-based check by adding a pre-test if the BB actually
belongs to a set of loop headers and not eliminating it if yes.

llvm-svn: 264697
2016-03-29 04:08:57 +00:00
Hal Finkel fa7057a415 [PowerPC] Refactor popcnt[dw] target features
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.

llvm-svn: 264690
2016-03-29 01:36:01 +00:00
Kyle Butt 5e241b11ed [Codegen] Decrease minimum jump table density.
Minimum density for both optsize and non optsize are now options
-sparse-jump-table-density (default 10) for non optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at google
at the cost of a small codesize increase. For code compiled with -Os,
the old behavior continues

llvm-svn: 264689
2016-03-29 00:23:41 +00:00
Sanjay Patel 3e9664fd60 regenerate checks
llvm-svn: 264677
2016-03-28 22:12:21 +00:00
Sanjay Patel 1f867c6f9c fix checks: *_DAG -> *-DAG
llvm-svn: 264676
2016-03-28 22:11:06 +00:00
Vedant Kumar 32ca438c57 [Coverage] Fix the expected counts in instrprof-comdat.h
llvm-svn: 264675
2016-03-28 22:10:40 +00:00
Sanjay Patel 53f9ae288a fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264674
2016-03-28 22:03:07 +00:00
Sanjay Patel d9ebff8bab fix CHECK_DAG -> CHECK-DAG
llvm-svn: 264673
2016-03-28 22:00:38 +00:00
Sanjay Patel 52fe9d0efe fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264672
2016-03-28 21:58:27 +00:00
Sanjay Patel 8f9e26964a fix CHECK_LABEL -> CHECK-LABEL
llvm-svn: 264671
2016-03-28 21:56:48 +00:00
Sanjay Patel a1e5ef624c trailing whitespace
llvm-svn: 264670
2016-03-28 21:52:53 +00:00
Evgeniy Stepanov f575b2687c Remove personality for declarations in CloneModule.
Personality is copied as part of copyFunctionAttributes, but it is
invalid on a declaration. Remove the personality attribute it the
function body is not cloned.

Also add a verifier run over output modules in the llvm-split tool.

llvm-svn: 264667
2016-03-28 21:37:02 +00:00
Simon Pilgrim d3df400fa9 [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements.
If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors.

The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.

Its not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least.

Differential Revision: http://reviews.llvm.org/D18492

llvm-svn: 264666
2016-03-28 21:33:52 +00:00
Sanjay Patel 7092de4cd2 fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264661
2016-03-28 21:14:24 +00:00
Vedant Kumar 86705ba5b1 Reapply (2x) "[PGO] Fix name encoding for ObjC-like functions"
Function names in ObjC can have spaces in them. This interacts poorly
with name compression, which uses spaces to separate PGO names. Fix the
issue by using a different separator and update a test.

I chose "\01" as the separator because 1) it's non-printable, 2) we
strip it from PGO names, and 3) it's the next natural choice once "\00"
is discarded (that one's overloaded).

What's changed since the original commit?

- I fixed up the covmap-V2 binary format tests using a linux VM.
- I weakened the CHECK lines in instrprof-comdat.h to account for the
  fact that there have been bugfixes to clang coverage. These will be
  fixed up in a follow-up.
- I added an assert to make sure we don't get bitten by this again.
- I constructed the c-general.profraw file without name compression
  enabled to appease some bots.

Differential Revision: http://reviews.llvm.org/D18516

llvm-svn: 264658
2016-03-28 21:06:42 +00:00
Adrian Prantl faebbb053d Add an IR Verifier check for orphaned DICompileUnits.
A DICompileUnit that is not listed in llvm.dbg.cu will cause assertion
failures and/or crashes in the backend. The Verifier should reject this.

rdar://problem/25369499

llvm-svn: 264657
2016-03-28 21:06:26 +00:00
Adam Nemet 64fa75391c [LVers] Change CHECK_LABEL to CHECK-LABEL (underscore->dash)
Thanks to Sanjoy for catching this.

llvm-svn: 264656
2016-03-28 21:04:13 +00:00
Ryan Govostes d1268bd8db [asan] Fix testcase for r264645
llvm-svn: 264652
2016-03-28 20:42:56 +00:00
Evgeniy Stepanov a023f79db1 Handle section vs global name conflict.
This is a fix for PR26941.

When there is both a section and a global definition with the same
name, the global wins.

Section symbols are not added to the symbol table; section references
are left undefined and fixed up in the object writer unless they've
been satisfied by some other definition.

llvm-svn: 264649
2016-03-28 20:36:28 +00:00
Ryan Govostes 653f9d0273 [asan] Support dead code stripping on Mach-O platforms
On OS X El Capitan and iOS 9, the linker supports a new section
attribute, live_support, which allows dead stripping to remove dead
globals along with the ASAN metadata about them.

With this change __asan_global structures are emitted in a new
__DATA,__asan_globals section on Darwin.

Additionally, there is a __DATA,__asan_liveness section with the
live_support attribute. Each entry in this section is simply a tuple
that binds together the liveness of a global variable and its ASAN
metadata structure. Thus the metadata structure will be alive if and
only if the global it references is also alive.

Review: http://reviews.llvm.org/D16737
llvm-svn: 264645
2016-03-28 20:28:57 +00:00
Vedant Kumar 476a94d9ef Revert "Reapply "[PGO] Fix name encoding for ObjC-like functions""
This reverts commit r264641 to investigate why c-general.test is failing
on the bots.

llvm-svn: 264643
2016-03-28 20:20:40 +00:00
Vedant Kumar f20b6cec1c Reapply "[PGO] Fix name encoding for ObjC-like functions"
Function names in ObjC can have spaces in them. This interacts poorly
with name compression, which uses spaces to separate PGO names. Fix the
issue by using a different separator and update a test.

I chose "\01" as the separator because 1) it's non-printable, 2) we
strip it from PGO names, and 3) it's the next natural choice once "\00"
is discarded (that one's overloaded).

This reverts the revert commit beaf3d18. What's changed?

- I fixed up the covmap-V2 binary format tests using a linux VM.
- I updated the expected counts in instrprof-comdat.h to account for
  the fact that there have been bugfixes to clang coverage.
- I added an assert to make sure we don't get bitten by this again.

Differential Revision: http://reviews.llvm.org/D18516

llvm-svn: 264641
2016-03-28 20:12:07 +00:00
Matthias Braun b74eb41d58 MIRParser: Add %subreg.xxx syntax for subregister index operands
Differential Revision: http://reviews.llvm.org/D18279

llvm-svn: 264608
2016-03-28 18:18:46 +00:00
Matthias Braun 2bd8eeb6b7 CodeGen: Correct specification of PHI nodes
They do have a def machine operand.

Fixing the definition is necessary for an upcoming patch.

Differential Revision: http://reviews.llvm.org/D18384

llvm-svn: 264607
2016-03-28 18:18:41 +00:00
Haicheng Wu 6a6bc750d5 [AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize
Mimic what x86 does when optimizing sdiv/udiv for minsize.

llvm-svn: 264606
2016-03-28 18:17:07 +00:00
Reid Kleckner ba85781f58 Revert "[SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops"
This reverts commit r264596.

It does not compile.

llvm-svn: 264604
2016-03-28 18:07:40 +00:00
Hal Finkel 7059d41622 [PowerPC] On the A2, popcnt[dw] are very slow
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.

I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
  1. This allows us to return more accurate information via the TTI interface
     (I recognize that this currently makes no practical difference)
  2. Is hopefully easier to understand (it allows the core's features to match
     its manual while still having the desired effect).

llvm-svn: 264600
2016-03-28 17:52:08 +00:00
Hyojin Sung 0ada5b0d14 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being 
applied (e.g., transforming nested loops into simplified forms and loop vectorization).

The patch augments the existing PHI node-based check by adding a pre-test if the BB actually 
belongs to a set of loop headers and not eliminating it if yes. 

llvm-svn: 264596
2016-03-28 17:22:25 +00:00
Derek Schuff ad154c837e Introduce MachineFunctionProperties and the AllVRegsAllocated property
MachineFunctionProperties represents a set of properties that a MachineFunction
can have at particular points in time. Existing examples of this idea are
MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which
will eventually be switched to use this mechanism.
This change introduces the AllVRegsAllocated property; i.e. the property that
all virtual registers have been allocated and there are no VReg operands
left.

With this mechanism, passes can declare that they require a particular property
to be set, or that they set or clear properties by implementing e.g.
MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class
verifies that the requirements are met, and handles the setting and clearing
based on the delcarations. Passes can also directly query and update the current
properties of the MF if they want to have conditional behavior.

This change annotates the target-independent post-regalloc passes; future
changes will also annotate target-specific ones.

Reviewers: qcolombet, hfinkel

Differential Revision: http://reviews.llvm.org/D18421

llvm-svn: 264593
2016-03-28 17:05:30 +00:00
Hemant Kulkarni 274457e5e8 [llvm-size] Implement --common option
Differential Revision: http://reviews.llvm.org/D16820

llvm-svn: 264591
2016-03-28 16:48:10 +00:00
Tom Stellard a76bcc2ea1 AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.

This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

llvm-svn: 264589
2016-03-28 16:10:13 +00:00
Davide Italiano 6db1dcbf6b [SimplifyLibCalls] Transform printf("%s", "a") -> putchar('a').
llvm-svn: 264588
2016-03-28 15:54:01 +00:00
Krzysztof Parzyszek 2d65ea74dc [Hexagon] Improve handling of unaligned vector loads and stores
llvm-svn: 264584
2016-03-28 15:43:03 +00:00
Krzysztof Parzyszek bb63f66686 [Hexagon] Only use restore functions for single register at -Oz
llvm-svn: 264581
2016-03-28 14:52:21 +00:00
Douglas Katzman d0c11cf7ad Sparc: silently ignore .proc assembler directive
Differential Revision: http://reviews.llvm.org/D18463

llvm-svn: 264579
2016-03-28 14:00:11 +00:00
Jacques Pienaar fcef3e4617 [lanai] Add Lanai backend.
Add the Lanai backend to lib/Target.

General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html).

Differential Revision: http://reviews.llvm.org/D17011

llvm-svn: 264578
2016-03-28 13:09:54 +00:00
Chuang-Yu Cheng d5eb774eb6 [Power9] Implement new altivec instructions: bcd* series
This patch implements the following altivec instructions:

- Decimal Convert From/to National/Zoned/Signed-QWord:
    bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.

- Decimal Copy-Sign/Set-Sign:
    bcdcpsgn. bcdsetsgn.

- Decimal Shift/Unsigned-Shift/Shift-and-Round:
    bcds. bcdus. bcdsr.

- Decimal (Unsigned) Truncate:
    bcdtrunc. bcdutrunc.

Total 13 instructions

Thanks Amehsan's advice! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D17838

llvm-svn: 264568
2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng 80722719eb [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat
This change implements the following vsx instructions:

- Scalar Insert/Extract
    xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp

- Vector Insert/Extract
    xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
    xxextractuw xxinsertw

- Scalar/Vector Test Data Class
    xststdcdp xststdcsp xststdcqp
    xvtstdcdp xvtstdcsp

- Maximum/Minimum
    xsmaxcdp xsmaxjdp
    xsmincdp xsminjdp

- Vector Byte-Reverse/Permute/Splat
    xxbrd xxbrh xxbrq xxbrw
    xxperm xxpermr
    xxspltib

30 instructions

Thanks Nemanja for invaluable discussion! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16842

llvm-svn: 264567
2016-03-28 08:34:28 +00:00
Elena Demikhovsky 83f0647d85 AVX-512: Fixed ICMP instruction selection for i1 operands
ICMP instruction selection fails on SKX and KNL for i1 operand.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0

Differential Revision: http://reviews.llvm.org/D18511

llvm-svn: 264566
2016-03-28 07:47:58 +00:00
Chuang-Yu Cheng 5663848996 [Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic
This change implements the following vsx instructions:

- quad-precision move
    xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp

- quad-precision fp-arithmetic
    xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
    xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)

22 instructions

Thanks Nemanja and Kit for careful review and invaluable discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16110

llvm-svn: 264565
2016-03-28 07:38:01 +00:00
NAKAMURA Takumi a51d6ea990 llvm/test/Transforms/FunctionImport/funcimport.ll: -stats REQUIRES +Asserts.
llvm-svn: 264561
2016-03-28 02:14:49 +00:00
Duncan P. N. Exon Smith 6565a0d4b2 Reapply ~"Bitcode: Collect all MDString records into a single blob"
Spiritually reapply commit r264409 (reverted in r264410), albeit with a
bit of a redesign.

Firstly, avoid splitting the big blob into multiple chunks of strings.

r264409 imposed an arbitrary limit to avoid a massive allocation on the
shared 'Record' SmallVector.  The bug with that commit only reproduced
when there were more than "chunk-size" strings.  A test for this would
have been useless long-term, since we're liable to adjust the chunk-size
in the future.

Thus, eliminate the motivation for chunk-ing by storing the string sizes
in the blob.  Here's the layout:

    vbr6: # of strings
    vbr6: offset-to-blob
    blob:
       [vbr6]: string lengths
       [char]: concatenated strings

Secondly, make the output of llvm-bcanalyzer readable.

I noticed when debugging r264409 that llvm-bcanalyzer was outputting a
massive blob all in one line.  Past a small number, the strings were
impossible to split in my head, and the lines were way too long.  This
version adds support in llvm-bcanalyzer for pretty-printing.

    <STRINGS abbrevid=4 op0=3 op1=9/> num-strings = 3 {
      'abc'
      'def'
      'ghi'
    }

From the original commit:

Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
should (a) slightly reduce bitcode size, since there is less record
overhead, and (b) greatly improve reading speed, since blobs are super
cheap to deserialize.

llvm-svn: 264551
2016-03-27 23:17:54 +00:00
Duncan P. N. Exon Smith 456c9968e5 Support: Implement StreamingMemoryObject::getPointer
The implementation is fairly obvious.  This is preparation for using
some blobs in bitcode.

For clarity (and perhaps future-proofing?), I moved the call to
JumpToBit in BitstreamCursor::readRecord ahead of calling
MemoryObject::getPointer, since JumpToBit can theoretically (a) read
bytes, which (b) invalidates the blob pointer.

This isn't strictly necessary the two memory objects we have:

  - The return of RawMemoryObject::getPointer is valid until the memory
    object is destroyed.

  - StreamingMemoryObject::getPointer is valid until the next chunk is
    read from the stream.  Since the JumpToBit call is only going ahead
    to a word boundary, we'll never load another chunk.

However, reordering makes it clear by inspection that the blob returned
by BitstreamCursor::readRecord will be valid.

I added some tests for StreamingMemoryObject::getPointer and
BitstreamCursor::readRecord.

llvm-svn: 264549
2016-03-27 23:00:59 +00:00
Teresa Johnson 569af59b14 Use DAG check to try to appease bot
Try to appease
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/34772. This was
the only check that didn't use DAG and it wasn't found.

llvm-svn: 264538
2016-03-27 15:36:43 +00:00
Teresa Johnson d29478f70e [ThinLTO] Add optional import message and statistics
Summary:
Add a statistic to count the number of imported functions. Also, add a
new -print-imports option to emit a trace of imported functions, that
works even for an NDEBUG build.

Note that emitOptimizationRemark does not work for the above printing as
it expects a Function object and DebugLoc, neither of which we have
with summary-based importing.

This is part 2 of D18487, the first part was committed separately as
r264536.

Reviewers: joker.eph

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D18487

llvm-svn: 264537
2016-03-27 15:27:30 +00:00
Hal Finkel 0b37175ca6 [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.

It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.

This fixes the latter problem and adds the FMAX/MINNUM mappings.

llvm-svn: 264532
2016-03-27 05:40:56 +00:00
Michael Kruse ff379b69b2 [Verifier] Reject PHIs using defs from own block.
Reject the following IR as malformed (assuming that %entry, %next are
not in a loop):

    next:
      %y = phi i32 [ 0, %entry ]
      %x = phi i32 [ %y, %entry ]

Such PHI nodes came up in PR26718. While there was no consensus on
whether or not this is valid IR, most opinions on that bug and in a
discussion on the llvm-dev mailing list tended towards a
"strict interpretation" (term by Joseph Tremoulet) of PHI node uses.
Also, the language reference explicitly states that "the use of each
incoming value is deemed to occur on the edge from the corresponding
predecessor block to the current block" and
`DominatorTree::dominates(Instruction*, Use&)` uses this definition as
well.

For the code mentioned in PR15384, clang does not compile to such PHIs
(anymore?). The test case still hangs when replacing `%tmp6` with `%tmp`
in revisions before r176366 (where PR15384 has been fixed). The
occurrence of %tmp6 therefore was probably unintentional. Its value is
not used except in other PHIs.

Reviewers: majnemer, reames, JosephTremoulet, bkramer, grosser, jdoerfert, kparzysz, sanjoy

Differential Revision: http://reviews.llvm.org/D18443

llvm-svn: 264528
2016-03-26 23:32:57 +00:00
Sanjay Patel 796db35f62 [SimplifyCFG] propagate branch metadata when creating select (PR26636)
llvm-svn: 264527
2016-03-26 23:30:50 +00:00
Sanjay Patel 342f7c7e10 minimize test cases
These are tests for store transforms. 
The loads, adds, and geps were irrelevant.

llvm-svn: 264526
2016-03-26 23:09:25 +00:00
David Blaikie 4dd03f0e12 llvm-dwp: Include the dwo name (if available) when diagnosing duplicate CU IDs from dwp input files
If you're building dwps from other dwps, it can be hard to track down a
duplicate CU ID if it comes from two compilations of the same file in
different modes, etc. By including the .dwo path (which is hopefully
more unique than the file path) it can help track down where the
duplicates came from.

llvm-svn: 264520
2016-03-26 20:32:14 +00:00
Simon Pilgrim dcdf85033c [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization

llvm-svn: 264517
2016-03-26 18:32:13 +00:00
Simon Pilgrim e4dbeb40c6 [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264512
2016-03-26 15:44:55 +00:00
Simon Pilgrim 3eef33a806 [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
Currently this is to mainly to prevent scalarization of integer division by constants.

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264511
2016-03-26 15:27:20 +00:00
Simon Pilgrim 7b36cdaecf [X86][SSE] Added v64i8 vector integer multiply tests
llvm-svn: 264510
2016-03-26 09:50:06 +00:00
Simon Pilgrim 7379a70677 [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.

llvm-svn: 264509
2016-03-26 09:44:27 +00:00
David Majnemer b549ab02b4 [PowerPC] Disable the CTR optimization in the presence of {min,max}num
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.

This fixes PR27083.

llvm-svn: 264508
2016-03-26 09:42:31 +00:00
Simon Pilgrim 9a5f19f509 [X86][SSE] Refreshed vector integer multiply tests
Add all 256-bit vector tests.
Added AVX512F/AVX512BW test targets.
Renamed tests something more meaningful.

llvm-svn: 264507
2016-03-26 09:35:48 +00:00
Chuang-Yu Cheng 065969ec8e [Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10
This change implements the following vector operations:
- vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
- vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
- vnegd vnegw
- vprtybd vprtybq vprtybw
- vbpermd vpermr
- vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
- vmul10cuq vmul10uq vmul10ecuq vmul10euq

28 instructions

Thanks Nemanja, Kit for invaluable hints and discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

Phabricator: http://reviews.llvm.org/D15887
llvm-svn: 264504
2016-03-26 05:46:11 +00:00
Mehdi Amini 01e321306b ThinLTO: use the callgraph from the combined index to drive the FunctionImporter
Summary:
Now that the summary contains the full reference/call graph, we can
replace the existing function importer that loads and inspect the IR
to iteratively walk the call graph by a traversal based purely on the
summary information. Decouple the actual importing decision from any
IR manipulation.

Reviewers: tejohnson

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D18343

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264503
2016-03-26 05:40:34 +00:00
Richard Smith 1fd6d1fd36 Stop testing the unspecified order in which the OnDiskHashTable stores entries.
llvm-svn: 264487
2016-03-26 02:02:59 +00:00
Philip Reames b5681138e4 Allow value forwarding past release fences in GVN
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-load forwarding across a release fence.  

We choose to be much more conservative about stores.  In theory, nothing prevents us from shifting a store from after a release fence to before it, and then eliminating the preceeding (previously fenced) store.  Doing this without actually moving the second store is likely also legal, but we chose to be conservative at this time.

The LangRef indicates only atomic loads and stores are effected by fences. This patch chooses to be far more conservative then that. 

This is the GVN companion to http://reviews.llvm.org/D11434 which applied the same logic in EarlyCSE and has been baking in tree for a while now.

Differential Revision: http://reviews.llvm.org/D11436

llvm-svn: 264472
2016-03-25 22:40:35 +00:00
David Majnemer 020e890a19 [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.

This fixes PR27071.

llvm-svn: 264465
2016-03-25 21:49:11 +00:00
Jun Bum Lim 36c53fe147 [MachineCopyPropagation] Expose more dead copies across instructions with regmasks
When encountering instructions with regmasks, instead of cleaning up all the
elements in MaybeDeadCopies map, remove only the instructions erased. By keeping
more instruction in MaybeDeadCopies, this change will expose more dead copies
across instructions with regmasks.

llvm-svn: 264462
2016-03-25 21:15:35 +00:00
Nirav Dave fa250cad37 Prevent construction of cycle in DAG store merge
When merging stores in DAGCombiner, add check to ensure that no
dependenices exist that would cause the construction of a cycle in our
DAG.  This may happen if one store has a data dependence on another
instruction (e.g. a load) which itself has a (chain) dependence on
another store being merged. These stores cannot be merged safely and
doing so results in a cycle that is discovered in LegalizeDAG.

This test is only done in cases where Antialias analysis is used (UseAA)
as non-AA store merge candidates will be merged logically after all
loads which have been checked to not alias.

Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight

Subscribers: llvm-commits, tberghammer, danalbert, srhines

Differential Revision: http://reviews.llvm.org/D18336

llvm-svn: 264461
2016-03-25 21:06:30 +00:00
Sanjay Patel 69632447b1 [InstSimplify] regenerate checks using a script
I didn't notice any significant changes in the actual checks here;
all of these tests already used FileCheck, so a script can batch
update them in one shot.

This commit is just to show the value of automating this process: 
We have uniform formatting as opposed to a mish-mash of check
structure that changes based on individual prefs and the current
fashion. This makes it simpler to update when we find a bug or
make an enhancement.

llvm-svn: 264457
2016-03-25 20:12:25 +00:00
Sanjoy Das d4c783335b [RS4GC] Lower calls to @llvm.experimental.deoptimize
This changes RS4GC to lower calls to ``@llvm.experimental.deoptimize``
to gc.statepoints wrapping ``__llvm_deoptimize``, and changes
``callsGCLeafFunction`` to recognize ``@llvm.experimental.deoptimize``
as a non GC leaf function.

I've had to hard code the ``"__llvm_deoptimize"`` name in
RewriteStatepointsForGC; since ``TargetLibraryInfo`` is available only
during codegen.  This isn't without precedent in the codebase, so I'm
not overtly concerned.

llvm-svn: 264456
2016-03-25 20:12:13 +00:00
Saleem Abdulrasool 750a90df6a ARM: maintain BB ordering when expanding WIN__DBZCHK
It is possible to have a fallthrough MBB prior to MBB placement.  The original
addition of the BB would result in reordering the BB as not preceding the
successor.  Because of the fallthrough nature of the BB, we could end up
executing incorrect code or even a constant pool island!  Insert the spliced BB
into the same location to avoid that.

Thanks to Tim Northover for invaluable hints and Fiora for the discussion on
what may have been occurring!

llvm-svn: 264454
2016-03-25 19:48:06 +00:00
Hans Wennborg 5f916d3df4 [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with 8-bit immediate is only three bytes.

Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.

Differential Revision: http://reviews.llvm.org/D18374

llvm-svn: 264440
2016-03-25 18:11:31 +00:00
Sanjay Patel cd7d3ae7cc [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264438
2016-03-25 18:03:40 +00:00
Sanjay Patel d3d1179463 [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264437
2016-03-25 18:03:17 +00:00
Sanjay Patel 721fec09b5 [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264435
2016-03-25 18:03:01 +00:00
Sanjay Patel 1395cf0d3c [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264434
2016-03-25 18:02:14 +00:00
Sanjay Patel bfbac177d2 [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264433
2016-03-25 18:01:55 +00:00
Sanjay Patel 08da4b7cd8 [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264432
2016-03-25 18:01:37 +00:00
Sanjay Patel 8f22390137 [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264431
2016-03-25 18:01:23 +00:00
Sanjay Patel 5270746978 [InstCombine] use FileCheck for better checking
(testing script for autogeneration of check lines)

llvm-svn: 264430
2016-03-25 18:01:04 +00:00
Reid Kleckner f6f04f8fc8 Consider regmasks when computing register-based DBG_VALUE live ranges
Now register parameters that aren't saved to the stack or CSRs are
considered dead after the first call. Previously the debugger would show
whatever was in the register.

Fixes PR26589

Reviewers: aprantl

Differential Revision: http://reviews.llvm.org/D17211

llvm-svn: 264429
2016-03-25 17:54:46 +00:00
Sanjay Patel 246e7f7057 [InstCombine] consolidate regression tests of the ancients (2002)
Testing out the check-generator-script that's now in the utils folder.

llvm-svn: 264424
2016-03-25 17:16:32 +00:00
Adrian Prantl 5979790e42 Document the purpose of this testcase.
llvm-svn: 264421
2016-03-25 16:49:57 +00:00
Hemant Kulkarni 966b3ac502 [llvm-readobj] Impl GNU style program headers print
readelf -lW

Differential Revision: http://reviews.llvm.org/D18372

llvm-svn: 264415
2016-03-25 16:04:48 +00:00
Duncan P. N. Exon Smith fc8110041f Revert "Bitcode: Collect all MDString records into a single blob"
This reverts commit r264409 since it failed to bootstrap:
http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/8302/

llvm-svn: 264410
2016-03-25 15:22:27 +00:00
Duncan P. N. Exon Smith fdbf0a5af8 Bitcode: Collect all MDString records into a single blob
Optimize output of MDStrings in bitcode.  This emits them in big blocks
(currently 1024) in a pair of records:
  - BULK_STRING_SIZES: the sizes of the strings in the block, and
  - BULK_STRING_DATA: a single blob, which is the concatenation of all
    the strings.

Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
should (a) slightly reduce bitcode size, since there is less record
overhead, and (b) greatly improve reading speed, since blobs are super
cheap to deserialize.

I needed to add support for blobs to streaming input to get the test
suite passing.
  - StreamingMemoryObject::getPointer reads ahead and returns the
    address of the blob.
  - To avoid a possible reallocation of StreamingMemoryObject::Bytes,
    BitstreamCursor::readRecord needs to move the call to JumpToEnd
    forward so that getPointer is the last bitstream operation.

llvm-svn: 264409
2016-03-25 14:40:18 +00:00
David L Kreitzer 8d441eb936 Enable non-power-of-2 #pragma unroll counts.
Patch by Evgeny Stupachenko.

Differential Revision: http://reviews.llvm.org/D18202

llvm-svn: 264407
2016-03-25 14:24:52 +00:00
Matt Arsenault 8c8fcb2585 AMDGPU: Cost model for basic integer operations
This resolves bug 21148 by preventing promotion to
i64 induction variables.

llvm-svn: 264376
2016-03-25 01:16:40 +00:00
Hans Wennborg 4ae5119eeb X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).

Differential Revision: http://reviews.llvm.org/D18246

llvm-svn: 264375
2016-03-25 01:10:56 +00:00
Matt Arsenault 9651813ee0 AMDGPU: Partially implement getArithmeticInstrCost for FP ops
llvm-svn: 264374
2016-03-25 01:00:32 +00:00
Duncan P. N. Exon Smith efe16c8eb4 IR: Stop upgrading !llvm.loop attachments via MDString
Remove logic to upgrade !llvm.loop by changing the MDString tag
directly.  This old logic would check (and change) arbitrary strings
that had nothing to do with loop metadata.  Instead, check !llvm.loop
attachments directly, and change which strings get attached.

Rather than updating the assembly-based upgrade, drop it entirely.  It
has been quite a while since we supported upgrading textual IR.

llvm-svn: 264373
2016-03-25 00:56:13 +00:00
Saleem Abdulrasool 0dab98d926 ARM: fix optimised division on WoA
We did not have an explicit branch to the continuation BB.  When the check was
hoisted, this could permit control follow to fall through into the division
trap.  Add the explicit branch to the continuation basic block to ensure that
code execution is correct.

llvm-svn: 264370
2016-03-25 00:34:11 +00:00
Matt Arsenault 51d702812d TTI: Report 0 cost for free addrspacecasts
llvm-svn: 264369
2016-03-25 00:26:29 +00:00
Matt Arsenault 8e9aa0acc8 TTI: Use 0 for cost of fabs if free
Ideally this would also happen for fneg, but that
isn't a distinct operation in the IR.

llvm-svn: 264368
2016-03-25 00:26:22 +00:00
Matt Arsenault 59767cea79 AMDGPU: TTI: Make insertelement free.
We don't want to have a cost to scalarizing operations.

llvm-svn: 264364
2016-03-25 00:14:11 +00:00
Manman Ren 9dd8c14674 CXX TLS: collect return blocks after SelectAllBasicBlocks.
It is incorrect to get the corresponding MBB for a ReturnInst before
SelectAllBasicBlocks since SelectAllBasicBlocks can change the
correspondence between a ReturnInst and the MBB it is in.

PR27062

llvm-svn: 264358
2016-03-24 23:21:29 +00:00
Sanjoy Das 731c67fed2 Lower varargs correctly in deopt bundle lowering
Earlier we were ignoring varargs in LowerCallSiteWithDeoptBundle because
populateCallLoweringInfo does not set CallLoweringInfo::IsVarArg.

llvm-svn: 264354
2016-03-24 22:37:52 +00:00
David Blaikie ce7c6cfe0e llvm-dwp: Coalesce code for reading the CU's DW_AT_GNU_dwo_id and DW_AT_name
Going to be reading the DW_AT_GNU_dwo_name shortly as well, and there
was already enough duplication here that it was worth refactoring
rather than adding even more.

llvm-svn: 264350
2016-03-24 22:17:08 +00:00
Mike Aizatsky 6b510a4c01 [sancov] renaming statistics fields.
llvm-svn: 264349
2016-03-24 21:49:55 +00:00
Matthias Braun ae81c29352 LiveInterval: Fix Distribute() failing on liveranges with unused VNInfos
This fixes http://llvm.org/PR26991

llvm-svn: 264345
2016-03-24 21:41:38 +00:00
David Majnemer e09d035dad [LoopStrengthReduce] Don't hoist into a catchswitch
We try to hoist the insertion point as high as possible to encourage
sharing.  However, we must be careful not to hoist into a catchswitch as
it is both an EHPad and a terminator.

llvm-svn: 264344
2016-03-24 21:40:22 +00:00
Eric Christopher b979d51afa Finish the incomplete 'd' inline asm constraint support for PPC by
making sure we give it a register and mark it as a register constraint.

llvm-svn: 264340
2016-03-24 21:04:52 +00:00
Eric Christopher 8c95d53d45 Reorder check lines, comments in test and remove unnecessary IR.
llvm-svn: 264339
2016-03-24 21:04:47 +00:00
Sanjoy Das 6bcfe31820 Match call and target calling conventions in test
Fixes an issue in rL264329.

llvm-svn: 264337
2016-03-24 20:51:24 +00:00
Reid Kleckner 01bc66a8ce Revert "Recommitted r263424 "Supporting all entities declared in lexical scope in LLVM debug info." After fixing PR26942 (the fix is included in this commit)."
This reverts commit r264280.

This broke building Chromium for iOS. We'll upload a reproducer to the
PR soon.

llvm-svn: 264334
2016-03-24 20:38:49 +00:00
Sanjoy Das df9ae70f49 Add lowering support for llvm.experimental.deoptimize
Summary:
Only adds support for "naked" calls to llvm.experimental.deoptimize.
Support for round-tripping through RewriteStatepointsForGC will come
as a separate patch (should be simpler than this one).

Reviewers: reames

Subscribers: sanjoy, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18429

llvm-svn: 264329
2016-03-24 20:23:29 +00:00
Krzysztof Parzyszek c9d4caa32c [Hexagon] Add support for run-time stack overflow checking
Patch by Sundeep Kushwaha.

llvm-svn: 264328
2016-03-24 20:20:07 +00:00
Krzysztof Parzyszek 181fdbd174 [Hexagon] Generate PIC-specific versions of save/restore routines
In PIC mode, the registers R14, R15 and R28 are reserved for use by
the PLT handling code. This causes all functions to clobber these
registers. While this is not new for regular function calls, it does
also apply to save/restore functions, which do not follow the standard
ABI conventions with respect to the volatile/non-volatile registers.

Patch by Jyotsna Verma.

llvm-svn: 264324
2016-03-24 19:18:48 +00:00
Sanjoy Das c0c59fe14e [Statepoints] Fix yet another issue around gc pointer uniqueing
Given that StatepointLowering now uniques derived pointers before
putting them in the per-statepoint spill map, we may end up with missing
entries for derived pointers when we visit a gc.relocate on a pointer
that was de-duplicated away.

Fix this by keeping two maps, one mapping gc pointers to their
de-duplicated values, and one mapping a de-duplicated value to the slot
it is spilled in.

llvm-svn: 264320
2016-03-24 18:57:39 +00:00
David Blaikie 0b214e4a2a [debuginfo] Include dwo_name in the split unit to improve dwp diagnostics
When multiple DWP files are merged together and duplicate DWO IDs are
found it's currently difficult to give an actionable error message - the
DW_AT_name of the CU could be provided, but might be identical (if the
same source file is built into two different configurations), which
doesn't help the user identify the problem.

When no intermediate DWP files are generated, the path to the two DWO
files could be provided - but is lost once the DWOs are merged into a
DWP.

So, include the name of the DWO (dwo_name) in the split file so that
collissions involving a source CU from a DWP can be better diagnosed.

(improvements to llvm-dwp using this to come shortly)

llvm-svn: 264316
2016-03-24 18:37:08 +00:00
Adam Nemet 7aba60c853 [LLE] Check for mismatching types between the store and the load earlier
isDependenceDistanceOfOne asserts that the store and the load access
through the same type.  This function is also used by
removeDependencesFromMultipleStores so we need to make sure we filter
out mismatching types before reaching this point.

Now we do this when the initial candidates are gathered.

This is a refinement of the fix made in r262267.

Fixes PR27048.

llvm-svn: 264313
2016-03-24 17:59:26 +00:00
Simon Atanasyan 26fe92d19f [MC][mips] Add MipsMCInstrAnalysis class and register it as MC instruction analyzer
The `MipsMCInstrAnalysis` class overrides the `evaluateBranch` method
and calculates target addresses for branch and calls instructions.
That allows llvm-objdump to print functions' names in branch instructions
in the disassemble mode.

Differential Revision: http://reviews.llvm.org/D18209

llvm-svn: 264309
2016-03-24 17:18:14 +00:00
Sanjoy Das 8f42b7b3cd Remove unnecessary redirect from test
llvm-svn: 264308
2016-03-24 17:18:00 +00:00
Rafael Espindola fe26864440 Fix gold tests for llvm-readobj format change.
llvm-svn: 264306
2016-03-24 16:45:41 +00:00
Simon Atanasyan b7807a0c8e [llvm-readobj] Decode st_other symbol's flags
The patch supports common STV_xxx visibility flags and MIPS specific
STO_MIPS_xxx flags.

Differential Revision: http://reviews.llvm.org/D18447

llvm-svn: 264300
2016-03-24 16:10:37 +00:00
Elena Demikhovsky 95f3173ce9 AVX-512: Generate KTEST instead of TEST fir i1 vectors
KTEST instruction may be used instead of TEST in this case:

%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, zeroinitializer
br i1 %res, label %L2, label %L1

Differential Revision: http://reviews.llvm.org/D18444

llvm-svn: 264298
2016-03-24 15:53:45 +00:00
Tim Northover 4498eff9bb CodeGen: extend RHS when splitting ATOMIC_CMP_SWAP_WITH_SUCCESS.
If the operation's type has been promoted during type legalization, we
need to account for the fact that the high bits of the comparison
operand are likely unspecified.

The LHS is usually zero-extended, but MIPS sign extends it, so we have
to be slightly careful.

Patch by Simon Dardis.

llvm-svn: 264296
2016-03-24 15:38:38 +00:00
Rafael Espindola e1c42ac12b Fix another case where we were unconditionally linking linkonce GVs.
With this I think that now llvm-link,  lld and the gold plugin should
agree on which symbol is kept.

llvm-svn: 264292
2016-03-24 15:23:01 +00:00
Rafael Espindola 42e0323768 Fix resolution of linkonce symbols in comdats.
After comdat processing, the symbols still go through regular symbol
resolution.

We were not doing it for linkonce symbols since they are lazy linked.

This fixes pr27044.

llvm-svn: 264288
2016-03-24 14:58:44 +00:00
Daniel Sanders 15f8fb6f83 [mips] Range check vsplat_simm5 and vsplat_simm10
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18177

llvm-svn: 264287
2016-03-24 14:53:40 +00:00
Pirama Arumuga Nainar dc45aef2d8 Remove unsafe AssertZext after promoting result of FP_TO_FP16
Summary:
Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32
instruction, do not guarantee that the top 16 bits are zeroed out.
Remove the unsafe AssertZext and add tests to exercise this.

Reviewers: jmolloy, sbaranga, kristof.beyls, aadg

Subscribers: llvm-commits, srhines, aemerson

Differential Revision: http://reviews.llvm.org/D18426

llvm-svn: 264285
2016-03-24 14:06:03 +00:00
Nemanja Ivanovic 5ebc92dbe1 [PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode
This patch corresponds to review:
http://reviews.llvm.org/D17711

It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.

llvm-svn: 264282
2016-03-24 13:40:33 +00:00
Amjad Aboud 6ff7e10052 Recommitted r263424 "Supporting all entities declared in lexical scope in LLVM debug info."
After fixing PR26942 (the fix is included in this commit).

Differential Revision: http://reviews.llvm.org/D18350

llvm-svn: 264280
2016-03-24 13:30:16 +00:00
Daniel Sanders 837f15187b [mips] Range check simm10
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18148

llvm-svn: 264279
2016-03-24 13:26:59 +00:00
Simon Pilgrim 572ca71573 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Zlatko Buljan 94af4cbcf4 [mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D17137

llvm-svn: 264248
2016-03-24 09:22:45 +00:00
James Molloy ee675880b8 [llvm-nm] Correct -P ELF output
Correctly add a space between the address and size when outputting in posix mode (-P).

llvm-svn: 264247
2016-03-24 09:18:09 +00:00
Hrvoje Varga 2cb74ac3c3 [mips][microMIPS] Implement MTC*, MTHC* and DMTC* instructions
Differential Revision: http://reviews.llvm.org/D17328

llvm-svn: 264246
2016-03-24 08:02:09 +00:00
Hrvoje Varga dbea1a1e51 [mips][microMIPS] Fix for "Cannot copy registers" assertion
Differential Revision: http://reviews.llvm.org/D17068

llvm-svn: 264245
2016-03-24 06:05:35 +00:00
Adam Nemet 279784ffc4 [LAA] Support memchecks involving loop-invariant addresses
We used to only allow SCEVAddRecExpr for pointer expressions in order to
be able to compute the bounds.  However this is also trivially possible
for loop-invariant addresses (scUnknown) since then the bounds are the
address itself.

Interestingly, we used allow this for the special case when the
loop-invariant address happens to also be an SCEVAddRecExpr (in an outer
loop).

There are a couple more loops that are vectorized in SPEC after this.
My guess is that the main reason we don't see more because for example a
loop-invariant load is vectorized into a splat vector with several
vector-inserts.  This is likely to make the vectorization unprofitable.
I.e. we don't notice that a later LICM will move all of this out of the
loop so the cost estimate should really be 0.

llvm-svn: 264243
2016-03-24 04:28:47 +00:00
Simon Pilgrim 0110890a79 [X86][SSE] Added tests to ensure that consecutive loads including any/all volatiles are not combined
llvm-svn: 264225
2016-03-24 00:14:37 +00:00
Paul Robinson f81836bd18 [PS4] Guarantee an instruction after a 'noreturn' call.
We need the "return address" of a noreturn call to be within the
bounds of the calling function; TrapUnreachable turns 'unreachable'
into a 'ud2' instruction, which has that desired effect.

Differential Revision: http://reviews.llvm.org/D18414

llvm-svn: 264224
2016-03-24 00:10:03 +00:00
Rafael Espindola 1ee9fbd842 Fix lazy linking of comdat members.
If not for lazy linking of linkonce GVs, comdats are just a
preprocessing before symbol resolution.

Lazy linking complicates it since when we pick a visible member of
comdat, we have to make sure the rest of it passes symbol resolution
too.

llvm-svn: 264223
2016-03-24 00:06:03 +00:00
Mike Aizatsky 5c79bb364a [sancov] -print-coverage-stats option to print various coverage statistics.
Differential Revision: http://reviews.llvm.org/D18418

llvm-svn: 264222
2016-03-24 00:00:08 +00:00
Matt Arsenault 30d37a74da AMDGPU: Remove atomic inc/dec patterns
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max

llvm-svn: 264215
2016-03-23 23:23:38 +00:00
Matt Arsenault 0a30e456b4 AMDGPU: Promote alloca should skip volatiles
llvm-svn: 264214
2016-03-23 23:17:29 +00:00
Matt Arsenault f43c2a0b49 AMDGPU: Insert moves of frame index to value operands
Strengthen tests of storing frame indices.

Right now this just creates irrelevant scheduling changes.

We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.

There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.

This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.

llvm-svn: 264200
2016-03-23 21:49:25 +00:00
Cong Hou 94710840fb Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:

jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 264199
2016-03-23 21:45:37 +00:00
Rafael Espindola f2e71244c6 Fix logic for which symbols to keep with comdats.
If a comdat is dropped, all symbols in it are dropped.
If a comdat is kept, the symbols survive to pass regular symbol
resolution.
With this patch we do that for all global symbols.

The added test is a copy of test/tools/gold/X86/comdat.ll that we now
pass.

llvm-svn: 264192
2016-03-23 21:16:33 +00:00
Kevin Enderby 5afbc1cda7 Fix a crash in running llvm-objdump -t with an invalid Mach-O file already
in the test suite. While this is not really an interesting tool and option to run
on a Mach-O file to show the symbol table in a generic libObject format
it shouldn’t crash.

The reason for the crash was in MachOObjectFile::getSymbolType() when it was
calling MachOObjectFile::getSymbolSection() without checking its return value
for the error case.

What makes this fix require a fair bit of diffs is that the method getSymbolType() is
in the class ObjectFile defined without an ErrorOr<> so I needed to add that all
the sub classes.  And all of the uses needed to be updated and the return value
needed to be checked for the error case.

The MachOObjectFile version of getSymbolType() “can” get an error in trying to
come up with the libObject’s internal SymbolRef::Type when the Mach-O symbol
symbol type is an N_SECT type because the code is trying to select from the
SymbolRef::ST_Data or SymbolRef::ST_Function values for the SymbolRef::Type.
And it needs the Mach-O section to use isData() and isBSS to determine if
it will return SymbolRef::ST_Data.

One other possible fix I considered is to simply return SymbolRef::ST_Other
when MachOObjectFile::getSymbolSection() returned an error.  But since in
the past when I did such changes that “ate an error in the libObject code” I
was asked instead to push the error out of the libObject code I chose not
to implement the fix this way.

As currently written both the COFF and ELF versions of getSymbolType()
can’t get an error.  But if isReservedSectionNumber() wanted to check for
the two known negative values rather than allowing all negative values or
the code wanted to add the same check as in getSymbolAddress() to use
getSection() and check for the error then these versions of getSymbolType()
could return errors.

At the end of the day the error printed now is the generic “Invalid data was
encountered while parsing the file” for object_error::parse_failed.  In the
future when we thread Lang’s new TypedError for recoverable error handling
though libObject this will improve.  And where the added // Diagnostic(…
comment is, it would be changed to produce and error message
like “bad section index (42) for symbol at index 8” for this case.

llvm-svn: 264187
2016-03-23 20:27:00 +00:00
Kyle Butt 613112826e Codegen: [PPC] Word Rotates are Zero Extending.
Add Word rotates to the list of instructions that are zero extending.
This allows them to be used in dot form to compare with zero.

llvm-svn: 264183
2016-03-23 19:51:22 +00:00
George Burgess IV 0e4898685f Fix bugs in the MemorySSA walker.
There are a few bugs in the walker that this patch addresses.
Primarily:
- Caching can break when we have multiple BBs without phis
- We weren't optimizing some phis properly
- Because of how the DFS iterator works, there were times where we
  wouldn't cache any results of our DFS

I left the test cases with FIXMEs in, because I'm not sure how much
effort it will take to get those to work (read: We'll probably
ultimately have to end up redoing the walker, or we'll have to come up
with some creative caching tricks), and more test coverage = better.

Differential Revision: http://reviews.llvm.org/D18065

llvm-svn: 264180
2016-03-23 18:31:55 +00:00
Simon Pilgrim b5fb65d43e [X86] Regenerated WidenArith test
llvm-svn: 264157
2016-03-23 14:00:28 +00:00
Oliver Stannard aa77b1e025 [AArch64] Replace some uses of report_fatal_error with reportError in AArch64 ELF object writer
If we can't handle a relocation type, report it as an error in the source,
rather than asserting. I've added a more descriptive message and a test for the
only cases of this that I've been able to trigger.

Differential Revision: http://reviews.llvm.org/D18388

llvm-svn: 264156
2016-03-23 13:45:03 +00:00
Andrey Turetskiy 6a3d561ea0 [X86] Introduction of FeatureX87.
Add FeatureX87 in X86 backend to be able to define CPUs which doesn't have x87.

Differential Revision: http://reviews.llvm.org/D13979

llvm-svn: 264148
2016-03-23 11:13:54 +00:00
Hrvoje Varga c45baf212a [mips][microMIPS] Delay slot filler modifications
Differential Revision: http://reviews.llvm.org/D18181

llvm-svn: 264147
2016-03-23 10:29:38 +00:00
Valery Pykhtin c0a77c5064 [AMDGPU] Fix missing assembler predicates.
Differential Revision: http://reviews.llvm.org/D18351

llvm-svn: 264137
2016-03-23 04:27:26 +00:00
Sanjoy Das e58ca59cf4 [StatepointLowering] Schedule gc relocates before uniqueing them
Otherwise we can see an "unexpected" gc.relocate that we uniqued away.

llvm-svn: 264127
2016-03-23 02:24:07 +00:00
Justin Lebar e87e1c6cdd [NVVM] Remove noduplicate attribute from synchronizing intrinsics.
Summary:
I've completed my audit of all the code that looks at noduplicate and
added handling of convergent where appropriate, so we no longer need
noduplicate on these intrinsics.

Reviewers: jholewinski

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18168

llvm-svn: 264107
2016-03-22 22:08:01 +00:00
Rafael Espindola 370d528a05 Drop comdats from the dst module if they are not selected.
A really unfortunate design of llvm-link and related libraries is that
they operate one module at a time.

This means they can copy a GV to the destination module that should not
be there in the final result because a later bitcode file takes
precedence.

We already handled cases like a strong GV replacing a weak for example.

One case that is not currently handled is a comdat replacing another.
This doesn't happen in ELF, but with COFF largest selection kind it is
possible.

In "llvm-link a.ll b.ll" if the selected comdat was from a.ll,
everything will work and we will not copy the comdat from b.ll.

But if we run "llvm-link b.ll a.ll", we fail to delete the already
copied comdat from b.ll. This patch fixes that.

llvm-svn: 264103
2016-03-22 21:35:47 +00:00
George Burgess IV d4febd1612 Keep CodeGenPrepare from preserving the domtree.
CGP modifies the domtree in some cases, so saying that it preserves the
domtree is a lie. We'll be able to selectively preserve it with the new
pass manager.

Differential Revision: http://reviews.llvm.org/D16893

llvm-svn: 264099
2016-03-22 21:25:08 +00:00
Matthias Braun 68bb2931cc Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This commit broke LTO builds. Reverting it to unbreak the bots while the
issue is investigated. See also:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html

This reverts r263158

llvm-svn: 264088
2016-03-22 20:24:34 +00:00
Simon Pilgrim cc41495eb8 [X86][AVX] Added AVX1 tests for 256-bit vector idiv-by-constant
Prep work based on feedback for D18307 

llvm-svn: 264086
2016-03-22 20:10:49 +00:00
Simon Pilgrim c6f5fe3d69 [SelectionDAG] Ensure constant folded legalized vector element types are compatible with the BUILD_VECTOR type
Found during fuzz testing - 32-bit x86 targets were legalizing a <2 x i1> compare result to <2 x i32> when <2 x i64> was expected.

llvm-svn: 264085
2016-03-22 19:59:53 +00:00
Tim Northover b49a8a9dbb CodeGen: check return types match when emitting tail call to builtin.
We were just completely ignoring the types when determining whether we could
safely emit a libcall as a tail call. This is clearly wrong.

Theoretically, we could dig deeper looking for incidental matches (much like
the generic code in Analysis.cpp does), but it's probably not worth it for the
few libcalls that exist.

llvm-svn: 264084
2016-03-22 19:14:38 +00:00
Sanjoy Das bfecef5e1b Remove unnecessary branch from test
(Addresses post commit review by Reid Kleckner)

llvm-svn: 264083
2016-03-22 18:45:41 +00:00
Adam Nemet 8b47e0d0ea [LoopVersioning] Relax an assert for LCSSA PHIs
When you have multiple LCSSA (single-operand) PHIs that are converted
into two-operand PHIs due to versioning, only assert that the PHI
currently being converted has a single operand.  I.e. we don't want to
check PHIs that were converted earlier in the loop.

Fixes PR27023.

Thanks to Karl-Johan Karlsson for the minimized testcase!

llvm-svn: 264081
2016-03-22 18:38:15 +00:00
Sanjoy Das eb5037cadc Allow lowering call sites with both funclets and deopt state
Lowering funclets is a no-op, so we can just go ahead and lower the
deopt state.

llvm-svn: 264078
2016-03-22 18:10:39 +00:00
Dan Gohman 665d7e3838 [WebAssembly] Implement the rotate instructions.
llvm-svn: 264076
2016-03-22 18:01:49 +00:00
Simon Pilgrim 25fb4177fb [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Reapplied with a fix for PR26953 (missing vector widening legalization).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 264062
2016-03-22 16:22:08 +00:00
Daniel Sanders 97297770a6 [mips] Range check simm7.
Summary:
Also renamed li_simm7 to li16_imm since it's not a simm7 and has an unusual
encoding (it's a uimm7 except that 0x7f represents -1).

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18145

llvm-svn: 264056
2016-03-22 14:40:00 +00:00
Daniel Sanders 0f17d0da4a [mips] Range check simm5.
Summary:
We can't check the error message for this one because there's another lw/sw
available that covers a larger range. We therefore check the transition
between the two sizes.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18144

llvm-svn: 264054
2016-03-22 14:29:53 +00:00
Daniel Sanders 946dee3b5b [mips] Range check vsplat_uimm[1234568].
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18143

llvm-svn: 264053
2016-03-22 14:17:41 +00:00
Daniel Sanders 93fa4ce9b7 [mips] Range check uimm4_ptr, remove uimm6_ptr, and use correctly sized immediates in MSA copy/insert.
Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18142

llvm-svn: 264052
2016-03-22 13:58:53 +00:00
Zinovy Nis 07ac2bd4d0 [PATCH] Force LoopReroll to reset the loop trip count value after reroll.
It's a bug fix. 
For rerolled loops SE trip count remains unchanged. It leads to incorrect work of the next passes.
My patch just resets SE info for rerolled loop forcing SE to re-evaluate it next time it requested.
I also added a verifier call in the exisitng test to be sure no invalid SE data remain. Without my fix this test would fail with -verify-scev.

Differential Revision: http://reviews.llvm.org/D18316

llvm-svn: 264051
2016-03-22 13:50:57 +00:00
Marina Yatsina 33ef7dad18 [ELF][gcc compatibility]: support section names with special characters (e.g. "/")
Adding support for section names with special characters in them (e.g. "/").
GCC successfully compiles such section names.
This also fixes PR24520.

Differential Revision: http://reviews.llvm.org/D15678

llvm-svn: 264038
2016-03-22 11:23:15 +00:00
Sanjoy Das dd1d72ce92 Appease the windows buildbots
The guess is that the stdout/stderr ordering may differ between windows
/ unix.

llvm-svn: 264019
2016-03-22 02:11:57 +00:00
Sanjoy Das 38bfc22161 Add "first class" lowering for deopt operand bundles
Summary:
After this change, deopt operand bundles can be lowered directly by
SelectionDAG into STATEPOINT instructions (which are then lowered to a
call or sequence of nop, with an associated __llvm_stackmaps entry0.
This obviates the need to round-trip deoptimization state through
gc.statepoint via RewriteStatepointsForGC.

Reviewers: reames, atrick, majnemer, JosephTremoulet, pgavlin

Subscribers: sanjoy, mcrosier, majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D18257

llvm-svn: 264015
2016-03-22 00:59:13 +00:00
George Burgess IV 3887a41725 [MemorySSA] Consider def-only BBs for live-in calculations.
If we have a BB with only MemoryDefs, live-in calculations will ignore
it. This means we get results like this:

define void @foo(i8* %p) {
  ; 1 = MemoryDef(liveOnEntry)
  store i8 0, i8* %p
  br i1 undef, label %if.then, label %if.end

if.then:
  ; 2 = MemoryDef(1)
  store i8 1, i8* %p
  br label %if.end

if.end:
  ; 3 = MemoryDef(1)
  store i8 2, i8* %p
  ret void
}

...When there should be a MemoryPhi in the `if.end` BB.

This patch fixes that behavior.

llvm-svn: 263991
2016-03-21 21:25:39 +00:00
Krzysztof Parzyszek 67e6ae5e2a Remove leftover options from multiline.ll
I added -march=hexagon to force using Hexagon target when testing
locally, and I forgot to take it out.

llvm-svn: 263990
2016-03-21 21:25:01 +00:00
Rafael Espindola 7ff714c339 Add a testcase that would have found the bug in r263971.
llvm-svn: 263988
2016-03-21 21:09:38 +00:00
Rafael Espindola 9219fe79b9 Revert "[llvm-objdump] Printing relocations in executable and shared object files. This partially reverts r215844 by removing test objdump-reloc-shared.test which stated GNU objdump doesn't print relocations, it does."
This reverts commit r263971.
It produces the wrong results for .rela.dyn. I will add a test.

llvm-svn: 263987
2016-03-21 20:59:15 +00:00
Krzysztof Parzyszek 738c6277a6 Unxfail test/DebugInfo/Generic/multiline.ll on Hexagon
llvm-svn: 263986
2016-03-21 20:55:59 +00:00
Nicolai Haehnle 213e87f2ee AMDGPU: Add SIWholeQuadMode pass
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

llvm-svn: 263982
2016-03-21 20:28:33 +00:00
Krzysztof Parzyszek b14f4fd0de [Hexagon] Add handling fixups and instruction relaxation
llvm-svn: 263981
2016-03-21 20:27:17 +00:00
Krzysztof Parzyszek c6f1e1a709 [Hexagon] Properly encode registers in duplex instructions
llvm-svn: 263980
2016-03-21 20:13:33 +00:00
Dan Gohman c8d7f14506 [WebAssembly] Implement the eqz instructions.
llvm-svn: 263976
2016-03-21 19:54:41 +00:00
Colin LeMahieu cdaf644c48 [llvm-objdump] Printing relocations in executable and shared object files. This partially reverts r215844 by removing test objdump-reloc-shared.test which stated GNU objdump doesn't print relocations, it does.
In executable and shared object ELF files, relocations in the file contain the final virtual address rather than section offset so this is adjusted to display section offset.

Differential revision: http://reviews.llvm.org/D15965

llvm-svn: 263971
2016-03-21 19:14:50 +00:00
Tom Stellard 92339e888f AMDGPU/SI: Fix threshold calculation for branching when exec is zero
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.

The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18282

llvm-svn: 263969
2016-03-21 18:56:58 +00:00
Matt Arsenault cb38a6bd35 AMDGPU: Remove SignBitIsZero for mubuf scratch offsets
These instructions do not have the same negative base
address problem that DS instructions do on SI.

llvm-svn: 263964
2016-03-21 18:02:18 +00:00
Peter Collingbourne 86b9fbe980 ARM: Better codegen for 64-bit compares.
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.

Before:

	push	{r7, lr}
	cmp	r0, r2
	mov.w	r0, #0
	mov.w	r12, #0
	it	hs
	movhs	r0, #1
	cmp	r1, r3
	it	ge
	movge.w	r12, #1
	it	eq
	moveq	r12, r0
	cmp.w	r12, #0
	bne	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

After:

	push	{r7, lr}
	subs	r0, r0, r2
	sbcs.w	r0, r1, r3
	bge	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

Saves around 80KB in Chromium's libchrome.so.

Some notes on this patch:

- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
  introduced (nothing else needs them). However, they are necessary in
  order to avoid poor codegen, and they seem similar to existing combines
  in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
  (brcond Compare)).

- No support for Thumb-1. This is in principle possible, but we'd need
  to implement ARMISD::SUBE for Thumb-1.

Differential Revision: http://reviews.llvm.org/D15256

llvm-svn: 263962
2016-03-21 18:00:02 +00:00
Renato Golin 2b6b7ffd6c [ARM] Add Cortex-A32 support
Adding Cortex-A32 as an available target in the ARM backend.

Patch by Sam Parker.

llvm-svn: 263956
2016-03-21 17:29:01 +00:00
Hemant Kulkarni a11fbe1cb1 [llvm-readobj] Impl GNU style symbols printing
Implements "readelf -sW and readelf -DsW"

Differential Revision: http://reviews.llvm.org/D18224

llvm-svn: 263952
2016-03-21 17:18:23 +00:00
Matt Arsenault b96b57347a AMDGPU: Add frexp_mant intrinsic
llvm-svn: 263948
2016-03-21 16:11:05 +00:00
Matt Arsenault 155dda9134 Implement constant folding for bitreverse
llvm-svn: 263945
2016-03-21 15:00:35 +00:00
Silviu Baranga f875e4fd92 [IndVars] Fix PR26974: make sure replaceCongruentIVs doesn't break LCSSA
Summary:
replaceCongruentIVs can break LCSSA when trying to replace IV increments
since it tries to replace all uses of a phi node with another phi node
while both of the phi nodes are not necessarily in the processed loop.
This will cause an assert in IndVars.

To fix this, we add a check to make sure that the replacement maintains
LCSSA.

Reviewers: sanjoy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18266

llvm-svn: 263941
2016-03-21 12:44:29 +00:00
Silviu Baranga 46030585b3 [DAGCombine] Catch the case where extract_vector_elt can cause an any_ext while processing AND SDNodes
Summary:
extract_vector_elt can cause an implicit any_ext if the types don't
match. When processing the following pattern:

  (and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c)

DAGCombine was ignoring the possible extend, and sometimes removing
the AND even though it was required to maintain some of the bits
in the result to 0, resulting in a miscompile.

This change fixes the issue by limiting the transformation only to
cases where the extract_vector_elt doesn't perform the implicit
extend.

Reviewers: t.p.northover, jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18247

llvm-svn: 263935
2016-03-21 11:43:46 +00:00
Elena Demikhovsky 39a0020f2d Fixed -mcpu flag
"core-avx" does not exist; I changed to "nehalem"

llvm-svn: 263932
2016-03-21 11:06:20 +00:00
Simon Pilgrim 4af44f3c13 [X86][SSE] Add vector integer division by constant tests
Expanded tests and split into sdiv/srem and udiv/urem cases for 128 and 256 bit vectors.

llvm-svn: 263917
2016-03-20 21:46:58 +00:00
Jingyue Wu 1375560bdb [NVPTX] Adds a new address space inference pass.
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).

The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.

Reviewers: jholewinski, tra, jlebar

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D17965

llvm-svn: 263916
2016-03-20 20:59:20 +00:00
Simon Pilgrim c44472a5bc [X86][SSE] Detect zeroable shuffle elements from different value types
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask.

Differential Revision: http://reviews.llvm.org/D14261

llvm-svn: 263906
2016-03-20 15:45:42 +00:00
Igor Breger 3ea8af5108 AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR
Differential Revision: http://reviews.llvm.org/D18211

llvm-svn: 263898
2016-03-20 13:09:43 +00:00
Mehdi Amini 43165d913a Expose IRBuilder::CreateAtomicCmpXchg as LLVMBuildAtomicCmpXchg in the C API.
Summary: Also expose getters and setters in the C API, so that the change can be tested.

Reviewers: nhaehnle, axw, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18260

From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
llvm-svn: 263886
2016-03-19 21:28:28 +00:00
David Majnemer abae6b588b [SimplifyLibCalls] Only consider sinpi/cospi functions within the same function
The sinpi/cospi can be replaced with sincospi to remove unnecessary
computations.  However, we need to make sure that the calls are within
the same function!

This fixes PR26993.

llvm-svn: 263875
2016-03-19 04:53:02 +00:00
David Majnemer cdf2873e36 [InstCombine] Don't insert instructions before a catch switch
CatchSwitches are not splittable, we cannot insert casts, etc. before
them.

This fixes PR26992.

llvm-svn: 263874
2016-03-19 04:39:52 +00:00
Mehdi Amini 8d05185a26 Rework linkInModule(), making it oblivious to ThinLTO
Summary:
ThinLTO is relying on linkInModule to import selected function.
However a lot of "magic" was hidden in linkInModule and the IRMover,
who would rename and promote global variables on the fly.

This is moving to an approach where the steps are decoupled and the
client is reponsible to specify the list of globals to import.
As a consequence some test are changed because they were relying on
the previous behavior which was importing the definition of *every*
single global without control on the client side.
Now the burden is on the client to decide if a global has to be imported
or not.

Reviewers: tejohnson

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D18122

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263863
2016-03-19 00:40:31 +00:00
Mehdi Amini 155da5b132 Add a test for r263577: "Add missing error handling in llvm-lto"
On Rafael's suggestion!
(also fix a discrepancy between this error message format and the others)

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263860
2016-03-19 00:17:32 +00:00
Manman Ren a3a019cf90 [CXX_FAST_TLS] Fix issues in ARM.
We need to be careful on which registers can be explicitly handled
via copies. Prologue, Epilogue use physical registers and if one belongs
to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites
it after the explicit copies.

llvm-svn: 263857
2016-03-18 23:44:37 +00:00
Manman Ren 4865d89653 [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.

llvm-svn: 263856
2016-03-18 23:41:51 +00:00
Manman Ren 2828c57b6f [CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.
Since at O0, explicit copies via SplitCSR may not be removed even if
they are unnecessary, we choose not to use SplitCSR at O0.

llvm-svn: 263855
2016-03-18 23:38:49 +00:00
Michael Kuperstein 5abc2765fa Have DataLayout::isLegalInteger() accept uint64_t
While not strictly necessary, since we don't support large integer
types, this avoids bugs due to silent truncation from uint64_t to a
32-bit unsigned (e.g. DL.isLegalInteger(DL.getTypeSizeInBits(Ty) )

This fixes PR26972.

Differential Revision: http://reviews.llvm.org/D18258

llvm-svn: 263850
2016-03-18 23:19:29 +00:00
Alexei Starovoitov 7e453bb8be BPF: emit an error message for unsupported signed division operation
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@fb.com>
llvm-svn: 263842
2016-03-18 22:02:47 +00:00
Sanjoy Das 60fb899f28 [IndVars] Pass the right loop to isLoopInvariantPredicate
The loop on IVOperand's incoming values assumes IVOperand to be an
induction variable on the loop over which `S Pred X` is invariant;
otherwise loop invariant incoming values to IVOperand are not guaranteed
to dominate the comparision.

This fixes PR26973.

llvm-svn: 263827
2016-03-18 20:37:07 +00:00
Chad Rosier cdfd7e7201 [AArch64] Enable more load clustering in the MI Scheduler.
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing.  I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.

Differential Revision: http://reviews.llvm.org/D18048

llvm-svn: 263819
2016-03-18 19:21:02 +00:00
Reid Kleckner fbd7787d7e [codeview] Only emit function ids for inlined functions
We aren't referencing any other kind of function currently.
Should save a bit on our debug info size.

llvm-svn: 263817
2016-03-18 18:54:32 +00:00
Colin LeMahieu 0143146514 [MCParser] Accept uppercase radix variants 0X and 0B
Differential Revision: http://reviews.llvm.org/D14781

llvm-svn: 263802
2016-03-18 18:22:07 +00:00
Colin LeMahieu 307a83d76a [llvm-objdump] Print <unknown> in place of instruction text if it couldn't be disassembled.
llvm-svn: 263793
2016-03-18 16:26:48 +00:00
Nicolai Haehnle 95e8ffd398 AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18255

llvm-svn: 263792
2016-03-18 16:24:40 +00:00
Nicolai Haehnle ad63638f6d AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).

The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, rivanvx, llvm-commits

Differential Revision: http://reviews.llvm.org/D18151

llvm-svn: 263791
2016-03-18 16:24:31 +00:00
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Sam Kolton a74cd526e9 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
2016-03-18 15:35:51 +00:00
Simon Atanasyan 05f4c803bb [llvm-objdump] Move test case to the X86 sub-directory because it depends on X86 target supporting. NFC.
llvm-svn: 263781
2016-03-18 09:52:12 +00:00
Adam Nemet 709e3046ee [LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead
Summary:
It can hurt performance to prefetch ahead too much.  Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.

Reviewers: hfinkel

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17949

llvm-svn: 263772
2016-03-18 00:27:43 +00:00
Adam Nemet 6d8beeca53 [LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses
Summary:
And use this TTI for Cyclone.  As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.

I am also adding tests for this and the previous change (D17943):

* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either

Reviewers: hfinkel

Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17945

llvm-svn: 263771
2016-03-18 00:27:38 +00:00
Peter Collingbourne a1f8625662 DebugInfo: Add ability to not emit DW_AT_vtable_elem_location for virtual functions.
A virtual index of -1u indicates that the subprogram's virtual index is
unrepresentable (for example, when using the relative vtable ABI), so do
not emit a DW_AT_vtable_elem_location attribute for it.

Differential Revision: http://reviews.llvm.org/D18236

llvm-svn: 263765
2016-03-17 23:58:03 +00:00
Tim Shen 5cdf75084a [PPC, FastISel] Fix ordered/unordered fcmp
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.

More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.

Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.

llvm-svn: 263753
2016-03-17 22:27:58 +00:00
Adam Nemet b0c4eae073 [LoopVectorize] Annotate versioned loop with noalias metadata
Summary:
Use the new LoopVersioning facility (D16712) to add noalias metadata in
the vector loop if we versioned with memchecks.  This can enable some
optimization opportunities further down the pipeline (see the included
test or the benchmark improvement quoted in D16712).

The test also covers the bug I had in the initial version in D16712.

The vectorizer did not previously use LoopVersioning.  The reason is
that the vectorizer performs its transformations in single shot.  It
creates an empty single-block vector loop that it then populates with
the widened, if-converted instructions.  Thus creating an intermediate
versioned scalar loop seems wasteful.

So this patch (rather than bringing in LoopVersioning fully) adds a
special interface to LoopVersioning to allow the vectorizer to add
no-alias annotation while still performing its own versioning.

As the vectorizer propagates metadata from the instructions in the
original loop to the vector instructions we also check the pointer in
the original instruction and see if LoopVersioning can add no-alias
metadata based on the issued memchecks.

Reviewers: hfinkel, nadav, mzolotukhin

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17191

llvm-svn: 263744
2016-03-17 20:32:37 +00:00
Adam Nemet 5eccf07df3 [LoopVersioning] Annotate versioned loop with noalias metadata
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop.  This allows non-aliasing information to be used by subsequent
passes.

One example is 456.hmmer in SPECint2006 where after loop distribution,
we vectorize one of the newly distributed loops.  To vectorize we
version this loop to fully disambiguate may-aliasing accesses.  If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate some dead stores which amounts to ~25% of the
instructions of this hot memory-pipeline-bound loop.  The overall
performance improves by 18% on our ARM64.

The scoped noalias annotation is added in LoopVersioning.  The patch
then enables this for loop distribution.  A follow-on patch will enable
it for the vectorizer.  Eventually this should be run by default when
versioning the loop but first I'd like to get some feedback whether my
understanding and application of scoped noalias metadata is correct.

Essentially my approach was to have a separate alias domain for each
versioning of the loop.  For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning.  By keeping the scopes
in different domains they can conveniently be defined independently
since different alias domains don't affect each other.

As written, I also have a separate domain for each loop.  This is not
necessary and we could save some metadata here by using the same domain
across the different loops.  I don't think it's a big deal either way.

Probably the best is to review the tests first to see if I mapped this
problem correctly to scoped noalias markers.  I have plenty of comments
in the tests.

Note that the interface is prepared for the vectorizer which needs the
annotateInstWithNoAlias API.  The vectorizer does not use LoopVersioning
so we need a way to pass in the versioned instructions.  This is also
why the maps have to become part of the object state.

Also currently, we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline.  Depending how widely this triggers we may
want to schedule a DSE toward the end of the regular pass pipeline.

Reviewers: hfinkel, nadav, ashutosh.nema

Subscribers: mssimpso, aemerson, llvm-commits, mcrosier

Differential Revision: http://reviews.llvm.org/D16712

llvm-svn: 263743
2016-03-17 20:32:32 +00:00
Tim Northover 498c56c240 ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.
llvm-svn: 263741
2016-03-17 20:10:28 +00:00
Guozhi Wei 7b390ec4cd [InstCombine] Combine A->B->A BitCast
This patch enhances InstCombine to handle following case:

        A  ->  B    bitcast
        PHI
        B  ->  A    bitcast

llvm-svn: 263734
2016-03-17 18:47:20 +00:00
Valery Pykhtin c4546ec0cf [AMDGPU] add VI disassembler tests. NFC.
Autogenerated from the corresponding assembler tests with a few FIXME added (will fix soon).

Differential Revision: http://reviews.llvm.org/D18249

llvm-svn: 263729
2016-03-17 17:56:33 +00:00
Petar Jovanovic 0b44f24033 [PowerPC] Disable CTR loops optimization for soft float operations
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D17600

llvm-svn: 263727
2016-03-17 17:11:33 +00:00
Derek Schuff d4207ba0f6 [WebAssembly] Stackify code emitted by eliminateFrameIndex and SP writeback
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.

Also use stackified registers when writing back the SP value to memory
in the epilog; it's unnecessary because SP will not be used after the
epilog, and it results in better code.

Differential Revision: http://reviews.llvm.org/D18234

llvm-svn: 263725
2016-03-17 17:00:29 +00:00
Changpeng Fang 234fcb81d3 AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute
Symmary:
  ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.

Reviewers
  arsenm, tstellarAMD

Subscribers
  llvm-commits, arsenm

Differential Revision:
  http://reviews.llvm.org/D18197

llvm-svn: 263720
2016-03-17 16:43:50 +00:00
Nicolai Haehnle 79cad857a0 AMDGPU: mark atomic instructions as sources of divergence
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18156

llvm-svn: 263719
2016-03-17 16:21:59 +00:00
Simon Pilgrim 0f37fbac51 [X86][SSE] Simplified blend-with-zero combining
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines

This patch stops the combine if we already have a blend in place (means we miss some domain corrections)

llvm-svn: 263717
2016-03-17 15:59:36 +00:00
Sanjay Patel 9e23fedaf0 propagate 'unpredictable' metadata on select instructions
This is similar to D18133 where we allowed profile weights on select instructions. 
This extends that change to also allow the 'unpredictable' attribute of branches to apply to selects.

A test to check that 'unpredictable' metadata is preserved when cloning instructions was checked in at:
http://reviews.llvm.org/rL263648

Differential Revision: http://reviews.llvm.org/D18220

llvm-svn: 263716
2016-03-17 15:30:52 +00:00
Saleem Abdulrasool 071a099102 ARM: Revert SVN r253865, 254158, fix windows division
The two changes together weakened the test and caused a regression with division
handling in MSVC mode.  They were applied to avoid an assertion being triggered
in the block frequency analysis.  However, the underlying problem was simply
being masked rather than solved properly.  Address the actual underlying problem
and revert the changes.  Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.

The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK).  We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them.  This would result in assertions being triggered
in the block frequency analysis pass.

Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code.  Update this with new tests that actually ensure that we do not
regress on the code generation.

llvm-svn: 263714
2016-03-17 14:10:49 +00:00
Simon Atanasyan 16f2460575 [llvm-objdump] Add REQUIRES x86 directive to fix buildbots
llvm-svn: 263708
2016-03-17 11:09:21 +00:00
Simon Atanasyan 34223a7e5d [llvm-objdump] Add '0x' prefix to a target displacement number to accent its hex format
It might be hard to recognize a hexadecimal number without '0x' prefix.
Besides that '0x' prefix corresponds to GNU objdump behaviour.

Differential Revision: http://reviews.llvm.org/D18207

llvm-svn: 263705
2016-03-17 10:43:44 +00:00
Simon Atanasyan 58ee875296 [mips] Use `formatImm` call to print immediate value in the `MipsInstPrinter`
That allows, for example, to print hex-formatted immediates using
llvm-objdump --print-imm-hex command line option.

Differential Revision: http://reviews.llvm.org/D18195

llvm-svn: 263704
2016-03-17 10:43:36 +00:00
David Majnemer 6f66f0a343 [yaml2obj, COFF] Correctly handle section alignment
The section alignment field was marked optional but not provided a
default value: initialize it with 0.

While we are here, ensure that the section alignment is plausible.

llvm-svn: 263692
2016-03-17 05:43:26 +00:00
Sanjay Patel 3b32ebb97b use FileCheck for tighter checking
llvm-svn: 263679
2016-03-16 23:39:37 +00:00
Sanjay Patel b672e792f2 reduce check strings; no need to check IR comments
llvm-svn: 263675
2016-03-16 23:22:01 +00:00
Sanjay Patel 355c77e796 use FileCheck for tighter checking
llvm-svn: 263674
2016-03-16 23:20:20 +00:00
Chris Bieneman 671d0dda7d Upgrade TBAA *before* upgrading intrinsics
Summary: If TBAA is on an intrinsic and it gets upgraded and drops the TBAA we hit an odd assert. We should just upgrade the TBAA first because it doesn't have side-effects.

Reviewers: reames, apilipenko, manmanren

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18229

llvm-svn: 263673
2016-03-16 23:17:54 +00:00
Sanjay Patel 6cec10572f use FileCheck for tighter checking
I'm testing out a script that auto-generates the check lines.
It's 98% copied from utils/update_llc_test_checks.py.
If others think this is useful, please let me know.

llvm-svn: 263668
2016-03-16 22:34:57 +00:00
Sanjay Patel cb775fcf22 use FileCheck for tighter checking
I'm testing out a script that auto-generates the check lines.
It's 98% copied from utils/update_llc_test_checks.py.
If others think this is useful, please let me know.

llvm-svn: 263667
2016-03-16 22:29:07 +00:00
Nicolai Haehnle ef160de3e5 AMDGPU: Prevent uniform loops from becoming infinite
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18137

llvm-svn: 263658
2016-03-16 20:14:33 +00:00
Geoff Berry 56fabf9b55 Revert "[LSR] Create fewer redundant instructions."
This reverts commit r263644.  Investigating bootstrap failures.

llvm-svn: 263655
2016-03-16 19:21:47 +00:00
Simon Pilgrim 53f27fc00f [X86] Reduced alignment of widened vector load/stores to better match PR26953 cases
llvm-svn: 263649
2016-03-16 18:32:44 +00:00
Sanjay Patel fd24fb1b2b add checks for 'unpredictable' metadata preservation
llvm-svn: 263648
2016-03-16 18:15:34 +00:00
Geoff Berry 459b750871 [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.

Reviewers: atrick

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18001

llvm-svn: 263644
2016-03-16 17:29:49 +00:00
Simon Pilgrim 60228bdb80 [X86] Regenerated + extended widened vector conversion tests
- Ensure we test X86 + X64
- sitopfp / uitofp requires testing for SSE2 and SSE42 as well (part of the fix for PR26953)

llvm-svn: 263640
2016-03-16 15:33:43 +00:00
Igor Breger 0ba7b04f5f AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512bit vector because PCMPGT supported only for 128/256bit.
Differential Revision: http://reviews.llvm.org/D18204

llvm-svn: 263624
2016-03-16 08:48:26 +00:00
Vedant Kumar ddfec8cb55 [Bitcode] Add compatibility test for the 3.8 release
Fork off compatibility.ll for the 3.8 release. The *.bc file in this
commit was produced using a Release build of the release_38 branch.

llvm-svn: 263620
2016-03-16 05:43:03 +00:00
Haicheng Wu 7873857a88 [JumpThreading] See through Cast Instructions
To capture more jump-thread opportunity.

llvm-svn: 263618
2016-03-16 04:52:52 +00:00
Simon Pilgrim 39d2411c3b [X86] Regenerated widen load tests
llvm-svn: 263608
2016-03-16 00:41:21 +00:00
Simon Pilgrim f7cb16f7de [X86][SSE41] Additional tests for extracting zeroable shuffle elements
We can currently only match zeroable vector elements of the same size as the shuffle type - these tests demonstrate the problem and a solution will be shortly added in an updated D14261

llvm-svn: 263606
2016-03-16 00:13:36 +00:00
Haicheng Wu 64d9d7c3f7 Revert "[JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors()"
Not sure it handles undef properly.

llvm-svn: 263605
2016-03-15 23:38:47 +00:00
Evgeniy Stepanov d6e91369d8 [msan] Don't put module constructors in comdats.
There is something strange going on with debug info (.eh_frame_hdr)
disappearing when msan.module_ctor are placed in comdat sections.

Moving this functionality under flag, disabled by default.

llvm-svn: 263579
2016-03-15 20:25:47 +00:00
Quentin Colombet fdc838e97f [MIR] Add a test case for the diagnostic of a wrongly typed generic instruction
llvm-svn: 263573
2016-03-15 18:31:29 +00:00
Quentin Colombet a3f2abad55 [AArch64] Move GlobalISel test cases into a GlobalISel subdirectory
llvm-svn: 263572
2016-03-15 18:30:00 +00:00
Adam Nemet fdb20595a1 [LV] Preserve LoopInfo when store predication is used
This was a latent bug that got exposed by the change to add LoopSimplify
as a dependence to LoopLoadElimination.  Since LoopInfo was corrupted
after LV, LoopSimplify mis-compiled nbench in the test-suite (more
details in the PR).

The problem was that when we create the blocks for predicated stores we
didn't add those to any loops.

The original testcase for store predication provides coverage for this
assuming we verify LI on the way out of LV.

Fixes PR26952.

llvm-svn: 263565
2016-03-15 18:06:20 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
Hemant Kulkarni c030f23b8b [llvm-readobj] Impl GNU style printing of sections and relocations
Differential Revision: http://reviews.llvm.org/D17523

llvm-svn: 263561
2016-03-15 17:25:31 +00:00
Benjamin Kramer 96f4b12880 [GlobalOpt] Don't look through aliases when sorting names of globals.
If both are different aliases to the same value the sorting becomes
non-deterministic as array_pod_sort is not stable.

llvm-svn: 263550
2016-03-15 14:18:26 +00:00
Nikolay Haustov cb9dddb1d7 [AMDGPU] Assembler: Update SOP* tests
Add VI encodings.
Reformat sopp.s to match style of other files.

Differential Revision: http://reviews.llvm.org/D18084

llvm-svn: 263540
2016-03-15 07:44:57 +00:00
David Majnemer 0ab61bfb37 [llvm-objdump] Add support for dumping the PE TLS directory
The PE TLS directory contains information about where the TLS data
resides in the image, what functions should be executed when threads are
created, etc.

llvm-svn: 263537
2016-03-15 06:14:01 +00:00
Lang Hames abda4d2526 [MachO] Extend the alt_entry support for aliases added in r263521 to
expressions of the form 'a = .' and 'a = Ltmp'.

llvm-svn: 263528
2016-03-15 04:20:49 +00:00
Lang Hames 1b640e05ba [MachO] Add MachO alt-entry directive support.
This patch adds support for the MachO .alt_entry assembly directive, and uses
it for global aliases with non-zero GEP offsets. The alt_entry flag indicates
that a symbol should be layed out immediately after the preceding symbol.
Conceptually it introduces an alternate entry point for a function or data
structure. E.g.:

safe_foo:
  // check preconditions for foo
.alt_entry fast_foo
fast_foo:
  // body of foo, can assume preconditions.

The .alt_entry flag is also implicitly set on assembly aliases of the form:

a = b + C

where C is a non-zero constant, since these have the same effect as an
alt_entry symbol: they introduce a label that cannot be moved relative to the
preceding one. Setting the alt_entry flag on aliases of this form fixes
http://llvm.org/PR25381.

llvm-svn: 263521
2016-03-15 01:43:05 +00:00
Teresa Johnson 26ab5772b0 [ThinLTO] Renaming of function index to module summary index (NFC)
(Resubmitting after fixing missing file issue)

With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.

A companion clang patch will immediately succeed this patch to reflect
this renaming.

llvm-svn: 263513
2016-03-15 00:04:37 +00:00
Eric Christopher da8b3f1914 Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.

This reverts commit r263303.

llvm-svn: 263512
2016-03-14 23:59:57 +00:00
Justin Lebar 6827de19b2 [LoopUnroll] Respect the convergent attribute.
Summary:
Specifically, when we perform runtime loop unrolling of a loop that
contains a convergent op, we can only unroll k times, where k divides
the loop trip multiple.

Without this change, we'll happily unroll e.g. the following loop

  for (int i = 0; i < N; ++i) {
    if (i == 0) convergent_op();
    foo();
  }

into

  int i = 0;
  if (N % 2 == 1) {
    convergent_op();
    foo();
    ++i;
  }
  for (; i < N - 1; i += 2) {
    if (i == 0) convergent_op();
    foo();
    foo();
  }.

This is unsafe, because we've just added a control-flow dependency to
the convergent op in the prelude.

In general, runtime unrolling loops that contain convergent ops is safe
only if we don't have emit a prelude, which occurs when the unroll count
divides the trip multiple.

Reviewers: resistor

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17526

llvm-svn: 263509
2016-03-14 23:15:34 +00:00
Amaury Sechet bdb261b4c0 Imporove load to store => memcpy
Summary: This now try to reorder instructions in order to help create the optimizable pattern.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer

Differential Revision: http://reviews.llvm.org/D16523

llvm-svn: 263503
2016-03-14 22:52:27 +00:00
Teresa Johnson cec0cae313 Revert "[ThinLTO] Renaming of function index to module summary index (NFC)"
This reverts commit r263490. Missed a file.

llvm-svn: 263493
2016-03-14 21:18:10 +00:00
Teresa Johnson 892920b358 [ThinLTO] Renaming of function index to module summary index (NFC)
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.

A companion clang patch will immediately succeed this patch to reflect
this renaming.

llvm-svn: 263490
2016-03-14 21:05:56 +00:00
Sanjay Patel ee52b6e77d allow branch weight metadata on select instructions (PR26636)
As noted in:
https://llvm.org/bugs/show_bug.cgi?id=26636

This doesn't accomplish anything on its own. It's the first step towards preserving 
and using branch weights with selects.

The next step would be to make sure we're propagating the info in all of the other
places where we create selects (SimplifyCFG, InstCombine, etc). I don't think there's
an easy fix to make this happen; we have to look at each transform individually to 
determine how to correctly propagate the weights.

Along with that step, we need to then use the weights when making subsequent transform
decisions such as discussed in http://reviews.llvm.org/D16836.

The inliner test is independent but closely related. It verifies that metadata is
preserved when both branches and selects are cloned.

Differential Revision: http://reviews.llvm.org/D18133

llvm-svn: 263482
2016-03-14 20:18:59 +00:00
Justin Lebar 9d94397859 [attrs] Handle convergent CallSites.
Summary:
Previously we had a notion of convergent functions but not of convergent
calls.  This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.

Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent.  As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.

Originally landed as r261544, then reverted in r261544 for (incidental)
build breakage.  Re-landed here with no changes.

Reviewers: chandlerc, jingyue

Subscribers: llvm-commits, tra, jhen, hfinkel

Differential Revision: http://reviews.llvm.org/D17739

llvm-svn: 263481
2016-03-14 20:18:54 +00:00
Michael Kuperstein b7860fedd4 [AliasSetTracker] Do not strip pointer casts when processing MemSetInst
This fixes PR26843.

llvm-svn: 263462
2016-03-14 18:34:29 +00:00
Chad Rosier 27c352d26d [AArch64] Refactor AArch64FrameLowering::emitPrologue. NFC.
http://reviews.llvm.org/D18125
Patch by Aditya Kumar.

llvm-svn: 263461
2016-03-14 18:24:34 +00:00
Chad Rosier 6d98655070 [AArch64] Break the dependency between FP and SP when possible.
When the SP in not changed because of realignment/VLAs etc., we restore the SP
by using the previous value of SP and not the FP. Breaking the dependency will
help in cases when the epilog of a callee is close to the epilog of the caller;
for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of
the callee.

http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.

llvm-svn: 263458
2016-03-14 18:17:41 +00:00
Tom Stellard 331f981cc9 AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

llvm-svn: 263447
2016-03-14 17:05:56 +00:00
Sanjay Patel 62d707c8d9 [x86, AVX] replace masked load with full vector load when possible
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.

1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.

Differential Revision: http://reviews.llvm.org/D18094

llvm-svn: 263446
2016-03-14 16:54:43 +00:00
Daniel Sanders e8efff373a [mips] MIPS32R6 compact branch support
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.

It works by generating compact branches for MIPS32R6 when the delay slot
filler cannot fill a delay slot. Then, inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting nops to clear this hazard.

Patch by Simon Dardis.

Reviewers: vkalintiris, dsanders

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16353

llvm-svn: 263444
2016-03-14 16:24:05 +00:00
Marek Olsak ed2213e6ef AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

llvm-svn: 263441
2016-03-14 15:57:14 +00:00
Nicolai Haehnle 74127fe8d7 AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18101

llvm-svn: 263440
2016-03-14 15:37:18 +00:00
Benjamin Kramer 1082fa66a5 Revert "Recommitted r261633 "Supporting all entities declared in lexical scope in LLVM debug info." After fixing PR26715 at r263379."
This reverts commit r263424. Breaks self-host.

llvm-svn: 263437
2016-03-14 14:58:36 +00:00
Ulrich Weigand cdce026b4d [SystemZ] Avoid LER on z13 due to partial register dependencies
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).

Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE.  However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.

This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.

llvm-svn: 263431
2016-03-14 13:50:03 +00:00
Zlatko Buljan fba68931ed [mips] Fix an issue with long double when function roundl is defined
Differential Revision: http://reviews.llvm.org/D17760

llvm-svn: 263428
2016-03-14 12:50:23 +00:00
Daniel Sanders 127d2d2b46 [mips] Range check uimm16_64
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D17725

llvm-svn: 263427
2016-03-14 12:44:44 +00:00
Amjad Aboud ab0378b16c Recommitted r261633 "Supporting all entities declared in lexical scope in LLVM debug info."
After fixing PR26715 at r263379.

llvm-svn: 263424
2016-03-14 12:03:20 +00:00
Nikolay Haustov 79af6b33e0 [AMDGPU] Assembler: SOP* instruction fixes
s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)

Differential Revision: http://reviews.llvm.org/D18040

llvm-svn: 263420
2016-03-14 11:17:19 +00:00
Daniel Sanders 19b7f76afa [mips] Range check uimm6_lsl2.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17291

llvm-svn: 263419
2016-03-14 11:16:56 +00:00
Igor Breger a949100532 AVX512: icmp operation should be always lowered to CMPM (AVX-512) instruction on SKX.
implemented by delena

Differential Revision: http://reviews.llvm.org/D18054

llvm-svn: 263417
2016-03-14 10:26:39 +00:00
Haicheng Wu d60ae33d29 [CVP] Convert an SDiv to a UDiv if both operands are known to be nonnegative
The motivating example is this

for (j = n; j > 1; j = i) {
   i = j / 2;
}

The signed division is safely to be changed to an unsigned division (j is known
to be larger than 1 from the loop guard) and later turned into a single shift
without considering the sign bit.

llvm-svn: 263406
2016-03-14 03:24:28 +00:00
Simon Pilgrim 9b7aaafc6a [X86][XOP] Added target shuffle combine tests for XOP's VPPERM 2-op shuffle
Actual combing support will be added in a future patch

llvm-svn: 263402
2016-03-14 00:18:26 +00:00
Simon Pilgrim 5f1326f8cc [X86][SSE] Added truncated vector arithmetic tests.
For cases where we are truncating an integer vector arithmetic result, it may be better to pre-truncate the input operands - no code to support this yet (scalar is done with SimplifyDemandedBits but adding vector support could be a lot of work) but these tests represent the current codegen status.

Example bugs: PR14666, PR22703

llvm-svn: 263384
2016-03-13 19:08:01 +00:00
Simon Pilgrim 035b19ecf5 [X86][SSE41] Avoid variable blend for constant v8i16 shifts
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.

llvm-svn: 263383
2016-03-13 18:35:59 +00:00
Amaury Sechet b325686764 Add echo test for constant data arrays in the LLVM C API
llvm-svn: 263350
2016-03-13 00:58:25 +00:00
Sanjay Patel 610da4fbaf update test to use FileCheck
llvm-svn: 263347
2016-03-12 21:09:26 +00:00
Sanjay Patel c4acbae63f [x86, InstCombine] delete x86 SSE2 masked store with zero mask
This follows up on the related AVX instruction transforms, but this
one is too strange to do anything more with. Intel's behavioral
description of this instruction in its Software Developer's Manual
is tragi-comic.

llvm-svn: 263340
2016-03-12 15:16:59 +00:00
Nemanja Ivanovic bd56e4e25a Fix for PR 26378
This patch corresponds to review:
http://reviews.llvm.org/D17712

We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).

llvm-svn: 263338
2016-03-12 10:23:07 +00:00
Chandler Carruth 4a543cc7a1 [lit] Hack lit to allow a test suite to request that it is run "early".
This lets us for example start running the unit test suite early. For
'check-llvm' on my machine, this drops the tim e from 44s to 32s!!!!!

It's pretty ugly. I barely know how to write Python, so feel free to
just tell me how I should write it instead. =D Thanks to Filipe and
others for help.

Differential Revision: http://reviews.llvm.org/D18089

llvm-svn: 263329
2016-03-12 03:03:31 +00:00
Quentin Colombet cf9732b417 [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.
cmpxchg[8|16]b uses RBX as one of its argument.
In other words, using this instruction clobbers RBX as it is defined to hold one
the input. When the backend uses dynamically allocated stack, RBX is used as a
reserved register for the base pointer. 

Reserved registers have special semantic that only the target understands and
enforces, because of that, the register allocator don’t use them, but also,
don’t try to make sure they are used properly (remember it does not know how
they are supposed to be used).

Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.

To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that save rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
  comparison.
- The other is the virtual register holding the save of the value of RBX as the
  base pointer. This saving is done as part of isel (i.e., we emit a copy from
  rbx).

cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp

This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp

Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferes that appears after the pseudo gets expanded are properly modeled
before that expansion.

This fixes PR26883.

llvm-svn: 263325
2016-03-12 02:25:27 +00:00
Michael Zolotukhin 789ac4531c [LoopUnroll] Convert some existing tests to unit-tests.
Summary: As we now have unit-tests for UnrollAnalyzer, we can convert some existing tests to this format. It should make the tests more robust.

Reviewers: chandlerc, sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17904

llvm-svn: 263318
2016-03-12 01:28:56 +00:00
Mike Aizatsky 037fee594b [sancov] using md5 for anchors in attempt to reduce file size.
Differential Revision: http://reviews.llvm.org/D18102

llvm-svn: 263308
2016-03-11 23:28:28 +00:00
Simon Pilgrim 33d57c7547 [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 263303
2016-03-11 22:18:05 +00:00
Ahmed Bougacha 171f7b9986 [AArch64] Don't blindly lower f16/f128 FCCMPs.
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.

Fixes PR26803.

llvm-svn: 263301
2016-03-11 22:02:58 +00:00
Sanjoy Das b51325dbdb Introduce @llvm.experimental.deoptimize
Summary:
This intrinsic, together with deoptimization operand bundles, allow
frontends to express transfer of control and frame-local state from
one (typically more specialized, hence faster) version of a function
into another (typically more generic, hence slower) version.

In languages with a fully integrated managed runtime this intrinsic can
be used to implement "uncommon trap" like functionality.  In unmanaged
languages like C and C++, this intrinsic can be used to represent the
slow paths of specialized functions.

Note: this change does not address how `@llvm.experimental_deoptimize`
is lowered.  That will be done in a later change.

Reviewers: chandlerc, rnk, atrick, reames

Subscribers: llvm-commits, kmod, mjacob, maksfb, mcrosier, JosephTremoulet

Differential Revision: http://reviews.llvm.org/D17732

llvm-svn: 263281
2016-03-11 19:08:34 +00:00
Vedant Kumar e5a9a275d3 [PGO] Skip value profile instrumentation of inline asm
Value profile instrumentation treats inline asm calls like they are
indirect calls. This causes problems when the 'Callee' is passed to a
ptrtoint cast -- the verifier rightly claims that this is bogus and
crashes opt.

llvm-svn: 263278
2016-03-11 18:57:48 +00:00
Teresa Johnson 76a1c1d0ba [ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.

The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.

The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.

The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.

Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.

Reviewers: joker.eph, davidxl

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D17212

llvm-svn: 263275
2016-03-11 18:52:24 +00:00
Chad Rosier e6a5231628 Update test case to appease bots after 263255.
I'll follow up with Matt to confirm this is the correct fix.

llvm-svn: 263268
2016-03-11 17:33:36 +00:00
Quentin Colombet dd4b137364 [IRTranslator] Translate unconditional branches.
llvm-svn: 263265
2016-03-11 17:28:03 +00:00
Quentin Colombet 68c1061049 [GlobalISel][Target] Add an opcode for unconditional branch.
llvm-svn: 263259
2016-03-11 17:27:38 +00:00
Valery Pykhtin a7f480b4e9 [AMDGPU] Fix VOPC instruction operand namings
Differential Revision: http://reviews.llvm.org/D17966

llvm-svn: 263242
2016-03-11 14:53:28 +00:00
Simon Pilgrim 7ca9614c71 [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction.
Its not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases.

llvm-svn: 263239
2016-03-11 14:39:10 +00:00
Vasileios Kalintiris e2cbc21b6f [mips] MIPSR6 Instruction itineraries
Summary: Defines instruction itineraries for common MIPSR6 instructions.

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17198

llvm-svn: 263229
2016-03-11 13:05:06 +00:00
Daniel Sanders 78e8902097 [mips] Range check simm4.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16811

llvm-svn: 263220
2016-03-11 11:37:50 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Chandler Carruth 45a9c203a0 [PM/AA] Teach the AAManager how to handle module analyses in addition to
function analyses, and use it to wire up globals-aa to the new pass
manager.

llvm-svn: 263211
2016-03-11 09:15:11 +00:00
Chandler Carruth 89c45a162f [PM] Port GVN to the new pass manager, wire it up, and teach a couple of
tests to run GVN in both modes.

This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.

Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.

I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.

Differential Revision: http://reviews.llvm.org/D18019

llvm-svn: 263208
2016-03-11 08:50:55 +00:00
Hrvoje Varga 893c68cfc9 [mips] Invalid tests for MTC0, MTC2, MFC0, MFC2, DMTC0, DMFC0 MIPS instructions
Differential Revision: http://reviews.llvm.org/D18037

llvm-svn: 263203
2016-03-11 08:00:11 +00:00
Matt Arsenault 9a19c240c0 AMDGPU: Materialize sign bits with bfrev
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.

llvm-svn: 263201
2016-03-11 07:42:49 +00:00
Pete Cooper adebb9379a Remove llvm::getDISubprogram in favor of Function::getSubprogram
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.

Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.

Ideally this should be NFC, but in reality its possible that a function:

has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will

Reviewed by Duncan Exon Smith.

Differential Revision: http://reviews.llvm.org/D18074

llvm-svn: 263184
2016-03-11 02:14:16 +00:00
Evgeniy Stepanov 4dc3c8df45 [gold] Fix common symbols handling.
LLVM Gold plugin decides which instance of a common symbol it wants
based on the symbol size in claim_file_hook. If the file that
contains the chosen instance is later dropped from the link, we end
up with an undefined reference.

This change delays this decision until the set of the included files
is known.

llvm-svn: 263180
2016-03-11 00:51:57 +00:00
Adam Nemet efb234135c [LLE] Add missed LoopSimplify dependence
The code assumed that we always had a preheader without making the pass
dependent on LoopSimplify.

Thanks to Mattias Eriksson V for reporting this.

llvm-svn: 263173
2016-03-10 23:54:39 +00:00
Tim Northover 6092de5075 AArch64: only try to use scaled fcvt ops on legal vector types.
Before we ended up calling getSimpleVectorType on a <3 x float>, which
asserted.

llvm-svn: 263169
2016-03-10 23:02:21 +00:00
Simon Pilgrim 61eb49e437 [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 263159
2016-03-10 20:40:26 +00:00
Artur Pilipenko 3c8fc57e16 Support arbitrary addrspace pointers in masked load/store intrinsics
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 263158
2016-03-10 20:39:22 +00:00
Peter Collingbourne aba16fca5d ARM: Support relative references using the PREL31 symbol variant.
Differential Revision: http://reviews.llvm.org/D17937

llvm-svn: 263156
2016-03-10 19:30:18 +00:00
Balaram Makam 4058e8fbed Fix testicase to turn buildbot green. NFC.
llvm-svn: 263154
2016-03-10 19:07:50 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Michael Kuperstein 8be8de6d62 [X86] Correctly select registers to pop into for x86_64
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.

This fixes PR26711.

Differential Revision: http://reviews.llvm.org/D18029

llvm-svn: 263139
2016-03-10 18:43:21 +00:00
Balaram Makam e9b2725287 [AArch64] Optimize compare and branch sequence when the compare's constant operand is power of 2
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test and branch sequences like in the example below. This handles
the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC

Examples:
   and  w8, w8, #0x400
   cbnz w8, L1
 to
   tbnz w8, #10, L1

Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17942

llvm-svn: 263136
2016-03-10 17:54:55 +00:00
Sanjay Patel a333dcfc42 give regression test a meaningful name
llvm-svn: 263135
2016-03-10 17:52:19 +00:00
Alexandros Lamprineas 843164242e [ARM] Cortex-R8 support
This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.

Patch by Pablo Barrio.

Differential Revision: http://reviews.llvm.org/D17925

llvm-svn: 263132
2016-03-10 17:38:41 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Saleem Abdulrasool 1632fe1f77 ARM: follow up improvements for SVN r263118
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct.  The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node.  This node set R7 as clobbered.  However, the
code would then follow up with a clobber of R11.  I had failed to notice the
imp-def,kill on R7 in the isel.  Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if.  Instead, construct a new parallel
WIN node and prefer that when targeting windows.  This ensures that we now both
correctly model the __builtin_longjmp as well as construct the frame in a more
ABI conformant manner.

llvm-svn: 263123
2016-03-10 16:26:37 +00:00