Commit Graph

35748 Commits

Author SHA1 Message Date
Marek Olsak 8a0f335ad6 AMDGPU/SI: Add support for non-void functions
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).

This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.

v2: don't do this for compute shaders

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16033

llvm-svn: 257621
2016-01-13 17:23:04 +00:00
Derek Schuff 9c3bf3187a [WebAssemly] Invalidate liveness in CFG stackifier
WebAssemblyCFGStackify does not track liveness for EXPR_STACK, causing
verifier failure if liveness has not already been invalidated.

llvm-svn: 257620
2016-01-13 17:10:28 +00:00
Nicolai Haehnle 02c3291566 AMDGPU/SI: Add SI Machine Scheduler
Summary:
It is off by default, but can be used
with --misched=si

Patch by: Axel Davy

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D11885

llvm-svn: 257609
2016-01-13 16:10:10 +00:00
Michael Zuckerman 6b35f460ac Fixing warning by adding the X86ISD::VROTRI case.
Differential Revision: http://reviews.llvm.org/D16052 

llvm-svn: 257607
2016-01-13 15:48:42 +00:00
Krzysztof Parzyszek a3c5d44437 [Hexagon] Do not insert non-phis before phis in bit simplification
llvm-svn: 257606
2016-01-13 15:48:18 +00:00
Michael Zuckerman 0e31b22487 [AVX512] Adding PMOVSXBD/W/Q , PMOVZSDQ and PMOVZSWD/Q Intrinsics .
Differential Revision: http://reviews.llvm.org/D16111 

llvm-svn: 257604
2016-01-13 14:59:19 +00:00
Michael Zuckerman 43cea85db9 [AVX512] Adding PMOVZXBD/W/Q , PMOVZXDQ and PMOVZXWD/Q Intrinsics
Differential Revision:http://reviews.llvm.org/D16071

llvm-svn: 257601
2016-01-13 14:25:21 +00:00
Ulrich Weigand 46ff7ec317 [PowerPC] Fix large code model with the ELFv2 ABI
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point.  This is always true when using the medium or small
code model, but may not be the case when using the large code model.

This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:

ld r2,-8(r12)
add r2,r2,r12

Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.

These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.

Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D15500

llvm-svn: 257597
2016-01-13 13:12:23 +00:00
Michael Zuckerman 298a680c80 [AVX512] adding PRORQ , PRORD , PRORLVQ and PRORLVD Intrinsics
Differential Revision: http://reviews.llvm.org/D16052

llvm-svn: 257594
2016-01-13 12:39:33 +00:00
Marek Olsak 4e99b6ec01 AMDGPU/SI: Allow more shader inputs
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16032

llvm-svn: 257593
2016-01-13 11:46:48 +00:00
Marek Olsak b6c8c3d165 AMDGPU/SI: Allow any number of PS inputs
Summary:
With the ability to concatenate shader binaries, the limit of 15 no longer
applies.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16031

llvm-svn: 257592
2016-01-13 11:46:10 +00:00
Marek Olsak fccabaf57e AMDGPU/SI: Add new target attribute InitialPSInputAddr
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.

Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.

v2: Make PSInputAddr private, because there is never enough silly getters
    and setters for people to read.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16030

llvm-svn: 257591
2016-01-13 11:45:36 +00:00
Marek Olsak 926c56f50c AMDGPU/SI: Fix a bug in SIFoldOperands
Summary: ret.ll will contain a test for this

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16029

llvm-svn: 257590
2016-01-13 11:44:29 +00:00
Andrey Turetskiy 1ce2c9973f LEA code size optimization pass (Part 2): Remove redundant LEA instructions.
Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA
(in the same basic block) which calculates address differing only be a
displacement. Works only for -Oz.

Differential Revision: http://reviews.llvm.org/D13295

llvm-svn: 257589
2016-01-13 11:30:44 +00:00
James Y Knight 7699494f08 [SPARC] Revamp AnalyzeBranch and add ReverseBranchCondition.
AnalyzeBranch on X86 (and, previously, SPARC, which implementation was
copied from X86) tries to modify the branches based on block
layout (e.g. checking isLayoutSuccessor), when AllowModify is true.

The rest of the architectures leave that up to the caller, which can
call InsertBranch, RemoveBranch, and ReverseBranchCondition as
appropriate. That appears to be the preferred way to do it nowadays.

This commit makes SPARC like the rest: replaces AnalyzeBranch with an
implementation cribbed from AArch64, and adds a ReverseBranchCondition
implementation.

Additionally, a test-case has been added (also cribbed from AArch64)
demonstrating that redundant branch sequences no longer get emitted.

E.g., it used to emit code like this:
         bne .LBB1_2
         nop
         ba .LBB1_1
         nop
 .LBB1_2:

And now emits:
        cmp %i0, 42
        be .LBB1_1
        nop

llvm-svn: 257572
2016-01-13 04:44:14 +00:00
Ana Pazos 359cab3bb3 Guard fabs to bfc convert with V6T2 flag
Summary:
BFC instructions are available in ARMv6T2 and above.


Reviewers: t.p.northover

Subscribers: aemerson

Differential Revision: http://reviews.llvm.org/D16076

llvm-svn: 257546
2016-01-13 00:03:35 +00:00
Quentin Colombet f8e3030794 [ARM] Mark VMOV with immediate: isAsCheapAsMove.
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.

rdar://problem/23754176

llvm-svn: 257545
2016-01-13 00:02:40 +00:00
Derek Schuff 4377e2d713 [WebAssembly] Fix disassembler shared-libs build
llvm-svn: 257536
2016-01-12 23:03:40 +00:00
Dan Gohman 0656f5f845 [WebAsssembly] Register the MC register info.
llvm-svn: 257525
2016-01-12 21:27:55 +00:00
Michael Zuckerman 2ddcbcf464 [AVX512] adding PROLQ and PROLD Intrinsics
Differential Revision: http://reviews.llvm.org/D16048

llvm-svn: 257523
2016-01-12 21:19:17 +00:00
Kyle Butt cec40806f1 Codegen: [PPC] Handle weighted comparisons when inserting selects.
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.

This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.

While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.

llvm-svn: 257518
2016-01-12 21:00:43 +00:00
Dan Gohman 4635017176 [WebAssembly] Add a EM_WEBASSEMBLY value, and several bits of code that use it.
A request has been made to the official registry, but an official value is
not yet available. This patch uses a temporary value in order to support
development. When an official value is recieved, the value of EM_WEBASSEMBLY
will be updated.

llvm-svn: 257517
2016-01-12 20:56:01 +00:00
Dan Gohman 3469ee120c [WebAssembly] Introduce a WebAssemblyTargetStreamer class.
Refactor .param, .result, .local, and .endfunc, as directives, using the
proper MCTargetStreamer mechanism, rather than fake instructions.

llvm-svn: 257511
2016-01-12 20:30:51 +00:00
Krzysztof Parzyszek f62d44be28 Replace inherited constructor with an explicit one
Some bots failed when the inherited constructor was used.

llvm-svn: 257508
2016-01-12 19:27:59 +00:00
Dan Gohman 1d68e80f26 [WebAssembly] Make CFG stackification independent of basic-block labels.
This patch changes the way labels are referenced. Instead of referencing the
basic-block label name (eg. .LBB0_0), instructions now just have an immediate
which indicates the depth in the control-flow stack to find a label to jump to.
This makes them much closer to what we expect to have in the binary encoding,
and avoids the problem of basic-block label names not being explicit in the
binary encoding.

Also, it terminates blocks and loops with end_block and end_loop instructions,
rather than basic-block label names, for similar reasons.

This will also fix problems where two constructs appear to have the same label,
because we no longer explicitly use labels, so consumers that need labels will
presumably create their own labels, and presumably they won't reuse labels
when they do.

This patch does make the code a little more awkward to read; as a partial
mitigation, this patch also introduces comments showing where the labels are,
and comments on each branch showing where it's branching to.

llvm-svn: 257505
2016-01-12 19:14:46 +00:00
Krzysztof Parzyszek 1279881315 [Hexagon] Implement RDF-based post-RA optimizations
- Handle simple cases of register copies (what current RDF CP allows).
- Hexagon-specific dead code elimination: handles dead address updates
  in post-increment instructions.

llvm-svn: 257504
2016-01-12 19:09:01 +00:00
Krzysztof Parzyszek c09d630e50 RDF: Copy propagation
This is a very limited implementation of DFG-based copy propagation.
It only handles actual COPY instructions (does not handle other equivalents
such as add-immediate with a 0 operand).
The major limitation is that it does not update the DFG: that will be the
change required to make it more robust (hopefully coming up soon).

llvm-svn: 257490
2016-01-12 17:23:48 +00:00
Tom Stellard f421837250 AMDGPU: Emit note directive for HSA even if there are no functions
Reviewers: arsenm, echristo

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16010

llvm-svn: 257488
2016-01-12 17:18:17 +00:00
Krzysztof Parzyszek 6f4000e763 RDF: Dead code elimination
Utility class to perform DFG-based dead code elimination.

llvm-svn: 257485
2016-01-12 17:01:16 +00:00
Krzysztof Parzyszek 8dca45efa8 Fix compiler warnings from r257477
llvm-svn: 257483
2016-01-12 16:51:55 +00:00
Krzysztof Parzyszek acdff46a9c RDF: Implement register liveness analysis
Compute block live-ins and operand kill flags from the DFG.

llvm-svn: 257480
2016-01-12 15:56:33 +00:00
Daniel Sanders 5e1d5a789a [mips] Correct operand order in DSP's mthi/mtlo
Summary: The result register is the second operand as per the other mt* instructions.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D15993

llvm-svn: 257478
2016-01-12 15:15:14 +00:00
Krzysztof Parzyszek b5b5a1d7ad Register Data Flow: data flow graph
Target independent, SSA-based data flow framework for representing
data flow between physical registers.

This commit implements the creation of the actual data flow graph.

llvm-svn: 257477
2016-01-12 15:09:49 +00:00
Benjamin Kramer ab8cc02ba5 [Hexagon] Make helper function static. NFC.
llvm-svn: 257476
2016-01-12 14:58:49 +00:00
Keno Fischer 00021429d4 [ARM] Fix several state persistence bugs
Summary:
This fixes three bugs, in all of which state is not or incorrecly reset between
objects (i.e. when reusing the same pass manager to create multiple object
files):
1) AttributeSection needs to be reset to nullptr, because otherwise the backend
   will try to emit into the old object file's attribute section causing a
   segmentation fault.
2) MappingSymbolCounter needs to be reset, otherwise the second object file
   will start where the first one left off.
3) The MCStreamer base class resets the Streamer's e_flags settings. Since
   EF_ARM_EABI_VER5 is set on streamer creation, we need to set it again
   after the MCStreamer was rest.

Also rename Reset (uppser case) to EHReset to avoid confusion with
reset (lower case).

Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D15950

llvm-svn: 257473
2016-01-12 13:38:15 +00:00
Andrey Turetskiy fed110f646 Test commit access - tiny comment and code style fix.
llvm-svn: 257472
2016-01-12 13:34:11 +00:00
Robert Lougher 6abd69a60b The isel pattern that selects the memory-register form of VCVTPH2PS
(64 to 128-bit) matches against the pattern fragment 'vzmovl_v2i64'
(a zero-extended 64-bit load).

However, a change in r248784 teaches the instruction combiner that only
the lower 64 bits of the input to a 128-bit vcvtph2ps are used.  This means
the instruction combiner will ordinarily optimize away the upper 64-bit
insertelement instruction in the zero-extension and so we no longer select
the memory-register form.  To fix this a new pattern has been added.

Differential Revision: http://reviews.llvm.org/D16067

llvm-svn: 257470
2016-01-12 11:48:25 +00:00
Igor Breger ea8e8e9f97 AVX512: VPMOVAPS/PD and VPMOVUPS/PD (load) intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16042

llvm-svn: 257463
2016-01-12 10:02:32 +00:00
Dan Gohman 1a42728719 [WebAssembly] Implement a prototype instruction encoder and disassembler.
This is using an extremely simple temporary made-up binary format, not the
official binary format (which isn't defined yet).

llvm-svn: 257440
2016-01-12 03:32:29 +00:00
Dan Gohman afd7e3ada8 [WebAssembly] Register the MC subtarget info.
llvm-svn: 257439
2016-01-12 03:30:06 +00:00
Dan Gohman a11fb2373c [WebAssembly] Define OperandTypes for decoding immediate values.
llvm-svn: 257438
2016-01-12 03:09:16 +00:00
Dan Gohman 85159ca224 [WebAssembly] Use TSFlags instead of keeping a list of special-case opcodes.
llvm-svn: 257433
2016-01-12 01:45:12 +00:00
Manman Ren ed967f3752 CXX_FAST_TLS calling convention: performance improvement for x86-64.
This is the same change on x86-64 as r255821 on AArch64.
rdar://9001553

llvm-svn: 257428
2016-01-12 01:08:46 +00:00
Manman Ren 5e9e65e705 CXX_FAST_TLS calling convention: performance improvement for ARM.
This is the same change on ARM as r255821 on AArch64.
rdar://9001553

llvm-svn: 257424
2016-01-12 00:47:18 +00:00
Manman Ren 1602605bf8 CXX_FAST_TLS calling convention: Add support for ARM on Darwin.
rdar://9001553

llvm-svn: 257417
2016-01-11 23:50:43 +00:00
Dan Gohman 26c6765bd6 [WebAssembly] Define WebAssembly-specific relocation codes.
Currently WebAssembly has two kinds of relocations; data addresses and
function addresses. This adds ELF relocations for them, as well as an
MC symbol kind to indicate which type of relocation is needed.

llvm-svn: 257416
2016-01-11 23:38:05 +00:00
Dan Gohman f225a63849 [WebAssembly] Reorganize address offset folding.
Always expect tglobaladdr and texternalsym to be wrapped in
WebAssemblywrapper nodes. Also, split out a regPlusGA from regPlusImm so
that it can special-case global addresses, as they can be folded in more
cases.

Unfortunately this doesn't enable any new optimizations yet due to
SelectionDAG limitations. I'll be submitting changes to the SelectionDAG
infrastructure, along with tests, in a separate patch.

llvm-svn: 257394
2016-01-11 22:05:44 +00:00
Matt Arsenault 5e0bdb8b95 AMDGPU: Implement {{s|u}}int_to_fp i64 -> f32
The old lowering for uint_to_fp failed opencl conformance.
It might be OK for fast math mode, but I'm not sure.

llvm-svn: 257393
2016-01-11 22:01:48 +00:00
Matt Arsenault 800fecf9de AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA target
It might be better to let this be a select failure instead.

llvm-svn: 257386
2016-01-11 21:18:33 +00:00
Matt Arsenault 5319b0add5 AMDGPU: Fix ctlz combine for sub 32-bit types
llvm-svn: 257353
2016-01-11 17:02:06 +00:00
Matt Arsenault de5fbe9c60 AMDGPU: Pattern match ffbh pattern to instruction.
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.

llvm-svn: 257352
2016-01-11 17:02:00 +00:00
Matt Arsenault f058d67643 AMDGPU: Custom lower i64 ctlz
llvm-svn: 257348
2016-01-11 16:50:29 +00:00
Matt Arsenault a0e5cd55ad Mips: Remove lowerSELECT_CC
This is the same as the default expansion.

llvm-svn: 257346
2016-01-11 16:44:48 +00:00
Matt Arsenault 5ca3c72c5a LegalizeDAG: Expand ctlz with ctlz_zero_undef if legal
llvm-svn: 257345
2016-01-11 16:37:46 +00:00
Matt Arsenault 02d45dfeda AMDGPU: Remove dead target dag combine
llvm-svn: 257344
2016-01-11 16:37:40 +00:00
Daniel Sanders 4d32300cfd [mips] Never select JAL for calls to an absolute immediate address.
Summary:
It actually takes an offset into the current PC-region.

This fixes the 'expr' command in lldb.

Reviewers: vkalintiris, jaydeep, bhushan

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16054

llvm-svn: 257339
2016-01-11 15:57:46 +00:00
Krzysztof Parzyszek bc17b68a47 [Hexagon] Add check for nullptr in getFixupNoBits
llvm-svn: 257338
2016-01-11 15:51:53 +00:00
Krzysztof Parzyszek f49a8411f8 [Hexagon] Add implicit uses of GP to GP-relative loads and stores
llvm-svn: 257337
2016-01-11 15:49:58 +00:00
Krzysztof Parzyszek b024445444 [Hexagon] Mark D14 and GP as reserved registers
llvm-svn: 257336
2016-01-11 15:47:41 +00:00
Alexey Bataev 28f0c5efec [X86] Reduce complexity of the LEA optimization pass, by Andrey Turetsky.
In the OptimizeLEA pass keep instructions' positions in the basic block saved and use them for calculation of the distance between two instructions instead of std::distance. This reduces complexity of the pass from O(n^3) to O(n^2) and thus the compile time.
Differential Revision: http://reviews.llvm.org/D15692

llvm-svn: 257328
2016-01-11 11:52:29 +00:00
Craig Topper 9d2cab7742 [AVX-512] Remove another extra space from the Intel syntax asm strings.
llvm-svn: 257304
2016-01-11 01:03:40 +00:00
Craig Topper 9feea57844 [AVX-512] Remove more superfluous spaces from asm strings.
llvm-svn: 257301
2016-01-11 00:44:58 +00:00
Craig Topper 156622ad9d [AVX-512] Remove unused Round and Itinerary from the maskable_cmp multiclasses. They weren't used and there were extra spaces in the asm string to prepare for the concatenations of the round string that wasn't ever used.
llvm-svn: 257300
2016-01-11 00:44:56 +00:00
Craig Topper bfe13ff6ca [AVX-512] Make spacing between comma and {sae} operand consistent in asm strings.
llvm-svn: 257299
2016-01-11 00:44:52 +00:00
Craig Topper 5be407ab27 [X86] Remove extra spaces from MPX instruction asm strings.
llvm-svn: 257298
2016-01-11 00:44:46 +00:00
Elena Demikhovsky 542dfcf44c Optimized instruction sequence for sitofp operation on X86-32
Optimized sitofp i64 %x to double. The current sequence

movl %ecx, 8(%esp) 
movl %edx, 12(%esp) 
fildll 8(%esp)

is replaced with:

movd %ecx, %xmm0 
movd %edx, %xmm1 
punpckldq %xmm1, %xmm0 
movq %xmm0, 8(%esp)

Differential Revision: http://reviews.llvm.org/D15946

llvm-svn: 257285
2016-01-10 09:41:22 +00:00
Michael Zuckerman 885f61c534 [AVX512] add PRORVQ and PRORVD Intrinsic
Differential Revision:http://reviews.llvm.org/D15955

llvm-svn: 257283
2016-01-10 09:16:41 +00:00
Simon Pilgrim c7bebcbfd8 [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur.

This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars.

llvm-svn: 257266
2016-01-09 20:59:39 +00:00
Simon Pilgrim 2e7a1849c9 [X86][AVX] Add support for i64 broadcast loads on 32-bit targets
Added 32-bit AVX1/AVX2 broadcast tests.

llvm-svn: 257264
2016-01-09 19:59:27 +00:00
Tobias Edler von Koch ccd3bfc3c8 [Hexagon] Replace a static member variable in HexagonCVIResource (NFC)
This creates one instance of TUL per HexagonShuffler, which avoids thread-safety
issues with future changes.

llvm-svn: 257215
2016-01-08 22:07:25 +00:00
Weiming Zhao 4b3b13d3bc RBIT Instruction only available for ARMv6t2 and above.
Summary:
r255334 matches bit-reverse pattern in InstCombine and generates calls to Instrinsic::bitreverse.

RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.

Patch by Z. Zheng <zhaoshiz@codeaurora.org>

Reviewers: apazos, jmolloy, weimingz

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15932

llvm-svn: 257188
2016-01-08 18:43:41 +00:00
Weiming Zhao 48c033e021 Disable shrink-wrap for Thumb1
Summary: In ARMConstantIslandPass, which runs after Shrink Wrap pass, long jumps will be fixed up as BL (tBfar) which depends on spilling LR in epilogue.  However, shrink-wrap may remove the LR, which causes issues when the function returns.

Reviewers: qcolombet, rengolin

Subscribers: aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D15984

llvm-svn: 257187
2016-01-08 18:37:43 +00:00
Tom Stellard 4c4c72db48 AMDGPU/SI: Emit global variable sizes when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15952

llvm-svn: 257173
2016-01-08 14:50:28 +00:00
Tom Stellard ad8f5e8111 AMDGPU: Emit functions sizes
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15951

llvm-svn: 257172
2016-01-08 14:50:23 +00:00
Nemanja Ivanovic 2314e83227 Prevent renaming of CR fields in AADB when a CR restore is present
This patch corresponds to review:
http://reviews.llvm.org/D15930

Moves to and from CR fields depend on shifts/masks that depend on the
target/source CR field. Thus, post-ra anti-dep breaking must not later
change that CR register assignment.

llvm-svn: 257168
2016-01-08 13:09:54 +00:00
Dylan McKay cc6733aa83 [AVR] Added AVRSelectionDAGInfo header file
llvm-svn: 257152
2016-01-08 06:32:27 +00:00
Craig Topper 048e700828 [AVX-512] Remove superfluous spaces from some asm strings.
llvm-svn: 257150
2016-01-08 06:09:20 +00:00
Craig Topper 04493fda81 [X86] Don't print the aliased version of CVTSD2SI64rm. This appears to be a mistake I made years ago.
llvm-svn: 257149
2016-01-08 06:09:18 +00:00
Craig Topper 29510c0430 [X86] Use \t instead of space after mnemonics in a bunch InstAliases for consistency.
llvm-svn: 257148
2016-01-08 06:09:13 +00:00
Kyle Butt bfcff3856a Add call sequence start and end for __tls_get_addr
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.

For a PIC TLS variable access in a function, prologue (mflr followed by std and
stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but
no one saves/restores it.

Also added a test for save/restore clobbered registers during calling __tls_get_addr.

Patch by Tim Shen

llvm-svn: 257137
2016-01-08 02:06:19 +00:00
Dan Gohman 4ef99433aa [WebAssembly] Minor code cleanups. NFC.
llvm-svn: 257131
2016-01-08 01:18:00 +00:00
Dan Gohman 35e4a28947 [WebAssembly] Minor code cleanups. NFC.
llvm-svn: 257128
2016-01-08 01:06:00 +00:00
Dan Gohman 8633eedb30 [WebAssembly] Remove an unused def : Pat.
WebAssemblyISelLowering.cpp does not wrap jump table nodes inside of
WebAssemblywrapper nodes, so this pattern is not currently used.

llvm-svn: 257127
2016-01-08 00:50:33 +00:00
Dan Gohman cceedf79b4 [WebAssembly] Remove unused arguments, unused functions. NFC.
llvm-svn: 257125
2016-01-08 00:43:54 +00:00
Eric Christopher b793230797 Add some testing for thumb1 and thumb2 inline asm immediate constraints
and fix a couple of bugs on inspection.

Also fixes PR26061.

llvm-svn: 257122
2016-01-08 00:34:44 +00:00
JF Bastien b9ec4c6cea WebAssembly: use .skip instead of .zero directive
.zero is confusing when used with two arguments. Documentation:

  This directive emits SIZE 0-valued bytes.  SIZE must be an absolute
  expression.  This directive is actually an alias for the '.skip'
  directive so in can take an optional second argument of the value to
  store in the bytes instead of zero.  Using '.zero' in this way would be
  confusing however.

Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353

Hexagon and Sparc do the same, and it's all the same to WebAssembly so
let's pick the less confusing of the two.

llvm-svn: 257111
2016-01-07 23:18:29 +00:00
JF Bastien 841085c561 WebAssembly: update expected failures, more assert got resolved.
llvm-svn: 257098
2016-01-07 21:00:37 +00:00
JF Bastien d9d2892668 WebAssembly: update expected failures, assert got resolved by r257084.
llvm-svn: 257093
2016-01-07 20:07:21 +00:00
Derek Schuff 9bfea27c26 [WebAssembly] Support combining GEP and FrameIndex offsets in memory operand offset field
Previously we only supported putting the FI into memory operand offset
fields if there was nothing there already. Now combine them.

Differential Revision: http://reviews.llvm.org/D15941

llvm-svn: 257084
2016-01-07 18:55:52 +00:00
Dan Gohman a4730cf0b4 [WebAssembly] Use the default private label prefixes.
The MC assembler doesn't like using the empty string as a private label
prefix because then it treats all labels as private. This commit reverts
back to the default prefix, which is .L, which is common in ELF targets
and consistent with the LLVM name mangler.

llvm-svn: 257083
2016-01-07 18:49:53 +00:00
Nicolai Haehnle 82fc962c20 AMDGPU/SI: Fold operands with sub-registers
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.

Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.

Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.

With the IR generated by current Mesa, the changes are obviously relatively
minor:

7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)

Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)

Reviewers: tstellarAMD, arsenm, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15875

llvm-svn: 257074
2016-01-07 17:10:29 +00:00
Nicolai Haehnle 3c05d6d3b5 AMDGPU/SI: xnack_mask is always reserved on VI
Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.

I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).

This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.

It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)

Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. highest regular SGPR being used s29 and VCC used would lead to number
of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15898

llvm-svn: 257073
2016-01-07 17:10:20 +00:00
Michael Zuckerman 3aca221b31 [AVX512] add PSLLW and PSLLV Intrinsic
Differential Revision: http://reviews.llvm.org/D15889

llvm-svn: 257070
2016-01-07 16:02:51 +00:00
Nico Weber 4324b9b236 Revert r257055, it caused PR26064.
llvm-svn: 257066
2016-01-07 15:01:46 +00:00
Michael Zuckerman 354152d590 [AVX512] add PSRAV Intrinsic
Differential Revision: http://reviews.llvm.org/D15856

llvm-svn: 257063
2016-01-07 14:42:20 +00:00
Amjad Aboud d7cfb48485 Added support for macro emission in dwarf (supporting DWARF version 4).
Differential Revision: http://reviews.llvm.org/D15495

llvm-svn: 257060
2016-01-07 14:28:20 +00:00
Michael Zuckerman a6df006b50 [AVX512] add PSHUFHW and PSHUFLW Intrinsic
Differential Revision: http://reviews.llvm.org/D15925

llvm-svn: 257056
2016-01-07 12:35:43 +00:00
Simon Pilgrim bcc11a059e [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.

Follow up to D15310

llvm-svn: 257055
2016-01-07 11:34:27 +00:00
Dylan McKay 5c96de3ad7 Added AVRTargetObjectFile class and AVR.h
llvm-svn: 257049
2016-01-07 10:53:15 +00:00
Simon Pilgrim 83e44c66ae [X86][SSE} Add INSERTPS as a target shuffle
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.

llvm-svn: 257046
2016-01-07 10:24:19 +00:00