Commit Graph

40345 Commits

Author SHA1 Message Date
Craig Topper 7b9cc1474f [AVX-512] Use 'vnot' instead of 'not' in patterns involving vXi1 vectors.
This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR.

Reviewers: delena, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26134

llvm-svn: 285878
2016-11-03 06:04:28 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Krzysztof Parzyszek ead77016d8 [Hexagon] Remove registers coalesced in expand-condsets from live intervals
llvm-svn: 285846
2016-11-02 17:59:54 +00:00
Nicolai Haehnle 368972c3b3 AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.

Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25656

llvm-svn: 285835
2016-11-02 17:03:11 +00:00
Malcolm Parsons 06ac79c210 Fix Clang-tidy readability-redundant-string-cstr warnings
Reviewers: beanz, lattner, jlebar

Subscribers: jholewinski, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D26235

llvm-svn: 285832
2016-11-02 16:43:50 +00:00
Nirav Dave 0a392a8e7f [ARM][MC] Cleanup ARM Target Assembly Parser
Summary:
Correctly parse end-of-statement tokens and handle preprocessor
end-of-line comments in ARM assembly processor.

Reviewers: rnk, majnemer

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26152

llvm-svn: 285830
2016-11-02 16:22:51 +00:00
Vasileios Kalintiris e3bb72ea78 [mips] Always run the MipsOptimizePICCall pass.
Summary:
Remove this pass from addMachineSSAOptimization() and register it unconditionally in through addPreRegAlloc(). This pass is required for generating correct PIC calls.

Reviewers: sdardis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26036

llvm-svn: 285814
2016-11-02 15:11:27 +00:00
Joerg Sonnenberger bef3621ad0 Create the virtual register for the global base in the intersection of
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.

llvm-svn: 285813
2016-11-02 15:00:31 +00:00
Aaron Ballman 3ac3a7efff Removing a switch statement that contains a default label, but no case labels. Silences an MSVC warning; NFC.
llvm-svn: 285806
2016-11-02 13:58:57 +00:00
Ulrich Weigand 75d2f1b10d [SystemZ] Fix compiler warnings introduced by r285574
SystemZAsmParser::parseOperand returns a bool, not an enum.

llvm-svn: 285800
2016-11-02 11:32:28 +00:00
Kirill Bobyrev 1f1751182e [llvm] FIx if-clause -Wmisleading-indentation issue.
While bootstrapping Clang with recent `gcc 6.2.0` I found a bug related to misleading indentation.

I believe, a pair of `{}` was forgotten, especially given the above similar piece of code:

```
      if (!RDef || !HII->isPredicable(*RDef)) {
        Done = coalesceRegisters(RD, RegisterRef(S1));
        if (Done) {
          UpdRegs.insert(RD.Reg);
          UpdRegs.insert(S1.getReg());
        }
      }
```

Reviewers: kparzysz

Differential Revision: https://reviews.llvm.org/D26204

llvm-svn: 285794
2016-11-02 10:00:40 +00:00
Dylan McKay 7549b0a013 [AVR] Add instruction selection lowering code
Summary: This adds AVRISelLowering.cpp

Reviewers: arsenm, kparzysz

Subscribers: llvm-commits, modocache, japaric, wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25034

llvm-svn: 285790
2016-11-02 06:47:40 +00:00
Peter Collingbourne 4e76019e34 Support: Remove MemoryObject and DataStreamer interfaces.
These interfaces are no longer used.

Differential Revision: https://reviews.llvm.org/D26222

llvm-svn: 285774
2016-11-02 00:08:37 +00:00
Alex Bradbury 6b2cca7f8f [RISCV] Add bare-bones RISC-V MCTargetDesc
This is enough to compile and link but doesn't yet do anything particularly 
useful. Once an ASM parser and printer are added in the next two patches, the 
whole thing can be usefully tested.

Differential Revision: https://reviews.llvm.org/D23562

llvm-svn: 285770
2016-11-01 23:47:30 +00:00
Alex Bradbury 24d9b13b36 [RISCV 4/10] Add basic RISCV{InstrFormats,InstrInfo,RegisterInfo,}.td
For now, only add instruction definitions for basic ALU operations. Our 
initial target is a working MC layer rather than codegen, so appropriate 
SelectionDAG patterns will come later.

Differential Revision: https://reviews.llvm.org/D23561

llvm-svn: 285769
2016-11-01 23:40:28 +00:00
Matt Arsenault c507cdb4bc AMDGPU: Handle CopyToReg in getOperandRegClass
llvm-svn: 285768
2016-11-01 23:22:17 +00:00
Matt Arsenault 663ab8c119 AMDGPU: Use brev for materializing SGPR constants
This is already done with VGPR immediates and saves 4 bytes.

llvm-svn: 285765
2016-11-01 23:14:20 +00:00
Matt Arsenault 3d463193a9 AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.

Also start verifying the register classes of inlineasm operands.

llvm-svn: 285762
2016-11-01 22:55:07 +00:00
Matt Arsenault a6319b82ca AMDGPU: Stop creating unused virtual registers
These are only used in the spill to VMEM path. Move them to
the one use.

llvm-svn: 285756
2016-11-01 21:58:07 +00:00
Matt Arsenault 2d8c289b4b AMDGPU: Workaround for instruction size with literals
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.

llvm-svn: 285743
2016-11-01 20:42:24 +00:00
Krzysztof Parzyszek 654dc11b79 [Hexagon] Rename operand/predicate names for unshifted integers
For example, rename s6Ext to s6_0Ext. The names for shifted integers
include the underscore and this will make the naming consistent. It
also exposed a few duplicates that were removed.

llvm-svn: 285728
2016-11-01 19:02:10 +00:00
Konstantin Zhuravlyov d971a1123f [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32
This will prevent following regression when enabling i16 support (D18049):

test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll

Differential Revision: https://reviews.llvm.org/D25802

llvm-svn: 285716
2016-11-01 17:49:33 +00:00
Alex Bradbury b2e5472d85 [RISCV] Add stub backend
This contains just enough for lib/Target/RISCV to compile. Notably a basic 
RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc 
-march=riscv32 myinput.ll and will find it fails due to the lack of 
MCAsmInfo.

See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for 
further discussion

Differential Revision: https://reviews.llvm.org/D23560

llvm-svn: 285712
2016-11-01 17:27:54 +00:00
Tom Stellard 9677b60288 AMDGPU: Fix buildbots broken by r285704
llvm-svn: 285711
2016-11-01 17:20:03 +00:00
Alex Bradbury 58eba09949 [TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h
As it stands, the OperandMatchResultTy is only included in the generated
header if there is custom operand parsing. However, almost all backends
make use of MatchOperand_Success and friends from OperandMatchResultTy for
e.g. parseRegister. This is a pain when starting an AsmParser for a new
backend that doesn't yet have custom operand parsing. Move the enum to
MCTargetAsmParser.h.

This patch is a prerequisite for D23563

Differential Revision: https://reviews.llvm.org/D23496

llvm-svn: 285705
2016-11-01 16:32:05 +00:00
Tom Stellard 94c21bc088 AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.

The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot.  I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.

Reviewers: bogner, ab, t.p.northover, arsenm

Subscribers: wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25999

llvm-svn: 285704
2016-11-01 16:31:48 +00:00
James Molloy 70a3d6df52 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 285690
2016-11-01 13:37:41 +00:00
Valery Pykhtin 8a89d3662a [AMDGPU] Expand vector mulhu/mulhs
Differential revision: https://reviews.llvm.org/D26077

llvm-svn: 285684
2016-11-01 10:26:48 +00:00
Nemanja Ivanovic e70fa63390 [PowerPC] Implement vector shift builtins - llvm portion
This patch corresponds to review https://reviews.llvm.org/D26095.
Committing on behalf of Tony Jiang.

llvm-svn: 285681
2016-11-01 09:42:32 +00:00
Matt Arsenault f3dd863031 AMDGPU: Whitespace fixes
llvm-svn: 285659
2016-11-01 00:55:14 +00:00
Davide Italiano 51cbe13a3f [Hexagon] Garbage collect dead code.
llvm-svn: 285654
2016-10-31 22:56:56 +00:00
Saleem Abdulrasool e1aa782bd0 CodeGen: further loosen -O0 CG for WoA division
Generate the slowest possible codepath for noopt CodeGen.  Even trying to be
clever with the negated jump can cause out-of-range jumps.  Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible.  This re-enables the
previous optimization as well as hopefully gives us working code in all cases.

Addresses PR30356!

llvm-svn: 285649
2016-10-31 22:12:37 +00:00
Justin Lebar ed1e312f05 [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass.  We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.

Reviewers: tra

Subscribers: llvm-commits, jholewinski, jingyue, mgorny

Differential Revision: https://reviews.llvm.org/D26165

llvm-svn: 285642
2016-10-31 21:51:42 +00:00
Nemanja Ivanovic 60bdfe5a7c [PPC] add absolute difference altivec instructions and matching intrinsics
This patch corresponds to review https://reviews.llvm.org/D26072.
Committing on behalf of Sean Fertile.

llvm-svn: 285627
2016-10-31 19:47:52 +00:00
Tim Northover 037af52c8b GlobalISel: allow truncating pointer casts on AArch64.
llvm-svn: 285615
2016-10-31 18:31:09 +00:00
Tim Northover cdf23f1d93 GlobalISel: translate stack protector intrinsics
llvm-svn: 285614
2016-10-31 18:30:59 +00:00
Michael Zuckerman 68a5c53616 [x86][inline-asm][AVX512][llvm][PART-2]
Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions.

Commit on behalf of mharoush

Extending inline assembly support, compatible with GCC as folowing:
"k" constraint hints the compiler to select any of AVX512 k0-k7 registers.
"Yk" constraint is a subset of "k" excluding k0 which is not allowd to be used as a mask.

Reviewer: 1. rnk

Differential Revision: https://reviews.llvm.org/D25062

llvm-svn: 285591
2016-10-31 16:19:58 +00:00
Artem Tamazov 54bfd548aa [AMDGPU][MC][gfx8] Support 20-bit immediate offset in SMEM instructions.
Fixes Bug 30808.
Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced.
Old gfx6/7-specific def renamed to smrd_offset_8 for clarity.
Lit tests updated.

Differential Revision: https://reviews.llvm.org/D26085

llvm-svn: 285590
2016-10-31 16:07:39 +00:00
Krzysztof Parzyszek 22586dcb2a [Hexagon] Don't expand mux instructions with both sources identical
llvm-svn: 285588
2016-10-31 15:45:09 +00:00
Ulrich Weigand 2e5e51b3f3 [SystemZ] Rework processor feature definitions and add -mcpu=archX support
This patch implements two changes:

- Move processor feature definition into a new file SystemZFeatures.td,
  and provide explicit lists of supported and unsupported features for
  each level of the z/Architecture.  This allows specifying unsupported
  features in the scheduler definition files for each processor.

- Add optional aliases for the -mcpu processor names according to the
  level of the z/Architecture, for compatibility with other compilers
  on the platform.  The supported aliases are:
    -mcpu=arch8  equals  -mcpu=z10
    -mcpu=arch9  equals  -mcpu=z196
    -mcpu=arch10 equals  -mcpu=zEC12
    -mcpu=arch11 equals  -mcpu=z13

llvm-svn: 285577
2016-10-31 14:33:29 +00:00
Ulrich Weigand d28be373d4 [SystemZ] Guard LEFR/LFER with FeatureVector
The LEFR/LFER pseudos are aliases for vector instructions and should
therefore be guared by FeatureVector.  If they aren't, the TableGen
scheduler definition checking might complain that there is no data
for those pseudos for pre-z13 machines.

No functional change intended. 

llvm-svn: 285576
2016-10-31 14:28:43 +00:00
Ulrich Weigand d9001301d9 [SystemZ] Correctly diagnose missing features in AsmParser
Currently, when using an instruction that is not supported on the
currently selected architecture, the LLVM assembler is likely to
diagnose an "invalid operand" instead of a "missing feature".

This is because many operands require a custom parser in order to
be processed correctly, and if an instruction is not available
according to the current feature set, the generated parser code
will also not detect the associated custom operand parsers.

Fixed by temporarily enabling all features while parsing operands.
The missing features will then be correctly detected when actually
parsing the instruction itself.

llvm-svn: 285575
2016-10-31 14:25:05 +00:00
Ulrich Weigand ec5d779eb8 [SystemZ] Fix encoding of MVCK and .insn ss
LLVM currently treats the first operand of MVCK as if it were a
regular base+index+displacement address.  However, it is in fact
a base+displacement combined with a length register field.

While the two might look syntactically similar, there are two
semantic differences:
- %r0 is a valid length register, even though it cannot be used
  as an index register.
- In an expression with just a single register like 0(%rX), the
  register is treated as base with normal addresses, while it is
  treated as the length register (with an empty base) for MVCK.

Fixed by adding a new operand parser class BDRAddr and reworking
the assembler parser to distinguish between address + length
register operands and regular addresses.

llvm-svn: 285574
2016-10-31 14:21:36 +00:00
Jonas Paulsson 6788ddeac9 [SystemZ] Model 2 VBU units (not 1) in SystemZScheduleZ13.td.
NFC.

Review: Ulrich Weigand.
llvm-svn: 285566
2016-10-31 13:05:48 +00:00
Alexey Bataev d07c731d86 Improved cost model for FDIV and FSQRT, by Andrew Tischenko
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.

Differential Revision: https://reviews.llvm.org/D25722

llvm-svn: 285564
2016-10-31 12:10:53 +00:00
Craig Topper d4e580705d [AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles.
llvm-svn: 285546
2016-10-31 05:55:57 +00:00
Craig Topper b7781a95fd [X86] Use intrinsics table for PMADDUBSW and PMADDWD so that we can use the legacy intrinsics to select EVEX encoded instructions when available.
This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type.

llvm-svn: 285515
2016-10-30 06:56:16 +00:00
Craig Topper bf9e5a16a4 [X86] Don't use loadv2i64 on SSE version of PMULHRSW. Use memopv2i64 instead.
This bug was introduced in r285501.

llvm-svn: 285510
2016-10-30 00:02:55 +00:00
Craig Topper defe9ffbb5 [X86] Use intrinsics table for VPMULHRSW intrincis so that the legacy intrinsics can select EVEX encoded instructions when available.
This requires a minor rename of the instructions due to the use of different tablegen classes and how the names are concatenated.

llvm-svn: 285501
2016-10-29 18:41:45 +00:00
Elena Demikhovsky 519b4ccd70 Fixed FMA + FNEG combine.
Masked form of FMA should be omitted in this optimization.

Differential Revision: https://reviews.llvm.org/D25984

llvm-svn: 285492
2016-10-29 08:44:46 +00:00
Matt Arsenault c88ba36eab AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Matthias Braun 7d78614ae9 AArch64DeadRegisterDefinitionsPass: Cleanup; NFC
- Fix doxygen file comment
- reduce indentation in loop
- Factor out some common subexpressions
- Move independent helper function out of class
- Fix Changed flag (this is not strictly NFC but a bugfix, but the flag
  seems ignored anyway)

llvm-svn: 285488
2016-10-29 01:03:41 +00:00
Tom Stellard 6695ba0440 AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

llvm-svn: 285479
2016-10-28 23:53:48 +00:00
Matt Arsenault 4e9c1e3a79 AMDGPU: Fix instruction flags for s_endpgm
Set isReturn, remove hasSideEffects. Also remove
hasCtrlDep, I'm not really sure what that does.

llvm-svn: 285476
2016-10-28 23:00:38 +00:00
Matt Arsenault 7b6475568d AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

llvm-svn: 285463
2016-10-28 21:55:15 +00:00
Matt Arsenault 4b6a6cc8e9 AMDGPU: Rename glc operand type
While trying to add the glc bit to SMEM instructions on VI
with the new refactoring I ran into some kind of shadowing
problem for the glc operand when using the pseudoinstruction
as a multiclass parameter.

Everywhere that currently uses it defines the operand to have the same
name as its type, i.e. glc:$glc which works. For some reason now it
conflicts, and its up evaluating to the wrong thing. For the
real encoding classes,

let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated
and still visible in the Inst initializer in the expanded td file.
In other cases I got a a different error about an illegal operand
where this was using { 0 } initializer from the bits<1> glc initializer
instead of evaluating it as false in the if.

For consistency all of the operand types should probably
be captialized to avoid conflicting with the variable names
unless somebody has a better idea of how to fix this.

llvm-svn: 285462
2016-10-28 21:55:08 +00:00
Justin Lebar f0a80ba385 [NVPTX] Compute 'rem' using the result of 'div', if possible.
Summary:
In isel, transform

  Num % Den

into

  Num - (Num / Den) * Den

if the result of Num / Den is already available.

Reviewers: tra

Subscribers: hfinkel, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26090

llvm-svn: 285461
2016-10-28 21:44:00 +00:00
Matt Arsenault 4eae301995 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

llvm-svn: 285447
2016-10-28 20:31:47 +00:00
Matt Arsenault 08906a3c62 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

llvm-svn: 285435
2016-10-28 19:43:31 +00:00
Nemanja Ivanovic e28a0fc72a Implement vector count leading/trailing bytes with zero lsb and vector parity
builtins - llvm portion

This patch corresponds to review https://reviews.llvm.org/D26003.
Committing on behalf of Zaara Syeda.

llvm-svn: 285434
2016-10-28 19:38:24 +00:00
Krzysztof Parzyszek 87a47be039 [Hexagon] Maintain kill flags through splitting in expand-condsets
Do not use LiveIntervals to recalculate kills, because that cannot be
done accurately without implicit uses on predicated instructions.

llvm-svn: 285409
2016-10-28 15:50:22 +00:00
Tom Stellard aea899e2a0 AMDGPU/SI: Handle hazard with s_rfe_b64
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25638

llvm-svn: 285368
2016-10-27 23:50:21 +00:00
Tom Stellard 04051b5fad AMDGPU/SI: Handle hazard with sgpr lane selects for v_{read,write}lane
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25637

llvm-svn: 285367
2016-10-27 23:42:29 +00:00
Tom Stellard 6b9c1be4ea AMDGPU/SI: Fix unused variable warning on non-debug builds
llvm-svn: 285363
2016-10-27 23:28:03 +00:00
Tom Stellard b133fbb9a4 AMDGPU/SI: Handle hazard with > 8 byte VMEM stores
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25577

llvm-svn: 285359
2016-10-27 23:05:31 +00:00
Tom Stellard 30d30824b4 AMDGPU/SI: Handle s_setreg hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25528

llvm-svn: 285338
2016-10-27 20:39:09 +00:00
Simon Pilgrim d23219b9ee [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets
llvm-svn: 285329
2016-10-27 18:32:06 +00:00
Simon Pilgrim 47c1ff7a43 [X86][AVX512DQ] Move v2i64 and v4i64 MUL lowering to tablegen
As suggested by @igorb on D26011

llvm-svn: 285313
2016-10-27 17:07:40 +00:00
Saleem Abdulrasool 075d2e3c59 ARM: ensure that the Windows DBZ check is in range
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:

    cmp r?, #0
    cbz .Ltrap
    b .Lbody
  .Lbody:
    ...
  .Ltrap:
    udf #249 @ __brkdiv0

This works great most of the time.  However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).

Since this is a matter of correctness, possibly pay a small penalty instead.  We
now form this slightly differently:

    cbnz .Lbody
    udf #249 @ __brkdiv0
  .Lbody:
    ...

The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).

The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.

Addresses PR30532!

llvm-svn: 285312
2016-10-27 16:59:22 +00:00
Vasileios Kalintiris cfb005a0ee [mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass.
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.

This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.

llvm-svn: 285305
2016-10-27 15:50:36 +00:00
Simon Pilgrim 820e1326d7 [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

llvm-svn: 285304
2016-10-27 15:27:00 +00:00
Krzysztof Parzyszek 046da74699 [Hexagon] Do not expand ISD::SELECT for HVX vectors
llvm-svn: 285297
2016-10-27 14:30:16 +00:00
Sam Parker e7d9505c08 [ARM] Predicate UMAAL selection on hasDSP.
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.

Patch by Vadzim Dambrouski.

Differential Revision: https://reviews.llvm.org/D25890

llvm-svn: 285278
2016-10-27 09:47:10 +00:00
Dylan McKay dd680cc753 [AVR] Generate all of the TableGen files we need
This enables generation of all of the TableGen files that are used
downstream.

llvm-svn: 285274
2016-10-27 08:20:47 +00:00
Nicolai Haehnle 7b0e25b7ad AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.

Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].

The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25829

llvm-svn: 285273
2016-10-27 08:15:07 +00:00
Dylan McKay 00009d4824 [AVR] Compile the disassembler
This also updates references of 'TheAVRTarget' to the new
'getTheAVRTarget()' method.

llvm-svn: 285272
2016-10-27 08:09:15 +00:00
Dylan McKay ec47065795 [AVR] Add AVRISelDAGToDAG.cpp
Summary: This pulls the AVR instruction selector in-tree.

Reviewers: arsenm, kparzysz

Subscribers: llvm-commits, wdng, beanz, japaric, mgorny

Differential Revision: https://reviews.llvm.org/D25278

llvm-svn: 285270
2016-10-27 07:03:47 +00:00
Dylan McKay 6eaa4e4bcc [AVR] Add the machine code emitter
Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, japaric, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D25388

llvm-svn: 285269
2016-10-27 06:56:46 +00:00
Nemanja Ivanovic 32b5fed639 [PowerPC] - No SExt/ZExt needed for count trailing zeros
This patch corresponds to review:
https://reviews.llvm.org/D25896

It just eliminates the redundant ZExt after a count trailing zeros instruction.

llvm-svn: 285267
2016-10-27 05:17:58 +00:00
Evandro Menezes ca8370396a [AArch64] Create feature set for Samsung Exynos-M2
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.

llvm-svn: 285246
2016-10-26 22:06:20 +00:00
Tim Northover a9cc385664 ARM: don't rely on push/pop reglists being in order when folding SP adjust.
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.

Should fix an issue where we were turning something like:

    push {r8, r4, r7, lr}
    sub sp, #24

into nonsense like:

    push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}

llvm-svn: 285232
2016-10-26 20:01:00 +00:00
Nemanja Ivanovic 0f45998bc6 [PowerPC] Implement vec_insert_exp builtins - llvm portion
This revision corresponds to review: https://reviews.llvm.org/D25957.
Committing on behalf of Zaara Syeda.

llvm-svn: 285225
2016-10-26 19:03:40 +00:00
Chad Rosier 0c621fda0d [AArch64] Avoid materializing constant 1 when generating cneg instructions.
Instead of

 cmp w0, #1
 orr w8, wzr, #0x1
 cneg w0, w8, ne

we now generate

 cmp w0, #1
 csinv w0, w0, wzr, eq

PR28965

llvm-svn: 285217
2016-10-26 18:15:32 +00:00
Dan Gohman 68a423bf84 [WebAssembly] Update the README.txt.
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.

llvm-svn: 285215
2016-10-26 17:44:09 +00:00
Yaxun Liu 94add85adb AMDGPU: Refactor processor definition to use ISA version features
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.

Refactor processor definition to use ISA version features.

Fixed ISA version for stoney.

Based on Laurent Morichetti's patch.

Differential Revision: https://reviews.llvm.org/D25919

llvm-svn: 285210
2016-10-26 16:37:56 +00:00
Matt Arsenault 39787bdcbb Reapply "AMDGPU: Don't use offen if it is 0"
This reverts r283003

llvm-svn: 285203
2016-10-26 15:08:16 +00:00
Matt Arsenault 1110f14b42 AMDGPU: Fix counting si_mask_branch as 4 bytes
llvm-svn: 285202
2016-10-26 14:53:54 +00:00
Tom Stellard f8e6eaff6e AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch
Summary:
A single flat memory operations that might access the scratch buffer
can only access MaxPrivateElementSize bytes.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25788

llvm-svn: 285198
2016-10-26 14:38:47 +00:00
Zvi Rackover aa3402b41e [X86] AVX512 fallback for floating-point scalar selects
Summary:
In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches.

Fixes pr30561

Reviewers: igorb, aymanmus, delena

Differential Revision: https://reviews.llvm.org/D25310

llvm-svn: 285196
2016-10-26 14:12:46 +00:00
Craig Topper 812d3d30ae [AVX-512] Add scalar vfmsub/vfnmsub mask3 intrinsics
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.

Reviewers: igorb, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25933

llvm-svn: 285173
2016-10-26 04:59:58 +00:00
James Y Knight 2e64b8b79e [Sparc] Don't overlap variable-sized allocas with other stack variables.
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).

It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.

For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.

Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.

llvm-svn: 285131
2016-10-25 22:13:28 +00:00
Evandro Menezes 7696dc0685 [AArch64] Adjust the cost model for Exynos M1.
Modify the maximum jump table size.

llvm-svn: 285106
2016-10-25 20:05:42 +00:00
Dan Gohman f50d964bdb [WebAssembly] Add immediate fields to call_indirect and memory operators.
call_indirect, grow_memory, and current_memory now have immediate
operands in the 0xd binary encoding.

llvm-svn: 285085
2016-10-25 16:55:52 +00:00
Ulrich Weigand 7bdb485e18 [SystemZ] Do not use LOC(G) for volatile loads
It is not safe to use LOAD ON CONDITION to implement access to a memory
location marked "volatile", since the architecture leaves it unspecified
whether or not an access happens if the condition is false.

The current code already appears to care about that:
  def LOC  : CondUnaryRSY<"loc",  0xEBF2, nonvolatile_load, GR32, 4>;

Unfortunately, that "nonvolatile_load" operator is simply ignored
by the CondUnaryRSY class, and there was no test to catch it.

llvm-svn: 285077
2016-10-25 15:39:15 +00:00
Simon Pilgrim 5c3c9707c3 [X86][SSE] Add support for (V)PMOVSX* constant folding
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.

This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.

Differential Revision: https://reviews.llvm.org/D25874

llvm-svn: 285072
2016-10-25 14:29:25 +00:00
Benjamin Kramer 7df3043db3 Fix an unused warning in WebAssemblyInstPrinter with NDEBUG.
Patch by Sam McCall!

Differential Revision: https://reviews.llvm.org/D25934

llvm-svn: 285055
2016-10-25 09:08:50 +00:00
Craig Topper 01e4667e02 [AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq.
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.

Reviewers: delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25594

llvm-svn: 285053
2016-10-25 04:00:29 +00:00
Matthias Braun c8440dddb2 MachineInstrBundle: Pass iterators to getBundle(Start|End); NFC
This is a function to go backwards in a block to find the first
instruction in a bundle, so iterator is a more natural choice for
parameter/return rather than a reference to a MachineInstruction.

llvm-svn: 285051
2016-10-25 02:55:17 +00:00
Dan Gohman 48abaa9c74 [WebAssembly] Reorder load/store operands to match binary encoding.
The p2align operand of a load/store is encoded before the offset
operand; reorder the MachineInstr operands accordingly.

llvm-svn: 285044
2016-10-25 00:17:11 +00:00
Dan Gohman 3acb187d95 [WebAssembly] Implement more WebAssembly binary encoding.
This changes locals from being declared by the emitLocal hook in
WebAssemblyTargetStreamer, rather than with an instruction. After exploring
the infastructure in LLVM more, this seems to make more sense since
declaring locals doesn't use an encoded opcode.

This also adds more 0xd opcodes, type encodings, and miscellaneous
binary encoding bits.

llvm-svn: 285040
2016-10-24 23:27:49 +00:00
Matthias Braun 8b38ffaa98 CodeGen/Passes: Pass MachineFunction as functor arg; NFC
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.

llvm-svn: 285039
2016-10-24 23:23:02 +00:00
Matthias Braun fc371558a0 Use MachineInstr::mop_iterator instead of MIOperands; NFC
(Const)?MIOperands is equivalent to the C++ style
MachineInstr::mop_iterator. Use the latter for consistency except for a
few callers of MIOperands::analyzePhysReg().

llvm-svn: 285029
2016-10-24 21:36:43 +00:00
Dan Gohman 5d3391f859 [WebAssembly] Fix a broken URL.
llvm-svn: 285017
2016-10-24 20:35:17 +00:00
Dan Gohman 4becc58587 [WebAssembly] Define the `end` opcode value.
CFGStackify differentiates between END_LOOP and END_BLOCK, but wasm
itself doesn't. For now, just use the same opcode for both.

llvm-svn: 285016
2016-10-24 20:32:04 +00:00
Dan Gohman c968297b95 [WebAssembly] Update opcode values according to recent spec changes.
This corresponds to the "0xd" opcode renumbering.

llvm-svn: 285014
2016-10-24 20:21:49 +00:00
Dan Gohman 4fc4e42dea [WebAssembly] Add an option to make get_local/set_local explicit.
This patch adds a pass, controlled by an option and off by default for
now, for making implicit get_local/set_local explicit. This simplifies
emitting wasm with MC.

Differential Revision: https://reviews.llvm.org/D25836

llvm-svn: 285009
2016-10-24 19:49:43 +00:00
Peter Collingbourne 6733564e5a Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject.
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.

Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.

Differential Revision: https://reviews.llvm.org/D25917

llvm-svn: 285006
2016-10-24 19:23:39 +00:00
Krzysztof Parzyszek eb6172404d Revert r284972 and remove other defaulted copy/move constructors/=
David Blaikie pointed out that we get them for free without having to
write anything.

llvm-svn: 284996
2016-10-24 17:40:46 +00:00
Ehsan Amiri c90b02cf50 [PPC] Generate positive FP zero using xor insn instead of loading from constant area
https://reviews.llvm.org/D23614

Currently we load +0.0 from constant area. That can change to be generated using
XOR instruction.

llvm-svn: 284995
2016-10-24 17:31:09 +00:00
Eli Friedman b37864b58d Revert r284580+r284917. ("Synthesize TBB/TBH instructions")
The optimization has correctness issues, so reverting for now to fix tests
on thumb1 targets.

llvm-svn: 284993
2016-10-24 17:20:50 +00:00
Evandro Menezes eff2bd9d4f [AArch64] Optionally use the Newton series for reciprocal estimation
Add support for estimating the square root or its reciprocal and division or
reciprocal using the combiner generic Newton series.

Differential revision: https://reviews.llvm.org/D25291

llvm-svn: 284986
2016-10-24 16:14:58 +00:00
Ehsan Amiri 1f31e9157d [PPC] Better codegen for AND, ANY_EXT, SRL sequence
https://reviews.llvm.org/D24924

This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review.

llvm-svn: 284983
2016-10-24 15:46:58 +00:00
Nicolai Haehnle a785209bc2 AMDGPU: Fix Two Address problems with v_movreld
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.

This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.

Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.

Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.

This fixes several GL45-CTS.shaders.indexing.* tests.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25633

llvm-svn: 284980
2016-10-24 14:56:02 +00:00
Pavel Labath 51c454c1a9 Remove unused #includes of TimeValue.h. NFC.
llvm-svn: 284975
2016-10-24 14:00:26 +00:00
Joel Jones 504bf334b0 AArch64 ILP32 relocations for assembly and ELF
Summary:
Add relocations for AArch64 ILP32. Includes:
  - Addition of definitions for R_AARCH32_*
  - Definition of new -target-abi: ilp32
  - Definition of data layout string
  - Tests for added relocations. Not comprehensive, but matches
    existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64".
  - Tests for llvm-readobj

Reviewers: zatrazz, peter.smith, echristo, t.p.northover

Subscribers: aemerson, rengolin, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25159

llvm-svn: 284973
2016-10-24 13:37:13 +00:00
Krzysztof Parzyszek f74683f930 [RDF] Add default move constructors/assignment operators
llvm-svn: 284972
2016-10-24 13:15:20 +00:00
Simon Dardis 9c34854833 [mips] synci microMIPS instruction definition.
Add synci to the microMIPS instruction definitions, mark the MIPS sync & synci
as not being part of microMIPS. This does not cover the sync instruction alias,
as that will be handled with a different patch. Add sync to the valid tests for
microMIPS.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D25795

llvm-svn: 284962
2016-10-24 10:23:59 +00:00
Craig Topper 8ec5c7326d [AVX-512] Remove masked pmin/pmax intrinsics and autoupgrade to native IR.
Clang patch to replace 512-bit vector and 64-bit element versions with native IR will follow.

llvm-svn: 284955
2016-10-24 04:04:16 +00:00
Simon Pilgrim 6ac1e98b09 [X86][SSE] Add SSE41/AVX1 costs for vector shifts.
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.

llvm-svn: 284939
2016-10-23 16:49:04 +00:00
Simon Pilgrim 96ef0c1103 Use APInt::isAllOnesValue instead of popcnt. NFCI.
More obvious implementation and faster too.

llvm-svn: 284937
2016-10-23 15:09:44 +00:00
Dylan McKay 479a13c0aa [AVR] Add the machine code disassembler
This adds a super basic implementation of a machine code disassembler.

It doesn't support any operands with custom encoding.

llvm-svn: 284930
2016-10-22 23:57:59 +00:00
Simon Pilgrim d3829c89bc [X86][AVX512VL] Added support for combining target 256-bit shuffles to AVX512VL VPERMV3
llvm-svn: 284922
2016-10-22 20:15:39 +00:00
Simon Pilgrim 56c0524f0f [X86][AVX512] Added support for combining target shuffles to AVX512 VPERMV3
llvm-svn: 284921
2016-10-22 19:53:59 +00:00
James Molloy 2bae8640d7 [ARM] Fix crash in ConstantIslands
tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator.

llvm-svn: 284917
2016-10-22 09:58:37 +00:00
Craig Topper b084c90a18 [X86] Add support for printing shuffle comments for VALIGN instructions.
llvm-svn: 284915
2016-10-22 06:51:56 +00:00
Craig Topper 7b2b8db438 [X86] Add support for lowering v4i64 and v8i64 shuffles directly to PALIGNR. I think shuffle combine can figure it out later, but we should try to get it right up front.
llvm-svn: 284914
2016-10-22 06:51:52 +00:00
Craig Topper 9f374533e3 [X86] Remove unnecessary AVX2 check that was already covered by an assertion earlier in the function. NFC
llvm-svn: 284913
2016-10-22 06:51:49 +00:00
Craig Topper bea5cb5491 [X86] Remove 128-bit lane handling from the main loop of matchVectorShuffleAsByteRotate. Instead check for is128LaneRepeatedSuffleMask before the loop and just loop over the repeated mask.
I plan to use the loop to support VALIGND/Q shuffles so this makes it easier to reuse.

llvm-svn: 284912
2016-10-22 06:51:44 +00:00
Simon Pilgrim 0d376bcbf0 [X86][SSE] Use getConstVector helper for VPERMV mask generation. NFCI.
llvm-svn: 284911
2016-10-22 06:18:36 +00:00
Konstantin Zhuravlyov fda33eaf0c [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP
This will prevent following regression when enabling i16 support (D18049):
  test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

Differential Revision: https://reviews.llvm.org/D25805

llvm-svn: 284891
2016-10-21 22:10:03 +00:00
Tom Stellard 6c7dd980e4 AMDGPU/SI: Fix crash caused by r284267
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25782

llvm-svn: 284875
2016-10-21 20:25:11 +00:00
Peter Collingbourne e9bd49824d X86: Improve BT instruction selection for 64-bit values.
If a 64-bit value is tested against a bit which is known to be in the range
[0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly
shorter encoding.

Differential Revision: https://reviews.llvm.org/D25862

llvm-svn: 284864
2016-10-21 19:57:55 +00:00
Simon Pilgrim ab48872313 [X86][AVX512BWVL] Added support for lowering v16i16 shuffles to AVX512BWVL vpermw
llvm-svn: 284863
2016-10-21 19:54:38 +00:00
Simon Pilgrim da814cba0d [X86][AVX512BWVL] Added support for combining target v16i16 shuffles to AVX512BWVL vpermw
llvm-svn: 284860
2016-10-21 19:40:29 +00:00
Simon Pilgrim 0109bf116f [X86][AVX512] Added support for combining target shuffles to AVX512 vpermpd/vpermq/vpermps/vpermd/vpermw
llvm-svn: 284858
2016-10-21 19:18:09 +00:00
Krzysztof Parzyszek 6e7fa99d3a [RDF] Use RegisterId typedef more consistently, NFC
llvm-svn: 284857
2016-10-21 19:12:13 +00:00
Krzysztof Parzyszek b71085b547 [Hexagon] Handle spills of partially defined double vector registers
After register allocation it is possible to have a spill of a register
that is only partially defined. That in itself it fine, but creates a
problem for double vector registers. Stores of such registers are pseudo
instructions that are expanded into pairs of individual vector stores,
and in case of a partially defined source, one of the stores may use
an entirely undefined register. To avoid this, track the defined parts
and only generate actual stores for those.

llvm-svn: 284841
2016-10-21 16:38:29 +00:00
Derek Schuff 6f69783f1f [WebAssembly] Fix for 0xc call_indirect changes
Summary:
Need to reorder the operands to have the callee as the last argument.
Adds a pseudo-instruction, and a pass to lower it into a real
call_indirect.

This is the first of two options for how to fix the problem.

Reviewers: dschuff, sunfish

Subscribers: jfb, beanz, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D25708

llvm-svn: 284840
2016-10-21 16:38:07 +00:00
Abderrazek Zaafrani 9daf8110c8 Set the vectorizer MaxInterleaveFactor for Exynos.
llvm-svn: 284839
2016-10-21 16:28:27 +00:00
Simon Pilgrim 2d96daa885 [X86] Use DAG::getBuildVector helper wrapper where possible. NFCI.
llvm-svn: 284835
2016-10-21 16:07:51 +00:00
Abderrazek Zaafrani 9f382f53d1 Test commit
llvm-svn: 284832
2016-10-21 15:24:08 +00:00
Artem Tamazov 751985a757 [AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed.
Fixes Bug 28215. Lit tests updated.

Differential Revision: https://reviews.llvm.org/D25837

llvm-svn: 284825
2016-10-21 14:49:22 +00:00
Simon Pilgrim c98d99a600 [X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support.
llvm-svn: 284823
2016-10-21 13:00:47 +00:00
Simon Pilgrim 32b06235da [X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments
llvm-svn: 284821
2016-10-21 12:14:24 +00:00
Bjorn Pettersson 9fcd605d1e [AArch64] Corrected spill size for DDD register class. NFCI
Summary:
The spill size was incorrectly set to 196 bits,
which isn't a multiple of 8. This problem was detected when
experimenting with asserts that the spill size should be a
multiple of the byte size.

New corrected value for the spill size is set to 192 bits.

Note that tablegen (RegisterInfoEmitter) will divide the
size set in the RegisterClass definition by 8. So this
change should not have any impact on the tablegen output
(trunc(192/8) == trunc(196/8) == 24 bytes).

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: https://reviews.llvm.org/D25818

llvm-svn: 284814
2016-10-21 09:53:42 +00:00
Michael Kuperstein b2443ed62b [X86] Enable interleaved memory access by default
This lets the loop vectorizer generate interleaved memory accesses on x86.

Differential Revision: https://reviews.llvm.org/D25350

llvm-svn: 284779
2016-10-20 21:04:31 +00:00
Konstantin Zhuravlyov 521e5ef4ce [AMDGPU] Make note record name a static const member of target streamer
Differential Revision: https://reviews.llvm.org/D25746

llvm-svn: 284760
2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov 08326b6256 [AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only)
Differential Revision: https://reviews.llvm.org/D25693

llvm-svn: 284759
2016-10-20 18:12:38 +00:00
Simon Pilgrim 365be4f95c [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors
We weren't checking for uniform const costs before the general cost, resulting in very high estimates.

llvm-svn: 284755
2016-10-20 18:00:35 +00:00
Sanjay Patel 0051efcf97 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Simon Pilgrim 025e26dd32 [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.

We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.

llvm-svn: 284744
2016-10-20 16:39:11 +00:00
Valery Pykhtin e55fd41f73 [AMDGPU] add fcopysign(f64, f32) pattern
Differential revision: https://reviews.llvm.org/D25827

llvm-svn: 284743
2016-10-20 16:17:54 +00:00
Benjamin Kramer 2a8bef8769 Do a sweep over move ctors and remove those that are identical to the default.
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.

No functionality change intended.

llvm-svn: 284721
2016-10-20 12:20:28 +00:00
Jonas Paulsson 8010b631d5 [SystemZ] Post-RA scheduler implementation
Post-RA sched strategy and scheduling instruction annotations for z196, zEC12
and z13.

This scheduler optimizes decoder grouping and balances processor resources
(including side steering the FPd unit instructions).

The SystemZHazardRecognizer keeps track of the scheduling state, which can
be dumped with -debug-only=misched.

Reviers: Ulrich Weigand, Andrew Trick.
https://reviews.llvm.org/D17260

llvm-svn: 284704
2016-10-20 08:27:16 +00:00
Peter Collingbourne c7766778a0 X86: Allow expressions to appear as u8imm operands.
llvm-svn: 284688
2016-10-20 01:58:34 +00:00
Peter Collingbourne de1f039360 X86: Deduplicate some lowering code. NFCI.
llvm-svn: 284686
2016-10-20 01:21:26 +00:00
Reid Kleckner 40d7230f2f Use __func__ directly now that all supported compilers support it
Remove the portability macro now that it is unused.

llvm-svn: 284681
2016-10-20 00:22:23 +00:00
Wei Ding 3cb2a1e8d1 AMDGPU : Add a function to enable and disable IEEEBit for SC and shader
respectively.

Differential Revision: http://reviews.llvm.org/D25789

llvm-svn: 284655
2016-10-19 22:34:49 +00:00
Krzysztof Parzyszek c87155037b [AMDGPU] Stop using MCRegisterClass::getSize()
Differential Review: https://reviews.llvm.org/D24675

llvm-svn: 284619
2016-10-19 17:40:36 +00:00
Krzysztof Parzyszek 7bb63ac029 [RDF] Switch RefMap in liveness calculation to use lane masks
This required reengineering of some of the part of liveness calculation,
including fixing some issues caused by the limitations of the previous
approach. The current code is not necessarily the fastest, but it should
be functionally correct (at least more so than before). The compile-time
performance will be addressed in the future.

llvm-svn: 284609
2016-10-19 16:30:56 +00:00
Chris Dewhurst 2c3cdd66d2 [Sparc][LEON] Detects an erratum on UT699 LEON 3 processors involving rounding mode changes and issues an appropriate user error message.
Differential Revision: https://reviews.llvm.org/D24665

llvm-svn: 284591
2016-10-19 14:01:06 +00:00
Sjoerd Meijer 2fc4cb6f72 Reapply r284571 (with the new tests fixed).
llvm-svn: 284588
2016-10-19 13:43:02 +00:00
Ulrich Weigand 6e31ab388a [SystemZ] Add missing vector instructions for the assembler
Most z13 vector instructions have a base form where the data type of
the operation (whether to consider the vector to be 16 bytes, 8
halfwords, 4 words, or 2 doublewords) is encoded into a mask field,
and then a set of extended mnemonics where the mask field is not
present but the data type is encoded into the mnemonic name.

Currently, LLVM only supports the type-specific forms (since those
are really the ones needed for code generation), but not the base
type-generic forms.

To complete the assembler support and make it fully compatible with
the GNU assembler, this commit adds assembler aliases for all the
base forms of the various vector instructions.

It also adds two more alias forms that are documented in the PoP:
VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc.
VNOT -- special variant of VNO

llvm-svn: 284586
2016-10-19 13:03:18 +00:00
Ulrich Weigand 556a90c00c [SystemZ] Add optional argument to some vector string instructions
The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are
documented in the Principles of Operation to have an optional last
operand to encode arbitrary values in a mask field.

This commit adds support for those optional operands, and cleans up
the patterns to generate vector string instruction as bit.  No change
to code generation intended.

llvm-svn: 284585
2016-10-19 12:57:46 +00:00
James Molloy fbfd173447 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 284580
2016-10-19 12:06:49 +00:00
Sjoerd Meijer 3f5111d363 Revert of r284571 because of failing tests.
llvm-svn: 284572
2016-10-19 07:45:48 +00:00
Sjoerd Meijer a318779263 Checking FP function attribute values and adding more build attribute tests.
This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).

Differential Revision: https://reviews.llvm.org/D25625

llvm-svn: 284571
2016-10-19 07:25:06 +00:00
Craig Topper a4dc340cf2 [AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast.
Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.

New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.

There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert.

The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.

Reviewers: RKSimon, delena, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25651

llvm-svn: 284567
2016-10-19 04:44:17 +00:00
Eli Friedman c0a717ba5b Improve ARM lowering for "icmp <2 x i64> eq".
The custom lowering is pretty straightforward: basically, just AND
together the two halves of a <4 x i32> compare.

Differential Revision: https://reviews.llvm.org/D25713

llvm-svn: 284536
2016-10-18 21:03:40 +00:00
Evandro Menezes ce8d60156c [AArch64] Avoid materializing 0.0 when generating FP SELECT
Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0`
to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not
have to be materialized beforehand for FCMP, as it has a form that has 0.0
as an implicit operand.

Differential Revision: https://reviews.llvm.org/D24808

llvm-svn: 284531
2016-10-18 20:37:35 +00:00
Tim Northover 55782222c0 GlobalISel: select small binary operations on AArch64.
AArch64 actually supports many 8-bit operations under the definition used by
GlobalISel: the designated information-carrying bits of a GPR32 get the right
value if you just use the normal 32-bit instruction.

llvm-svn: 284526
2016-10-18 20:03:48 +00:00
Tim Northover 4494d69862 GlobalISel: support floating-point constants on AArch64.
Patch from Ahmed Bougacha.

llvm-svn: 284523
2016-10-18 19:47:57 +00:00
Krzysztof Parzyszek 5bb417bed2 [Hexagon] Handle block live-ins with lane masks in HexagonBlockRanges
llvm-svn: 284522
2016-10-18 19:47:20 +00:00
Benjamin Kramer 4c2582ad78 Reduce global namespace pollution. NFC.
llvm-svn: 284521
2016-10-18 19:39:31 +00:00
Sanjay Patel 19601fa587 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel 08fff9ca81 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Simon Pilgrim ca3072ac58 [X86][AVX512] Add mask/maskz writemask support to constant pool shuffle decode commentx
llvm-svn: 284488
2016-10-18 15:45:37 +00:00
Simon Dardis 858915f054 [mips][ias] Handle more complicated expressions for memory operands
This patch teaches ias for mips to handle expressions such as
(8*4)+(8*31)($sp). Such expression typically occur from the expansion
of multiple macro definitions.

This partially resolves PR/30383.

Thanks to Sean Bruno for reporting the issue!

Reviewers: zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D24667

llvm-svn: 284485
2016-10-18 15:17:17 +00:00
Simon Dardis c4463c942c [mips] Fix sync instruction definition
The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands.
MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit
immediate.

This patch correct the definition of sync so that it is accepted with an
operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned
immediate for MIPS32 and later revisions.

Additionally a clear error is given when the MIPS32 version of sync is
used when targeting pre MIPS32.

This partially resolves PR/30714.

Thanks to Daniel Sanders for reporting this issue!

Reveiwers: vkalintiris

Differential Revision: https://reviews.llvm.org/D25672

llvm-svn: 284483
2016-10-18 14:42:13 +00:00
Simon Dardis aff4d141b9 [mips] Macro expansion for ld, sd for O32
ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads
or stores using the specified source or destination register and the next
register.

This patch does not add support for the cases where the offset is greater than
a 16 bit signed immediate as that would lead to a wrong/misleading error
message as the assembler would report "instruction requires a CPU feature
not currently enabled" for ld & sd for MIPS64 when their offset is not a signed
16 bit number.

This fixes PR/29159.

Thanks to Sean Bruno for reporting this issue!

Reviewers: vkalintiris, seanbruno, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24556

llvm-svn: 284481
2016-10-18 14:28:00 +00:00
Michael Zuckerman 1bee6340ef [x86][inline-asm][avx512] allow swapping of '{k<num>}' & '{z}' marks
Committing on behalf of Coby Tayree: After check-all and LGTM

Desc:

AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}')
 Currently, the following forms are allowed:
 {k<num>}
 {k<num>}{z}

This patch allows the following forms:
 {z}{k<num>}

and ignores the next form:
 {z}

Justification would be quite simple - GCC

Differential Revision: http://reviews.llvm.org/D25013

llvm-svn: 284479
2016-10-18 13:52:39 +00:00
Vasileios Kalintiris 3955b75ba9 [mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel.
Summary:
Instead of instantiating the MipsFastISel class and checking if the
target is supported in the overriden methods, we should perform that
check before creating the class. This allows us to enable FastISel *only*
for targets that truly support it, ie. MIPS32 to MIPS32R5.

Reviewers: sdardis

Subscribers: ehostunreach, llvm-commits

Differential Revision: https://reviews.llvm.org/D24824

llvm-svn: 284475
2016-10-18 13:05:42 +00:00
Javed Absar e7c338081a [ARM] Assign cost of scaling for Cortex-R52
This patch assigns cost of the scaling used in addressing for Cortex-R52.

On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.

Differential Revision: http://reviews.llvm.org/D25670

Reviewer: jmolloy
llvm-svn: 284460
2016-10-18 09:08:54 +00:00
Simon Pilgrim 4ddc92b6cd [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Dean Michael Berris 156f6cafc2 [XRay] Support for for tail calls for ARM no-Thumb
This patch adds simplified support for tail calls on ARM with XRay instrumentation.

Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14  -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program

    #include <cstdio>
    #include <cassert>
    #include <xray/xray_interface.h>

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
      std::printf("In fC()\n");
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
      std::printf("In fB()\n");
      fC();
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
      std::printf("In fA()\n");
      fB();
    }

    // Avoid infinite recursion in case the logging function is instrumented (so calls logging
    //   function again).
    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
    {
      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
    }

    int main(int argc, char* argv[]) {
      __xray_set_handler(simplyPrint);

      printf("Patching...\n");
      __xray_patch();
      fA();

      printf("Unpatching...\n");
      __xray_unpatch();
      fA();

      return 0;
    }

gives the following output:

    Patching...
    XRay: functionId=3 type=0.
    In fA()
    XRay: functionId=3 type=1.
    XRay: functionId=2 type=0.
    In fB()
    XRay: functionId=2 type=1.
    XRay: functionId=1 type=0.
    XRay: functionId=1 type=1.
    In fC()
    Unpatching...
    In fA()
    In fB()
    In fC()

So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().

Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).

Differential Revision: https://reviews.llvm.org/D25030

llvm-svn: 284456
2016-10-18 05:54:15 +00:00
Craig Topper 448358b5f1 [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.

llvm-svn: 284453
2016-10-18 04:48:33 +00:00
Craig Topper 7268bf99ab [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25666

llvm-svn: 284451
2016-10-18 04:00:32 +00:00
Craig Topper 175a415e78 [AVX-512] Add support for decoding shuffle mask from constant pool for masked VPERMILPS/PD.
llvm-svn: 284450
2016-10-18 03:36:52 +00:00
Konstantin Zhuravlyov 98a3ac7106 [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it
Differential Revision: https://reviews.llvm.org/D25694

llvm-svn: 284435
2016-10-17 22:40:15 +00:00
Tim Northover 020d104496 GlobalISel: support wider range of load/store sizes in AArch64.
llvm-svn: 284406
2016-10-17 18:36:53 +00:00
Tom Stellard 083f1626f5 AMDGPU/SI: LowerParameter() should be computing align based on memory type
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25203

llvm-svn: 284398
2016-10-17 16:56:19 +00:00
Tom Stellard bc6c523cce AMDGPU/SI: Fix LowerParameter() for i16 arguments
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25198

llvm-svn: 284397
2016-10-17 16:21:45 +00:00
Craig Topper 1f5178ff9f [X86] Fix shuffle decoding assertions to print the right number of required operands. Update the checks themselves to be >= to the same number instead of > one less than the required number.
llvm-svn: 284365
2016-10-17 06:41:18 +00:00
Craig Topper 5b24cd31f5 [AVX-512] Add shuffle combining support for vpermi2var shuffles derived from existing support for vpermt2var.
llvm-svn: 284357
2016-10-17 04:26:47 +00:00
Craig Topper 715ad7fef5 [AVX-512] Add support for turning a 256-bit load that goes to both halfs of an insert_subvector into a subvector broadcast.
Differential Revision: https://reviews.llvm.org/D25650

llvm-svn: 284353
2016-10-16 23:29:51 +00:00
Craig Topper aa1370ac57 [AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the other vpermi2var intrinsics.
llvm-svn: 284329
2016-10-16 04:54:35 +00:00
Craig Topper 4729fe8bb6 [AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS.
llvm-svn: 284328
2016-10-16 04:54:31 +00:00
Craig Topper f18b9201f5 [AVX-512] Move (v4i64 (X86SubVBroadcast (v2i64))) alternate patterns under a HasVLX predicate. Similar for floating point.
llvm-svn: 284327
2016-10-16 04:54:26 +00:00
Davide Italiano e9cdb24f67 [ArmFastISel] Kill dead code. NFCI.
llvm-svn: 284320
2016-10-16 01:09:39 +00:00
Konstantin Zhuravlyov 8ea0246e93 [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577

llvm-svn: 284312
2016-10-15 22:01:18 +00:00
Craig Topper dde865afb5 [AVX-512] Add shuffle comments for vbroadcast instructions.
llvm-svn: 284305
2016-10-15 16:26:07 +00:00
Craig Topper 51e052f741 [AVX-512] Rename VPBROADCASTI32X2 and VPBROADCASTF32X2 instruction classes to match the mnemonic which does not include a 'P'.
llvm-svn: 284304
2016-10-15 16:26:02 +00:00
Tom Stellard 961811c906 AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25526

llvm-svn: 284298
2016-10-15 00:58:14 +00:00
Tim Northover 69fa84a6e9 GlobalISel: rename legalizer components to match others.
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.

The only functional change is the name of a couple of command-line options.

llvm-svn: 284287
2016-10-14 22:18:18 +00:00
Guozhi Wei 0cd65429be [PPC] Shorter sequence to load 64bit constant with same hi/lo words
This is a patch to implement pr30640.

When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.

This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.

Differential Revision: https://reviews.llvm.org/D25521

llvm-svn: 284276
2016-10-14 20:41:50 +00:00
Tom Stellard 09c2bd6bd4 AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)

Reviewers: arsenm, nhaehnle

Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24672

llvm-svn: 284267
2016-10-14 19:14:29 +00:00
Krzysztof Parzyszek 775a20913d The real fix for post-r284255 failures
llvm-svn: 284264
2016-10-14 19:06:25 +00:00
Krzysztof Parzyszek a22daa0fa6 Workaround to eliminate check-llvm failures after r284255
llvm-svn: 284262
2016-10-14 18:36:42 +00:00
David L Kreitzer 01a057a0c4 Add a pass to optimize patterns of vectorized interleaved memory accesses for
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.

Patch by Farhana Aleen

Differential revision: http://reviews.llvm.org/D24681

llvm-svn: 284260
2016-10-14 18:20:41 +00:00
Tom Stellard 64a9d0876c AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

llvm-svn: 284257
2016-10-14 18:10:39 +00:00
Krzysztof Parzyszek 445bd12621 [RDF] Switch RegisterRef to be a pair (Register, LaneMask)
Use PackedRegisterRef to store the register information in the graph nodes.

This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.

llvm-svn: 284255
2016-10-14 17:57:55 +00:00
David L Kreitzer d5c6755d83 [safestack] Use non-thread-local unsafe stack pointer for Contiki OS
Patch by Michael LeMay

Differential revision: http://reviews.llvm.org/D19852

llvm-svn: 284254
2016-10-14 17:56:00 +00:00
Eric Christopher c39f8b0a3a Revert "In preparation for removing getNameWithPrefix off of
TargetMachine," as it's causing sanitizer/memory issues until I
can track down this set.

This reverts commit r284203

llvm-svn: 284252
2016-10-14 17:28:23 +00:00
Pierre Gousseau b6d652adb5 [X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
  zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
  To:
  srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.

Differential Revision: https://reviews.llvm.org/D23446

llvm-svn: 284248
2016-10-14 16:41:38 +00:00
Nicolai Haehnle 67624af0cc AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25289

llvm-svn: 284224
2016-10-14 10:30:00 +00:00
Simon Dardis b3fd189cb5 [mips] Fix aui/daui/dahi/dati for MIPSR6
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.

Reviewers: dsanders, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D21473

llvm-svn: 284218
2016-10-14 09:31:42 +00:00
Nicolai Haehnle bd15c3267f AMDGPU: Fix use-after-frees
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25312

llvm-svn: 284215
2016-10-14 09:03:04 +00:00
Michael Zuckerman 174d2e784b [x86][ms-inline-asm] use of "jmp short" in asm is not supported
Committing in the name of Ziv Izhar: After check-all and LGTM .

The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
      jmp short label
      label:
      }

A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958

Differential Revision: https://reviews.llvm.org/D24957

llvm-svn: 284211
2016-10-14 08:09:40 +00:00
Eric Christopher 2bd52b5d91 In preparation for removing getNameWithPrefix off of TargetMachine,
sink the current behavior into the callers and sink
TargetMachine::getNameWithPrefix into TargetMachine::getSymbol.

llvm-svn: 284203
2016-10-14 05:47:41 +00:00
Eric Christopher 445c952bd0 Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to help
readability a bit.

llvm-svn: 284202
2016-10-14 05:47:37 +00:00
Konstantin Zhuravlyov c96b5d7073 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Konstantin Zhuravlyov 2a2ac37c2c [AMDGPU] Add 32-bit lo/hi got and pc relative variant kinds and emit appropriate relocations
Differential Revision: https://reviews.llvm.org/D25548

llvm-svn: 284195
2016-10-14 04:21:32 +00:00
Saleem Abdulrasool 7705c4f1be CodeGen: use MSVC division on windows itanium
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.

llvm-svn: 284175
2016-10-13 23:00:11 +00:00
Saleem Abdulrasool 06383dd272 CodeGen: adjust floating point operations in Windows itanium
Windows itanium is equivalent to MSVC except in C++ mode.  Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.

llvm-svn: 284173
2016-10-13 22:38:15 +00:00
Sriraman Tallam f29fa586e1 New llc option pie-copy-relocations to optimize access to extern globals.
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.

Differential Revision: https://reviews.llvm.org/D24849

llvm-svn: 284160
2016-10-13 20:54:39 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Quentin Colombet b3f5a8c644 [AArch64][RegisterBankInfo] Switch to fully static opds mapping for G_BITCAST.
NFC.

llvm-svn: 284146
2016-10-13 18:46:38 +00:00
Igor Breger 8409c356ad [X86][AVX512] Fix sext v32i1 -> v32i8 lowering.
Fix PR30600.

Differential Revision: https://reviews.llvm.org/D25554

llvm-svn: 284134
2016-10-13 17:20:38 +00:00
Reid Kleckner 468e793fea Fix for PR30687. Avoid dereferencing MBB.end().
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.

Based on a patch by David Kreitzer

This bug is a consequence of

r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines

We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.

Differential Revision: https://reviews.llvm.org/D25566

llvm-svn: 284130
2016-10-13 15:48:48 +00:00
Javed Absar 85874a9360 [ARM]: Assign cost of scaling used in addressing mode for ARM cores
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.

For instance:

LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]

Above, (1) takes less cycles than (2).

By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.

Differential Revision: http://reviews.llvm.org/D24857

Reviewers: jmolloy, rengolin
llvm-svn: 284127
2016-10-13 14:57:43 +00:00
Matt Arsenault 253640e18d AMDGPU: Assume spilling will occur at -O0
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.

llvm-svn: 284119
2016-10-13 13:10:00 +00:00
Matt Arsenault dac31db12f AMDGPU: Fix truncate to bool warnings
llvm-svn: 284116
2016-10-13 12:45:16 +00:00
Simon Dardis 515e8699f4 [mips] Add IAS support for dvp, evp
These instructions were only defined for microMIPSR6 previously. Add
definitions for MIPSR6, correct definitions for microMIPSR6, flag these
instructions as having unmodelled side effects (they disable/enable
virtual processors) and add missing disassember tests for microMIPSR6.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24291

llvm-svn: 284115
2016-10-13 12:12:56 +00:00
Oren Ben Simhon 92ccbf20ff [X86] Basic additions to support RegCall Calling Convention.
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.

Differential Revision: http://reviews.llvm.org/D25022

llvm-svn: 284108
2016-10-13 07:53:43 +00:00
Daniel Jasper bee9dea306 Silence unused warning in non-assert builds.
llvm-svn: 284107
2016-10-13 06:39:44 +00:00
Craig Topper ff23af4299 [AVX-512] Teach shuffle lowering to recognize 512-bit zero extends.
llvm-svn: 284105
2016-10-13 05:29:41 +00:00
Craig Topper 8cb2efa58a [X86] Simplify the lowering code for extracting and inserting subvectors.
We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom.
We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume its a different width.

llvm-svn: 284102
2016-10-13 04:14:47 +00:00
Quentin Colombet 6b87a3109c [AArch64][RegisterBankInfo] Provide alternative mappings for 64-bit load
This allows RegBankSelect in greedy mode to get rid some of the cross
register bank copies when loads are involved in the chain of
computation.

llvm-svn: 284097
2016-10-13 01:01:23 +00:00
Reid Kleckner 741d8a21d3 Correct PrivateLinkage for COFF
- Use storage class C_STAT for 'PrivateLinkage' The storage class for
  PrivateLinkage should equal to the Internal Linkage.

- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
  x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix
  "L" may conflict to the normal symbol name starting with 'L'.

Based on a patch by Han Sangjin! Manually updated test cases.

llvm-svn: 284096
2016-10-13 00:55:24 +00:00
Quentin Colombet cd80e97e88 [AArch64][RegisterBankInfo] Provide alternative mappings for G_BITCASTs.
Thanks to this patch, RegBankSelect is able to get rid of some register
bank copies as demonstrated in the test case.

llvm-svn: 284094
2016-10-13 00:34:48 +00:00
Quentin Colombet 45c9c1432f [AArch64][RegisterBankInfo] Describe cross regbank copies statically.
NFC.

llvm-svn: 284091
2016-10-13 00:12:06 +00:00
Quentin Colombet 9e64919b7c [AArch64][RegisterBankInfo] Use static mapping for same bank G_BITCAST.
NFC.

llvm-svn: 284090
2016-10-13 00:12:04 +00:00
Quentin Colombet db643d9091 [AArch64][MachineLegalizer] Mark more G_BITCAST as legal.
Basically any vector types that fits in a 32-bit register is also valid
as far as copies are concerned.

llvm-svn: 284089
2016-10-13 00:12:01 +00:00
Quentin Colombet f760799c40 [AArch64][RegisterBankInfo] Bump the cost of vector loads.
This does not change anything yet, because we do not offer any
alternative mapping.

llvm-svn: 284088
2016-10-13 00:11:59 +00:00
Quentin Colombet f35a8c5bdc [AArch64][RegisterBankInfo] Use a proper cost for cross regbank G_BITCASTs.
This does not change anything yet, because we do not offer any
alternative mapping.

llvm-svn: 284087
2016-10-13 00:11:57 +00:00
Quentin Colombet 27b40356f7 [AArch64][RegisterBankInfo] Provide more realistic copy costs.
llvm-svn: 284086
2016-10-13 00:11:55 +00:00
Tim Northover fb8d989818 GlobalISel: support G_TRUNC selection on AArch64.
Ahmed's patch again.

llvm-svn: 284075
2016-10-12 22:49:15 +00:00
Tim Northover 69271c64d5 GlobalISel: support int <-> float conversions on AArch64.
More of Ahmed's work.

llvm-svn: 284074
2016-10-12 22:49:11 +00:00
Tim Northover 7dd378dd08 GlobalISel: select G_FCMP instructions on AArch64.
Another of Ahmed's patches.

llvm-svn: 284073
2016-10-12 22:49:07 +00:00
Tim Northover 6c02ad5e4f GlobalISel: support selection of G_ICMP on AArch64.
Patch from Ahmed Bougaca again.

llvm-svn: 284072
2016-10-12 22:49:04 +00:00
Tim Northover 5e3dbf326c GlobalISel: select G_BRCOND instructions on AArch64.
llvm-svn: 284071
2016-10-12 22:49:01 +00:00
Tim Northover 6aacd27cd7 GlobalISel: mark G_BRCOND on s1 as legal.
It's going to be a TBNZ (at -O0) anyway, so the high bits don't matter.

llvm-svn: 284070
2016-10-12 22:48:36 +00:00
Albert Gutowski 795d7d6381 Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Matt Arsenault d486d3f8d1 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

llvm-svn: 284031
2016-10-12 18:49:05 +00:00
Matt Arsenault cc88ce36ed AMDGPU: Add instruction definitions for VGPR indexing
VI added a second method of indexing into VGPRs
besides using v_movrel*

llvm-svn: 284027
2016-10-12 18:00:51 +00:00
Tom Stellard fac248cb5f AMDGPU/SI: Change mimg intrinsic signatures
This makes more fields overridable and removes redundant bits.

Patch by: Changpeng Fang

llvm-svn: 284024
2016-10-12 16:35:29 +00:00
Alexey Bataev b271a58e37 NFC: The Cost Model specialization, by Andrey Tischenko
The current Cost Model implementation is very inaccurate and has to be
updated, improved, re-implemented to be able to take into account the
concrete CPU models and the concrete targets where this Cost Model is
being used. For example, the Latency Cost Model should be differ from
Code Size Cost Model, etc.
This patch is the first step to launch the developing and implementation
of a new Cost Model generation.

Differential Revision: https://reviews.llvm.org/D25186

llvm-svn: 284012
2016-10-12 13:24:13 +00:00
Quentin Colombet 9de30faeac [AArch64][InstrustionSelector] Teach the selector about G_BITCAST.
llvm-svn: 283973
2016-10-12 03:57:52 +00:00
Quentin Colombet cb629a897c [AArch64][InstructionSelector] Refactor the handling of copies.
Although Copies are not specific to preISel, we still have to assign them
a proper register class. However, given they are not constrained to
anything we do not have to handle the source register at the copy. It
will be properly mapped when reaching the related definition.

In the process, the handlong of G_ANYEXT is slightly modified as those
end up being selected as copy. The difference is that when register size
do not match on both sides, we need to insert SUBREG_TO_REG operation,
otherwise the post RA copy expansion will not be happy!

llvm-svn: 283972
2016-10-12 03:57:49 +00:00
Quentin Colombet 404e4350dc [AArch64][MachineLegalizer] Mark more bitcasts as legal.
Those are copies, we do not have to do any legalization action for them.

llvm-svn: 283970
2016-10-12 03:57:43 +00:00
Tim Shen 4ff62b187e [PPCMIPeephole] Fix splat elimination
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
  B = Splat A
  C = Splat B
=>
  C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
  B = Splat A
  C = Splat B
=>
  B = Splat A
  C = COPY B

Fixes PR30663.

Reviewers: echristo, iteratee, kbarton, nemanjai

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D25493

llvm-svn: 283961
2016-10-12 00:48:25 +00:00
Tim Northover c1d8c2bf8c GlobalISel: support same-size casts on AArch64.
Mostly Ahmed's work again, I'm just sprucing things up slightly before
committing.

llvm-svn: 283952
2016-10-11 22:29:23 +00:00
Reid Kleckner bdfc05ff93 Re-land "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
Reverts r283938 to reinstate r283867 with a fix.

The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.

llvm-svn: 283942
2016-10-11 21:14:03 +00:00
Reid Kleckner f4876beb2b Revert "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
This reverts r283867.

This appears to be an infinite loop:

    while (HiRegToSave != AllHighRegs.end() && CopyReg != AllCopyRegs.end()) {
      if (HiRegsToSave.count(*HiRegToSave)) {
        ...

        CopyReg = findNextOrderedReg(++CopyReg, CopyRegs, AllCopyRegs.end());
        HiRegToSave =
            findNextOrderedReg(++HiRegToSave, HiRegsToSave, AllHighRegs.end());
      }
    }

llvm-svn: 283938
2016-10-11 20:54:41 +00:00
Tim Northover 3d38b3a4d1 GlobalISel: support selection of extend operations.
Patch mostly by Ahmed Bougaca.

llvm-svn: 283937
2016-10-11 20:50:21 +00:00
Konstantin Zhuravlyov cdd4547607 [AMDGPU] Refactor waitcnt encoding
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)

Differential Revision: https://reviews.llvm.org/D25298

llvm-svn: 283919
2016-10-11 18:58:22 +00:00
Mehdi Amini caec5559cb Fix "static initialization order fiasco" for the XCore Target.
I fixed all the other Targets in r283702, and interestingly the
sanitizers are only now "sometimes" catching this bug on the only
one I missed.

llvm-svn: 283914
2016-10-11 18:22:41 +00:00
NAKAMURA Takumi e9587bd771 ARMMachineFunctionInfo.cpp: Add an initializer of ARMFunctionInfo::ReturnRegsCount in the explicit ctor.
It caused crash since r283867.

llvm-svn: 283909
2016-10-11 17:38:30 +00:00
NAKAMURA Takumi e61de02016 Reformat.
llvm-svn: 283908
2016-10-11 17:38:25 +00:00
Daniel Jasper b617286089 Silence unused warning in non-assert builds.
llvm-svn: 283899
2016-10-11 16:22:36 +00:00
Changpeng Fang 98317d20f4 AMDGPU/SI: Update ISA version numbers for Tonga and Polaris10/11.
Differential Revision:
  http://reviews.llvm.org/D25454

Reviewers:
  tstellarAMD

llvm-svn: 283893
2016-10-11 16:00:47 +00:00
Oliver Stannard d2083fb356 [Thumb] Save/restore high registers in Thumb1 pro/epilogues
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.

This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.

In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.

We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.

There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.

In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.

Differential Revision: https://reviews.llvm.org/D24228

llvm-svn: 283867
2016-10-11 10:12:25 +00:00
Oliver Stannard 50a74393c2 [ARM] Fix registers clobbered by SjLj EH on soft-float targets
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.

Differential Revision: https://reviews.llvm.org/D25180

llvm-svn: 283866
2016-10-11 10:06:59 +00:00
Diana Picus c93518db8c [AArch64] Allow label arithmetic with add/sub/cmp
Allow instructions such as 'cmp w0, #(end - start)' by folding the
expression into a constant. For ELF, we fold only if the symbols are in
the same section. For MachO, we fold if the expression contains only
symbols that are not linker visible.

Fixes https://llvm.org/bugs/show_bug.cgi?id=18920

Differential Revision: https://reviews.llvm.org/D23834

llvm-svn: 283862
2016-10-11 09:17:47 +00:00
Quentin Colombet d2623f8e38 [AArch64][InstructionSelector] Teach how to select FP load/store.
This patch allows to select 32 and 64-bit FP load and store.

llvm-svn: 283832
2016-10-11 00:21:14 +00:00
Quentin Colombet 0e5312787e [AArch64][InstructionSelector] Teach the selector how to handle vector OR.
This only adds the support for 64-bit vector OR. Adding more sizes is
not difficult, but it requires a bigger refactoring because ORs work on
any size, not necessarly the ones that match the width of the register
width. Right now, this is not expressed in the legalization, so don't
bother pushing the refactoring yet.

llvm-svn: 283831
2016-10-11 00:21:11 +00:00
Quentin Colombet d3126d5fb4 [AArch64][MachineLegalizer] Mark v2s32 G_LOAD as legal.
Actually every 64-bit loads are legal, but right now the API does not
offer a simple way to express that.

llvm-svn: 283829
2016-10-11 00:21:08 +00:00
Peter Collingbourne 0da86301ad Revert r283690, "MC: Remove unused entities."
llvm-svn: 283814
2016-10-10 22:49:37 +00:00
Tim Northover bdf1624367 GlobalISel: select G_GLOBAL_VALUE uses on AArch64.
llvm-svn: 283809
2016-10-10 21:50:00 +00:00
Tim Northover ad0acca544 GlobalISel: allow G_GLOBAL_VALUEs in AArch64 legalization.
llvm-svn: 283808
2016-10-10 21:49:53 +00:00
Tim Northover 2fda4b08ae GlobalISel: support selecting G_GEP instructions.
They're basically just an alias for G_ADD on AArch64.

llvm-svn: 283807
2016-10-10 21:49:49 +00:00
Tim Northover 4edc60d785 GlobalISel: support selecting constants on AArch64.
llvm-svn: 283806
2016-10-10 21:49:42 +00:00
Alexandros Lamprineas 20e9ddba73 [ARM] Fix invalid VLDM/VSTM access when targeting Big Endian with NEON
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.

The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.

Differential Revision: https://reviews.llvm.org/D25281

llvm-svn: 283763
2016-10-10 16:01:54 +00:00
Zvi Rackover 2a21f125bd [X86] Prefer rotate by 1 over rotate by imm
Summary:
Rotate by 1 is translated to 1 micro-op, while rotate with imm8 is translated to 2 micro-ops.

Fixes pr30644.

Reviewers: delena, igorb, craig.topper, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D25399

llvm-svn: 283758
2016-10-10 14:43:55 +00:00
Chris Dewhurst 850131213f This pass, fixing an erratum in some LEON 2 processors ensures that the SDIV instruction is not issued, but replaced by SDIVcc instead, which does not exhibit the error. Unit test included.
Differential Review: https://reviews.llvm.org/D24660

llvm-svn: 283727
2016-10-10 08:53:06 +00:00
Daniel Jasper 0dea246b4f Fix WebAssembly build after r283702.
llvm-svn: 283723
2016-10-10 06:49:55 +00:00
Craig Topper 9ece2f7529 [AVX-512] Add missing pattern sext or zext from bytes to quad words with a 128-bit load as input.
llvm-svn: 283720
2016-10-10 06:25:48 +00:00
Michael Zuckerman 3eeac2d56b [x86][inline-asm][llvm] accept 'v' constraint
Commit in the name of:Coby Tayree
1.'v' constraint for (x86) non-avx arch imitates the already implemented 'x' constraint, i.e. allows XMM{0-15} & YMM{0-15} depending on the apparent arch & mode (32/64).
2.for the avx512 arch it allows [X,Y,Z]MM{0-31} (mode dependent)

This patch applies the needed changes to clang
 clang patch: https://reviews.llvm.org/D25004

Differential Revision: D25005
 

llvm-svn: 283717
2016-10-10 05:48:56 +00:00
Dylan McKay 1a523767dc [AVR] Enable generation of the TableGen assembly writer tables
This also changes the order of the statements in CMakeLists.txt to be
alphabetical.

llvm-svn: 283711
2016-10-10 01:28:45 +00:00
Craig Topper 64378f4378 [AVX-512] Port 128 and 256-bit memory->register sign/zero extend patterns from SSE file. Also add a minimal set for 512-bit.
llvm-svn: 283704
2016-10-09 23:08:39 +00:00
Craig Topper 29558b8284 [X86] Remove redundant patterns. The same pattern appears a few lines up.
llvm-svn: 283703
2016-10-09 23:08:33 +00:00
Mehdi Amini f42454b94b Move the global variables representing each Target behind accessor function
This avoids "static initialization order fiasco"

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702
2016-10-09 23:00:34 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Craig Topper 43973154dd [AVX-512] Fix execution domain for EVEX encoded VINSERTPS.
llvm-svn: 283692
2016-10-09 06:41:47 +00:00
Peter Collingbourne cc723cccab MC: Remove unused entities.
llvm-svn: 283691
2016-10-09 04:39:13 +00:00
Peter Collingbourne 5c924d7117 Target: Remove unused entities.
llvm-svn: 283690
2016-10-09 04:38:57 +00:00
Craig Topper e30cb00dc0 [AVX-512] Add subvector insert and extract to load/store folding tables.
llvm-svn: 283689
2016-10-09 03:54:13 +00:00
Craig Topper 4262d53024 [AVX-512] Add the vector down convert instructions to the store folding tables.
llvm-svn: 283687
2016-10-09 03:54:05 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Craig Topper 086f0c1401 [AVX-512] Fix a bug in getLargestLegalSuperClass where we inflated to VR128X/VR256X even when VLX isn't supported.
This seems to have been responsible for the XMM16-31 spills observed in PR29112. With this fixed the test case has been modified to no longer have a spill of XMM16.

llvm-svn: 283668
2016-10-08 18:49:57 +00:00
Colin LeMahieu c69f7ff6c0 [Hexagon] Adding change of flow max 1 (cofMax1) TS flag for marking this restriction rather than implying it from TypeJR.
llvm-svn: 283665
2016-10-08 17:18:51 +00:00
Sebastian Pop eb65d72d9c [AArch64] Avoid generating indexed vector instructions for Exynos
Avoid generating indexed vector instructions for Exynos. This is needed for
fmla/fmls/fmul/fmulx. For example, the instruction

  fmla v0.4s, v1.4s, v2.s[1]

is less efficient than the instructions

  dup v2.4s, v2.s[1]
  fmla v0.4s, v1.4s, v2.4s

Patch written by Abderrazek Zaafrani.

Differential Revision: https://reviews.llvm.org/D21571

llvm-svn: 283663
2016-10-08 12:30:07 +00:00
Dylan McKay f96ffe1ebf [AVR] Add backend dependencies to MCTargetDesc/LLVMBuild.txt
llvm-svn: 283642
2016-10-08 01:14:23 +00:00
Dylan McKay 552b7856d3 Fix incorrect assertion in AVRFrameLowering.cpp
This wasn't looking at the right instruction, and would always fail.

llvm-svn: 283640
2016-10-08 01:10:36 +00:00
Dylan McKay b16b6d5739 [AVR] Don't worry about call frame size when initializing frame pointer
We previously only used the frame pointer if the frame pointer was too
big. This was to work around a bug (described in this old commit)

https://sourceforge.net/p/avr-llvm/code/204/tree//llvm/trunk/AVR/AVRFrameLowering.cpp?diff=50d64d912718465cb887d17a:203

I mistakenly invered the condition assuming it was a typo. I am now
removing it because it doesn't seem to be a problem anymore (plus it's a
dirty hack).

llvm-svn: 283639
2016-10-08 01:10:31 +00:00
Dylan McKay 7c2d41aa9f [AVR] Don't shadow container while iterating in range-based loop
This works on clang, but fails on GCC 4.6

llvm-svn: 283638
2016-10-08 01:09:06 +00:00
Dylan McKay a1a944e3cb [AVR] Use references rather than pointers in AVRISelLowering
llvm-svn: 283636
2016-10-08 01:06:21 +00:00
Dylan McKay 12109e7314 Allow a maximum of 64 bits to be returned in registers
The rest spills to the stack

Authored by Jake Goulding

llvm-svn: 283635
2016-10-08 01:05:09 +00:00
Dylan McKay c1ff65cf62 [AVR] Expand MULHS for all types
Once MULHS was expanded, this exposed an issue where the condition
register was thought to be 16-bit. This caused an attempt to copy a
16-bit register to an 8-bit register.

Authored by Jake Goulding

llvm-svn: 283634
2016-10-08 01:01:49 +00:00
Dylan McKay ddb7a59fe9 [AVR] Add the 'SoftFail' field to all instruction formats
This will be used in the future for disassembly.

llvm-svn: 283630
2016-10-08 00:55:46 +00:00
Dylan McKay 24d02ee141 [AVR] Set up the instruction printer and the assembly backend
llvm-svn: 283629
2016-10-08 00:50:11 +00:00
Dylan McKay 2b0936d41d [AVR] Add dependencies to AVR libraries in AVRCodeGen
llvm-svn: 283628
2016-10-08 00:45:24 +00:00
Dylan McKay 07897f5492 [AVR] Add missing subdirectories to LLVMBuild
llvm-svn: 283627
2016-10-08 00:42:58 +00:00
Dylan McKay 4d82df32b9 [AVR] Add the assembly printer
Summary: This adds the AVRAsmPrinter class.

Reviewers: arsenm, kparzysz

Subscribers: llvm-commits, wdng, beanz, japaric, mgorny

Differential Revision: https://reviews.llvm.org/D25271

llvm-svn: 283623
2016-10-08 00:02:36 +00:00
Tom Stellard 5ab6154dc3 AMDGPU/SI: Handle div_fmas hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25250

llvm-svn: 283622
2016-10-07 23:42:48 +00:00
Tom Stellard 6982bb8f25 AMDGPU/SI: Add support for 8-byte relocations
Reviewers: arsenm, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25375

llvm-svn: 283593
2016-10-07 20:36:58 +00:00
Colin LeMahieu 9694825d32 [Hexagon][NFC] Using documented instruction type name V4LDST instead of MEMOP.
llvm-svn: 283582
2016-10-07 19:11:28 +00:00
Tom Stellard ef33c4b3f2 AMDGPU/SI: Emit fixups for long branches
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25366

llvm-svn: 283570
2016-10-07 16:01:18 +00:00
Artem Tamazov 73f1ab28cd [AMDGPU][mc] Add support for buffer_load_dwordx3, buffer_store_dwordx3.
Partially fixes Bug 28232.
Lit tests added.

Differential Revision: https://reviews.llvm.org/D25367

llvm-svn: 283567
2016-10-07 15:53:16 +00:00
Sam Kolton a3ec5c10e2 [AMDGPU] Assembler: support v_mac_f32 DPP and SDWA. Move getNamedOperandIdx to AMDGPUBaseInfo.h
Reviewers: artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25084

llvm-svn: 283560
2016-10-07 14:46:06 +00:00
Konstantin Zhuravlyov c09e2d7e46 [AMDGPU] AMDGPUCodeGenPrepare: remove extra ';'
llvm-svn: 283558
2016-10-07 14:39:53 +00:00
Konstantin Zhuravlyov f74fc60a7d [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302

llvm-svn: 283555
2016-10-07 14:22:58 +00:00
Javed Absar 9797989ca7 [ARM]: add missing switch case for cortex-r52
Adds a missing switch case for handling cortex-r52
in init-subtarget-features.

llvm-svn: 283551
2016-10-07 13:41:55 +00:00
Martin Storsjo 04864f45b2 [ARM] Reapply: Use __rt_div functions for divrem on Windows
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.

This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D25332

llvm-svn: 283550
2016-10-07 13:28:53 +00:00
Javed Absar fb4b6e8db9 [ARM]: Add Cortex-R52 target to LLVM
This patch adds Cortex-R52, the new ARM real-time processor, to LLVM. 
Cortex-R52 implements the ARMv8-R architecture.

llvm-svn: 283542
2016-10-07 12:06:40 +00:00
Simon Pilgrim a5d019ee95 [X86][SSE] Update register class during MOVSD/MOVSS - BLENDPD/BLENDPS commutation
MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input, the commutation code wasn't taking this into account leading to verification errors.

This patch inserts a vreg copy mi to ensure that the registers are correct.

Fix for PR30607

Differential Revision: https://reviews.llvm.org/D25280

llvm-svn: 283539
2016-10-07 11:18:38 +00:00
Oliver Stannard 4df1cc0b00 [ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.

Differential Revision: https://reviews.llvm.org/D24462

llvm-svn: 283530
2016-10-07 08:48:24 +00:00
Mehdi Amini 68c6c8cd78 Use StringRef in ARMELFStreamer (NFC)
llvm-svn: 283529
2016-10-07 08:48:07 +00:00
Nicolai Haehnle 87bc4c218b AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

llvm-svn: 283528
2016-10-07 08:40:14 +00:00
Mehdi Amini a0016ec95f Use StringReg in TargetParser APIs (NFC)
llvm-svn: 283527
2016-10-07 08:37:29 +00:00
Mehdi Amini 9ff8e87ca4 Revert "Revert "Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe""
This reverts commit r283510 and reapply r283509, with updates to
clang-tools-extra as well.

llvm-svn: 283525
2016-10-07 08:25:42 +00:00
Craig Topper 948625633f [X86] Fix patterns for VPMULLD and VPCMPEQQ to not require aligned loads.
llvm-svn: 283524
2016-10-07 06:54:43 +00:00
Craig Topper 871da8ebea [X86] Remove unused PatFrags. NFC
llvm-svn: 283523
2016-10-07 06:54:39 +00:00
Dylan McKay e5d89e8001 [AVR] Add the AVRMCInstLower class
Summary:
This class deals with the lowering of CodeGen `MachineInstr` objects to
MC `MCInst` objects.

Reviewers: kparzysz, arsenm

Subscribers: wdng, beanz, japaric, mgorny

Differential Revision: https://reviews.llvm.org/D25269

llvm-svn: 283522
2016-10-07 06:13:09 +00:00
Peter Collingbourne 2261d78cd2 Target: Remove unused patterns and transforms. NFC.
llvm-svn: 283515
2016-10-07 00:30:49 +00:00
Colin LeMahieu 8ed1aee9dd [Hexagon] NFC Removing 'V4_' prefix from duplex instruction names.
llvm-svn: 283514
2016-10-07 00:15:07 +00:00
Mehdi Amini 292f376934 Revert "Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe"
This reverts commit r283509, clang is hitting the assert.

llvm-svn: 283510
2016-10-06 23:41:49 +00:00
Mehdi Amini a7e893f638 Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe
Summary:
I had for the second time today a bug where llvm::format("%s", Str)
was called with Str being a StringRef. The Linux and MacOS bots were
fine, but windows having different calling convention, it printed
garbage.

Instead we can catch this at compile-time: it is never expected to
call a C vararg printf-like function with non scalar type I believe.

Reviewers: bogner, Bigcheese, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25266

llvm-svn: 283509
2016-10-06 23:26:29 +00:00
Colin LeMahieu 9675de5ba8 [Hexagon] NFC. Canonicalizing absolute address instruction names.
llvm-svn: 283507
2016-10-06 23:02:11 +00:00
Dan Gohman 2726b88c03 [WebAssemby] Implement block signatures.
Per spec changes, this implements block signatures, and adds just enough
logic to produce correct block signatures at the ends of functions.

Differential Revision: https://reviews.llvm.org/D25144

llvm-svn: 283503
2016-10-06 22:29:32 +00:00
Dan Gohman 3a643e8d46 [WebAssembly] Remove loop's bottom label.
Per spec changes, loop constructs no longer have a bottom label.

https://reviews.llvm.org/D25118

llvm-svn: 283502
2016-10-06 22:10:23 +00:00
Dan Gohman 7f1bdb2e02 [WebAssembly] Remove the output operand from stores.
Per spec changes, store instructions in WebAssembly no longer have a return
value. Update the instruction descriptions.

Differential Revision: https://reviews.llvm.org/D25122

llvm-svn: 283501
2016-10-06 22:08:28 +00:00
Michael Kuperstein e524e22846 [X86] Preserve BasePtr for LEA64_32r
When replacing FrameIndex with BasePtr, we must preserve BasePtr for
LEA64_32r since BasePtr is used later for stack adjustment if it is
the same as StackPtr.

Patch by H.J Lu <hjl.tools@gmail.com>

Differential Revision: https://reviews.llvm.org/D23575

llvm-svn: 283486
2016-10-06 19:31:27 +00:00
Matt Arsenault 5e63a04e46 AMDGPU: Don't fold undef uses or copies with implicit uses
llvm-svn: 283476
2016-10-06 18:12:13 +00:00
Matt Arsenault c59a92387e AMDGPU: Remove scheduling info from si_mask_branch
llvm-svn: 283475
2016-10-06 18:12:07 +00:00
Matt Arsenault c2ee42cd16 AMDGPU: Remove leftover implicit operands when folding immediates
When constant folding an operation to a copy or an immediate
mov, the implicit uses/defs of the old instruction were left behind,
e.g. replacing v_or_b32 left the implicit exec use on the new copy.

llvm-svn: 283471
2016-10-06 17:54:30 +00:00
Matt Arsenault 11f7402075 Reapply "AMDGPU: Support using tablegened MC pseudo expansions"
Fix bad merge

llvm-svn: 283470
2016-10-06 17:19:11 +00:00
Matt Arsenault cbc879ee2f Revert "AMDGPU: Support using tablegened MC pseudo expansions"
llvm-svn: 283469
2016-10-06 17:08:01 +00:00
Matt Arsenault d20a2dd7ac AMDGPU: Support using tablegened MC pseudo expansions
Make the necessary refactorings to make use of PseudoInstExpansion

llvm-svn: 283467
2016-10-06 16:56:41 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Krzysztof Parzyszek d391d6f1c3 [Hexagon] Avoid replacing full regs with subregisters in tied operands
Doing so will result in the two-address pass generating incorrect code.

llvm-svn: 283463
2016-10-06 16:18:04 +00:00
Matt Arsenault 36919a4f7c Move AArch64BranchRelaxation to generic code
llvm-svn: 283459
2016-10-06 15:38:53 +00:00
Matt Arsenault 0a3ea89e85 AArch64: Move remaining target specific BranchRelaxation bits to TII
llvm-svn: 283458
2016-10-06 15:38:09 +00:00
Nirav Dave ee554e6155 [X86] Fix intel syntax push parsing bug
Change erroneous parsing of push immediate instructions in intel syntax
to default to pointer size by rewriting into the ATT style for matching.

This fixes PR22028.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25288

llvm-svn: 283457
2016-10-06 15:28:08 +00:00
Sam Kolton 3381d7a216 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.
Initialize MCObjectFileInfo with some default values.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 283450
2016-10-06 13:46:08 +00:00
Krzysztof Parzyszek 459a1c9f2b [RDF] Replace some expensive copies with references in range-based loops
llvm-svn: 283446
2016-10-06 13:05:46 +00:00
Krzysztof Parzyszek 61d9032bf3 [RDF] Replace potentially unclear autos with real types
llvm-svn: 283445
2016-10-06 13:05:13 +00:00
Diana Picus 6341e46cd1 Revert "[ARM] Use __rt_div functions for divrem on Windows"
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'

It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.

llvm-svn: 283442
2016-10-06 11:24:29 +00:00
Matt Arsenault 10c17ca6c6 AMDGPU: Partially fix reported code size for some instructions
These ones need to have the size on the pseudo instruction set for
getInstSizeInBytes to work correctly. These also have a statically
known size.

llvm-svn: 283437
2016-10-06 10:13:23 +00:00
James Molloy 6215fad0e9 [ARM] Constant pool promotion - fix alignment calculation
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.

Testcase added.

llvm-svn: 283423
2016-10-06 07:56:00 +00:00
Petr Hosek e023d62e76 [Triple] Add triple for Fuchsia
Fuchsia is a new operating system.

Differential Revision: https://reviews.llvm.org/D25116

llvm-svn: 283419
2016-10-06 05:17:26 +00:00
Konstantin Zhuravlyov b4eb5d5049 [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32
Differential Revision: https://reviews.llvm.org/D25121

llvm-svn: 283415
2016-10-06 02:20:46 +00:00
David Callahan c1051ab26e Modify df_iterator to support post-order actions
Summary: This makes a change to the state used to maintain visited information for depth first iterator. We know assume a method "completed(...)" which is called after all children of a node have been visited. In all existing cases, this method does nothing so this patch has no functional changes.  It will however allow a client to distinguish back from cross edges in a DFS tree.

Reviewers: nadav, mehdi_amini, dberlin

Subscribers: MatzeB, mzolotukhin, twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D25191

llvm-svn: 283391
2016-10-05 21:36:16 +00:00
Dan Gohman 5a68ec7f09 [WebAssembly] Add binary-encoding opcode values to instruction descriptions.
llvm-svn: 283389
2016-10-05 21:24:08 +00:00
Martin Storsjo f997759aef [ARM] Use __rt_div functions for divrem on Windows
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D24076

llvm-svn: 283383
2016-10-05 21:08:02 +00:00
James Y Knight b0a473aaf8 [Sparc] Implement UMUL_LOHI and SMUL_LOHI instead of MULHS/MULHU/MUL.
This is what the instruction-set actually provides, and the default
expansions of the others into the lohi opcodes are good.

llvm-svn: 283381
2016-10-05 20:54:17 +00:00
Krzysztof Parzyszek 3b6cbd55f7 [RDF] Fix live def propagation through basic block
llvm-svn: 283371
2016-10-05 20:08:09 +00:00
Matthias Braun 0a6916f303 AMDGPU: Do not re-use tmpreg in spill/restore lowering
The register scavenging code does not support multiple definitions of
the same vreg.

Differential Revision: https://reviews.llvm.org/D25220

llvm-svn: 283369
2016-10-05 20:02:51 +00:00
Simon Dardis 299dbd6cd1 [mips][ias] fix li macro when values are negated with ~
The integrated assembler evaluates the expressions such as ~0x80000000 to
0xffffffff7fffffff early in the parsing process. This patch adds compatibility
with gas so that li loads the expected value (0x7fffffff) in those cases. This
only occurs iff all the upper 32bits are set and maintains existing checks by
not truncating the result down to 32 bits if any of the the upper bits are not
set.

Reviewers: dsanders, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D23399

llvm-svn: 283353
2016-10-05 18:26:19 +00:00
Simon Dardis f45a59f80b Recommit: "[mips] Add rsqrt, recip for MIPS"
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.

Reviewers: vkalintiris, zoran.jovanoic

Differential Review: https://reviews.llvm.org/D24499

llvm-svn: 283334
2016-10-05 16:11:01 +00:00
Hans Wennborg c26c03d911 Revert r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)"
This is suspected to cause a miscompile in Chromium. Reverting while
investigating.

llvm-svn: 283329
2016-10-05 15:39:27 +00:00
Simon Dardis bbfd528748 Revert "[mips] Add rsqrt, recip for MIPS"
This reverts commit r282485 which contain two patches instead of
one.

llvm-svn: 283327
2016-10-05 15:28:33 +00:00
Douglas Katzman 0411e8669b [X86] Don't randomly encode %rip where illegal
Differential Revision: https://reviews.llvm.org/D25112

llvm-svn: 283326
2016-10-05 15:23:35 +00:00
James Molloy b7de497cb9 [Thumb] Don't try and emit LDRH/LDRB from the constant pool
This is not a valid encoding - these instructions cannot do PC-relative addressing.

The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.

llvm-svn: 283323
2016-10-05 14:52:13 +00:00
Oren Ben Simhon a2010755fa Test commit permission
llvm-svn: 283318
2016-10-05 13:48:33 +00:00
Dylan McKay afff169f17 [AVR] Don't select 'MOVW' instructions when they are not supported
We have a subtarget feature which we were ignoring, which was causing us
to generate unsupported instructions for some older chips.

llvm-svn: 283317
2016-10-05 13:38:29 +00:00
Dylan McKay 82ef77091c [AVR] Add AVRRegisterInfo::splitReg function
No tests are included just yet - this is used from the pseudo
instruction expander pass, which hasn't been pulled in-tree yet.

llvm-svn: 283316
2016-10-05 13:27:30 +00:00
Dylan McKay ea55554803 [AVR] Update return type of dynamic alloca pass
It was recently changed from 'const char*' to StringRef

llvm-svn: 283312
2016-10-05 12:32:24 +00:00
Dylan McKay 192405a31a [AVR] Add the AVR frame lowering code
Summary: This allows AVR to lower frames into assembly code.

Reviewers: arsenm, kparzysz

Subscribers: japaric, wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25032

llvm-svn: 283311
2016-10-05 11:48:56 +00:00
Dylan McKay c1760424de [AVR] Split all of the AVR device definitions into a separate file
We have ~500 lines of subtarget feature definitions, they don't belong
in our main TableGen file.

llvm-svn: 283310
2016-10-05 10:28:45 +00:00
Dylan McKay 5af1248230 [AVR] Enable the instruction printer in the target definition
llvm-svn: 283309
2016-10-05 10:23:38 +00:00
Dylan McKay f66e120b3b [AVR] Add definitions for the ATTiny102 and ATtiny104 chips
llvm-svn: 283308
2016-10-05 10:20:33 +00:00
Dylan McKay efe40389c0 [AVR] Add the machine code backend
Summary:
This adds the AVR machine code backend (`AVRAsmBackend.cpp`). This will
allow us to generate machine code from assembled AVR instructions.

Reviewers: arsenm, kparzysz

Subscribers: modocache, japaric, wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25029

llvm-svn: 283297
2016-10-05 05:30:19 +00:00
Mehdi Amini 5b00770c35 Use StringRef in ARMConstantPool APIs (NFC)
llvm-svn: 283293
2016-10-05 01:41:06 +00:00
Sanjay Patel bfdbea6481 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252
2016-10-04 20:46:43 +00:00
Matthias Braun 46a5238682 AArch64: Macrofusion: Split features, add missing combinations.
AArch64InstrInfo::shouldScheduleAdjacent() determines whether two
instruction can benefit from macroop fusion on apple CPUs. The list
turned out to be incomplete:
- the "rr" variants of the instructions were missing
- even the "rs" variants can have shift value == 0 and behave like the
  "rr" variants

This also splits the MacropFusion target feature into
ArithmeticBccFusion and ArithmeticCbzFusion.

Differential Revision: https://reviews.llvm.org/D25142

llvm-svn: 283243
2016-10-04 19:28:21 +00:00
Nemanja Ivanovic 6354d23555 [Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set
This patch corresponds to review:

The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.

llvm-svn: 283212
2016-10-04 11:25:52 +00:00
Simon Dardis 86b3a1e79b [mips][fastisel] Consider soft-float an unsupported floating point mode
Treat soft-float as unsupported for fast-isel. Additionally, ensure we check
that lowering f32 arguments also considers the case of soft-float mode.

Reviewers: ehostunreach, vkalintiris, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24505

llvm-svn: 283209
2016-10-04 10:35:07 +00:00
Sjoerd Meijer 535529b41c Consistent fp denormal mode names. NFC.
This fixes the inconsistency of the fp denormal option names: in LLVM this was
DenormalType, but in Clang this is DenormalMode which seems better.

Differential Revision: https://reviews.llvm.org/D24906

llvm-svn: 283192
2016-10-04 08:03:36 +00:00
Nemanja Ivanovic 11049f8f07 [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190
2016-10-04 06:59:23 +00:00
Craig Topper ee2d995661 [X86] Add MOV8rm_NOREX to switch in isReallyTriviallyReMaterializable to match MOV8rm.
llvm-svn: 283184
2016-10-04 03:11:44 +00:00
Matt Arsenault dcf0cfca4c AMDGPU: Refactor indirect vector lowering
Allow inserting multiple instructions in the
expanded loop.

llvm-svn: 283177
2016-10-04 01:41:05 +00:00
Matt Arsenault 283fbc24f6 AMDGPU: Factor SGPR spilling into separate functions
llvm-svn: 283175
2016-10-04 01:14:56 +00:00
Dan Gohman e040533ece [WebAssembly] Update to more stack-machine-oriented terminology.
WebAssembly has officially switched from being an AST to being a stack
machine. Update various bits of terminology and README.md entries
accordingly.

llvm-svn: 283154
2016-10-03 22:43:53 +00:00
Dan Gohman ffc184bb1d [WebAssemby] Clean up an obsolete comment.
The comment is present inside the body of GetVRegDef.

llvm-svn: 283153
2016-10-03 22:32:21 +00:00
Matthias Braun 9baa3e80a9 TargetMachine: Make the win32-macho workaround more specific.
This is to avoid problems with win32 + ELF which surprisingly happens a
lot in practice: If a user just specifies -march on the commandline the
object format changes along with the architecture to ELF in many
instances while the OS stays with the default/host OS.

llvm-svn: 283151
2016-10-03 22:12:37 +00:00
Dan Gohman 16fa0d8159 [WebAssembly] Delete an unused function. NFC.
llvm-svn: 283150
2016-10-03 22:06:28 +00:00
Dan Gohman 9850e87784 [WebAssembly] Fix indentation. NFC.
llvm-svn: 283147
2016-10-03 21:33:09 +00:00
Dan Gohman 4b8e8becf6 [WebAssembly] Rename OPERAND_FP32IMM to OPERAND_F32IMM.
WebAssembly documentation consistently says "f32" rather than "fp32" to
describe 32-bit floating-point.

llvm-svn: 283146
2016-10-03 21:31:31 +00:00
Quentin Colombet 3a06701913 [AArch64][RegisterBankInfo] Add getSameKindofOperandsMapping.
Refactor the code so that the same function can be used for all
instructions with all the same operands for up to 3 operands.

This is going to be useful for cast instructions.
NFC.

llvm-svn: 283144
2016-10-03 20:20:13 +00:00
Krzysztof Parzyszek c8b6ecabd8 [RDF] Fix liveness propagation through shadows
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.

llvm-svn: 283143
2016-10-03 20:17:20 +00:00
Matthias Braun a827ed8891 AArch64Subtarget: Remove unused CPUString field
llvm-svn: 283142
2016-10-03 20:17:02 +00:00
Matthias Braun eccdee9196 X86: Do not produce GOT relocations on windows
Windows has no GOT relocations the way elf/darwin has. Some people use
x86_64-pc-win32-macho to build EFI firmware; Do not produce GOT
relocations for this target.

Differential Revision: https://reviews.llvm.org/D24627

llvm-svn: 283140
2016-10-03 20:11:24 +00:00
Konstantin Zhuravlyov 60a8373778 [AMDGPU] Pass optimization level to SelectionDAGISel
llvm-svn: 283133
2016-10-03 18:47:26 +00:00
Konstantin Zhuravlyov 691e2e020b [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
llvm-svn: 283130
2016-10-03 18:29:01 +00:00
Krzysztof Parzyszek ab26e2dd3b [RDF] Further improve readability of the graph
Print target basic block for a branch.

llvm-svn: 283126
2016-10-03 17:54:33 +00:00
Krzysztof Parzyszek a77fe4eef3 [RDF] Replace RegisterAliasInfo with target-independent code using lane masks
llvm-svn: 283122
2016-10-03 17:14:48 +00:00
Sanjay Patel d27a21874b [x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165
 

llvm-svn: 283119
2016-10-03 16:38:27 +00:00
Matt Arsenault 4a048a44e5 AMDGPU: Fix typo
llvm-svn: 283108
2016-10-03 13:06:58 +00:00
Volkan Keles 1c38681ae6 Add new target hooks for LoadStoreVectorizer
Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D24727

llvm-svn: 283099
2016-10-03 10:31:34 +00:00
Sjoerd Meijer 4dbe73c1ed [ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.

Patch mostly by Pablo Barrio.

Differential Revision: https://reviews.llvm.org/D25077

llvm-svn: 283098
2016-10-03 10:12:32 +00:00
Konstantin Zhuravlyov c9786f0d8a [AMDGPU] Remove unused variables from SIOptimizeExecMasking
Differential Revision: https://reviews.llvm.org/D25110

llvm-svn: 283087
2016-10-03 04:43:22 +00:00
Hal Finkel 530fa5fcc9 [PowerPC] Account for the ELFv2 function prologue during branch selection
The PPC branch-selection pass, which performs branch relaxation, needs to
account for the padding that might be introduced to satisfy block alignment
requirements. We were assuming that the first block was at offset zero (i.e.
had the alignment of the function itself), but under the ELFv2 ABI, a global
entry function prologue is added to the first block, and it is a
two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte
alignment, the fact that the first block is eight bytes offset from the start
of the function is relevant to calculating where padding will be added in
between later blocks.

Unfortunately, I don't have a small test case.

llvm-svn: 283086
2016-10-03 04:06:44 +00:00
Craig Topper eab23d3bc4 [AVX-512] Remove isCheapAsAMove flag from VMOVAPSZ128rm_NOVLX and friends.
This was accidentally copy and pasted from other Pseudos in the file.

llvm-svn: 283084
2016-10-03 02:22:33 +00:00
Craig Topper 4e7b888ea4 [X86] Mark all sizes of (V)MOVUPD as trivially rematerializable.
I don't know for sure that we truly needs this, but its the only vector load that isn't rematerializable. Making it consistent allows it to not be a special case in the td files.

llvm-svn: 283083
2016-10-03 02:00:29 +00:00
Simon Pilgrim a8d2168cb0 [X86][AVX2] Add support for combining target shuffles to VPERMD/VPERMPS
llvm-svn: 283080
2016-10-02 21:07:58 +00:00
Simon Pilgrim 03afbe783d [X86][AVX] Ensure broadcast loads respect dependencies
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.

This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.

Bug found during internal testing.

Differential Revision: https://reviews.llvm.org/D25039

llvm-svn: 283070
2016-10-02 15:59:15 +00:00
Craig Topper 46413af7f7 [X86] Don't set i64 ADDC/ADDE/SUBC/SUBE as Custom if the target isn't 64-bit. This way we don't have to catch them and do nothing with them in ReplaceNodeResults.
llvm-svn: 283066
2016-10-02 06:13:43 +00:00
Craig Topper 68c08931fc [X86] Fix indentation. NFC
llvm-svn: 283065
2016-10-02 06:13:40 +00:00
Hal Finkel a9321059b9 [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

llvm-svn: 283060
2016-10-02 02:10:20 +00:00
Simon Pilgrim 630dd6ff02 [X86][SSE] Cleaned up shuffle decode assertion messages
llvm-svn: 283050
2016-10-01 20:12:56 +00:00
Simon Pilgrim 5b0c15ddf7 Fix signed/unsigned warning
llvm-svn: 283041
2016-10-01 16:14:57 +00:00
Simon Pilgrim 1638d49f20 [X86][SSE] Add support for combining target shuffles to binary BLEND
We already had support for 1-input BLEND with zero - this adds support for 2-input BLEND as well.

llvm-svn: 283040
2016-10-01 16:04:28 +00:00
Simon Pilgrim ae17cf20ce [X86][SSE] Always combine target shuffles to MOVSD/MOVSS
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.

This required me to add a couple of scalar math movsd/moss fold patterns that hadn't been needed in the past.

llvm-svn: 283038
2016-10-01 15:33:01 +00:00
Simon Pilgrim ccdd1ff49b [X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.

We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful

I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets

llvm-svn: 283037
2016-10-01 14:26:11 +00:00
Craig Topper 5eb5ade894 [X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.

llvm-svn: 283020
2016-10-01 07:11:24 +00:00
Mehdi Amini 9af9a9d5f9 Revert "Use StringRef instead of raw pointer in TargetRegistry API (NFC)"
This reverts commit r283017. Creates an infinite loop somehow.

llvm-svn: 283019
2016-10-01 07:08:23 +00:00
Mehdi Amini 36d33fc109 Use StringRef instead of raw pointers in MCAsmInfo/MCInstrInfo APIs (NFC)
llvm-svn: 283018
2016-10-01 06:46:33 +00:00
Mehdi Amini cd354a659b Use StringRef instead of raw pointer in TargetRegistry API (NFC)
llvm-svn: 283017
2016-10-01 06:25:30 +00:00
Craig Topper be351eea0c [AVX-512] Add EVEX versions of VPBROADCASTW patterns with truncated i32 loads.
llvm-svn: 283015
2016-10-01 06:01:23 +00:00
Mehdi Amini 48878ae579 Use StringRef in Datalayout API (NFC)
llvm-svn: 283013
2016-10-01 05:57:55 +00:00
Mehdi Amini 217b246484 Revert "Use StringRef in Datalayout API (NFC)"
This reverts commit r283009. Bots are broken.

llvm-svn: 283011
2016-10-01 05:12:48 +00:00
Mehdi Amini 29baf9c0e1 Use StringRef in Datalayout API (NFC)
llvm-svn: 283009
2016-10-01 04:17:59 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Mehdi Amini 86eeda8e20 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Eric Christopher 98983d0aff Remove TargetTriple from AArch64MCInstLower as it's used in few places
and can be pulled from the TargetMachine. NFC.

llvm-svn: 283000
2016-10-01 01:50:25 +00:00
Matt Arsenault 3070fdf798 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Quentin Colombet a6119958ff [AArch64][RegisterBankInfo] Use the helper functions for the checks
This makes sure the helper functions work as expected.

NFC.

llvm-svn: 282961
2016-09-30 21:46:21 +00:00
Quentin Colombet 7c3fa8e361 [AArch64][RegisterBankInfo] Rename getValueMappingIdx to getValueMapping
We don't return index, we return the actual ValueMapping.

NFC.

llvm-svn: 282960
2016-09-30 21:46:19 +00:00
Quentin Colombet b4afac7b32 [AArch64][RegisterBankInfo] Compress the ValueMapping table a bit.
We don't need to have singleton ValueMapping on their own, we can just
reuse one of the elements of the 3-ops mapping.
This allows even more code sharing.

NFC.

llvm-svn: 282959
2016-09-30 21:46:17 +00:00
Quentin Colombet 7fc5fe41c5 [AArch64][RegisterBankInfo] Refactor the code to access AArch64::ValMapping
Use a helper function to access ValMapping. This should make the code
easier to understand and maintain.

NFC.

llvm-svn: 282958
2016-09-30 21:46:15 +00:00
Quentin Colombet 15dc25bb3d [AArch64][RegisterBankInfo] Rename getRegBankIdx to getRegBankIdxOffset
The function name did not make it clear that the returned value was an
offset to apply to a register bank index.

NFC.

llvm-svn: 282957
2016-09-30 21:46:12 +00:00
Quentin Colombet b2308987ab [AArch64][RegisterBankInfo] Use the static opds mapping for alt mappings
Avoid to rely on the dynamically allocated operands mapping for the
alternative mapping.
NFC.

llvm-svn: 282956
2016-09-30 21:45:56 +00:00
Hans Wennborg b5643b47b6 X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.

Differential Revision: https://reviews.llvm.org/D24836

llvm-svn: 282920
2016-09-30 20:07:35 +00:00
Derek Schuff e9e6891b2d [WebAssembly] Make register stackification more conservative
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24942

llvm-svn: 282886
2016-09-30 18:02:54 +00:00
Konstantin Zhuravlyov 836cbff695 [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa version
Differential Revision: https://reviews.llvm.org/D24973

llvm-svn: 282877
2016-09-30 17:01:40 +00:00
Konstantin Zhuravlyov d7bdf24f32 [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction
Differential Revision: https://reviews.llvm.org/D24985

llvm-svn: 282875
2016-09-30 16:50:36 +00:00
Konstantin Zhuravlyov 4658e5f7b0 [AMDGPU] Do not run scalar optimization passes at "-O0"
Differential Revision: https://reviews.llvm.org/D25055

llvm-svn: 282873
2016-09-30 16:39:24 +00:00
Dylan McKay 4a25499b13 [AVR] Add the ELF object file writer
Summary: This adds the ELF32 writer for AVR.

Reviewers: kparzysz

Subscribers: beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25031

llvm-svn: 282856
2016-09-30 14:09:20 +00:00