Commit Graph

96564 Commits

Author SHA1 Message Date
Joerg Sonnenberger bef3621ad0 Create the virtual register for the global base in the intersection of
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.

llvm-svn: 285813
2016-11-02 15:00:31 +00:00
Aaron Ballman 3ac3a7efff Removing a switch statement that contains a default label, but no case labels. Silences an MSVC warning; NFC.
llvm-svn: 285806
2016-11-02 13:58:57 +00:00
Joerg Sonnenberger 5e31b3ad93 Simplify.
llvm-svn: 285802
2016-11-02 12:45:28 +00:00
Ulrich Weigand 75d2f1b10d [SystemZ] Fix compiler warnings introduced by r285574
SystemZAsmParser::parseOperand returns a bool, not an enum.

llvm-svn: 285800
2016-11-02 11:32:28 +00:00
Kirill Bobyrev 1f1751182e [llvm] FIx if-clause -Wmisleading-indentation issue.
While bootstrapping Clang with recent `gcc 6.2.0` I found a bug related to misleading indentation.

I believe, a pair of `{}` was forgotten, especially given the above similar piece of code:

```
      if (!RDef || !HII->isPredicable(*RDef)) {
        Done = coalesceRegisters(RD, RegisterRef(S1));
        if (Done) {
          UpdRegs.insert(RD.Reg);
          UpdRegs.insert(S1.getReg());
        }
      }
```

Reviewers: kparzysz

Differential Revision: https://reviews.llvm.org/D26204

llvm-svn: 285794
2016-11-02 10:00:40 +00:00
Bjorn Pettersson 7424c8ccd1 [Reassociate] Skip analysis of dead code to avoid infinite loop.
Summary:
It was detected that the reassociate pass could enter an inifite
loop when analysing dead code. Simply skipping to analyse basic
blocks that are dead avoids such problems (and as a side effect
we avoid spending time on optimising dead code).

The solution is using the same Reverse Post Order ordering of the
basic blocks when doing the optimisations, as when building the
precalculated rank map. A nice side-effect of this solution is
that we now know that we only try to do optimisations for blocks
with ranked instructions.

Fixes https://llvm.org/bugs/show_bug.cgi?id=30818

Reviewers: llvm-commits, davide, eli.friedman, mehdi_amini

Subscribers: dberlin

Differential Revision: https://reviews.llvm.org/D26154

llvm-svn: 285793
2016-11-02 08:55:19 +00:00
Dylan McKay 7549b0a013 [AVR] Add instruction selection lowering code
Summary: This adds AVRISelLowering.cpp

Reviewers: arsenm, kparzysz

Subscribers: llvm-commits, modocache, japaric, wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25034

llvm-svn: 285790
2016-11-02 06:47:40 +00:00
Peter Collingbourne ff2c2ec6b2 Bitcode: Check file size before reading bitcode header.
Should unbreak ocaml binding tests.

Also added an llvm-dis test that checks for the same thing.

llvm-svn: 285777
2016-11-02 00:39:11 +00:00
Peter Collingbourne 4e76019e34 Support: Remove MemoryObject and DataStreamer interfaces.
These interfaces are no longer used.

Differential Revision: https://reviews.llvm.org/D26222

llvm-svn: 285774
2016-11-02 00:08:37 +00:00
Peter Collingbourne 028eb5a3f8 Bitcode: Change reader interface to take memory buffers.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106595.html

This change also fixes an API oddity where BitstreamCursor::Read() would
return zero for the first read past the end of the bitstream, but would
report_fatal_error for subsequent reads. Now we always report_fatal_error
for all reads past the end. Updated clients to check for the end of the
bitstream before reading from it.

I also needed to add padding to the invalid bitcode tests in
test/Bitcode/. This is because the streaming interface was not checking that
the file size is a multiple of 4.

Differential Revision: https://reviews.llvm.org/D26219

llvm-svn: 285773
2016-11-02 00:08:19 +00:00
Alex Bradbury 6b2cca7f8f [RISCV] Add bare-bones RISC-V MCTargetDesc
This is enough to compile and link but doesn't yet do anything particularly 
useful. Once an ASM parser and printer are added in the next two patches, the 
whole thing can be usefully tested.

Differential Revision: https://reviews.llvm.org/D23562

llvm-svn: 285770
2016-11-01 23:47:30 +00:00
Alex Bradbury 24d9b13b36 [RISCV 4/10] Add basic RISCV{InstrFormats,InstrInfo,RegisterInfo,}.td
For now, only add instruction definitions for basic ALU operations. Our 
initial target is a working MC layer rather than codegen, so appropriate 
SelectionDAG patterns will come later.

Differential Revision: https://reviews.llvm.org/D23561

llvm-svn: 285769
2016-11-01 23:40:28 +00:00
Matt Arsenault c507cdb4bc AMDGPU: Handle CopyToReg in getOperandRegClass
llvm-svn: 285768
2016-11-01 23:22:17 +00:00
Matt Arsenault 663ab8c119 AMDGPU: Use brev for materializing SGPR constants
This is already done with VGPR immediates and saves 4 bytes.

llvm-svn: 285765
2016-11-01 23:14:20 +00:00
Matt Arsenault 3d463193a9 AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.

Also start verifying the register classes of inlineasm operands.

llvm-svn: 285762
2016-11-01 22:55:07 +00:00
Eric Christopher 690f8e587e Move the initialization of PreferredLoopExit into runOnMachineFunction to be near the other function specific initializations.
llvm-svn: 285758
2016-11-01 22:15:50 +00:00
Sam McCall 2a36eee4ca Fix uninitialized access in MachineBlockPlacement.
Summary:
Currently PreferredLoopExit is set only in buildLoopChains, which is
never called if there are no MachineLoops.

MSan is currently broken by this:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/145/steps/check-llvm%20msan/logs/stdio

This is a naive fix to get things green again. iteratee: you may have a better fix.

This change will also mean PreferredLoopExit will not carry over if
buildCFGChains() is called a second time in runOnMachineFunction, this
appears to be the right thing.

Reviewers: bkramer, iteratee, echristo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26069

llvm-svn: 285757
2016-11-01 22:02:14 +00:00
Matt Arsenault a6319b82ca AMDGPU: Stop creating unused virtual registers
These are only used in the spill to VMEM path. Move them to
the one use.

llvm-svn: 285756
2016-11-01 21:58:07 +00:00
George Burgess IV 66837aba0a [MemorySSA] Tighten up types to make our API prettier. NFC.
Patch by bryant.

Differential Revision: https://reviews.llvm.org/D26126

llvm-svn: 285750
2016-11-01 21:17:46 +00:00
Sanjay Patel 9840cdad4c [ValueTracking] remove TODO comment; NFC
InstCombine should always canonicalize patterns like the one shown in the comment
when visiting 'select' insts in adjustMinMax().

Scalars were already handled there, and vector splats are handled after:
https://reviews.llvm.org/rL285732

llvm-svn: 285744
2016-11-01 20:43:00 +00:00
Matt Arsenault 2d8c289b4b AMDGPU: Workaround for instruction size with literals
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.

llvm-svn: 285743
2016-11-01 20:42:24 +00:00
Sanjay Patel c3d89842ad [InstCombine] allow splat vector folds in adjustMinMax()
llvm-svn: 285732
2016-11-01 20:08:02 +00:00
Sanjay Patel c0339c77ef [InstCombine] Fold nuw left-shifts in `ugt`/`ule` comparisons.
This transforms

%a = shl nuw %x, c1
%b = icmp {ugt|ule} %a, c0

into

%b = icmp {ugt|ule} %x, (c0 >> c1)

z3:

(declare-const x (_ BitVec 64))
(declare-const c0 (_ BitVec 64))
(declare-const c1 (_ BitVec 64))

(push)
(assert (= x (bvlshr (bvshl x c1) c1)))  ; nuw
(assert (not (= (bvugt (bvshl x c1) c0)
                (bvugt x
                       (bvlshr c0 c1)))))
(check-sat)
(get-model)
(pop)

(push)
(assert (= x (bvlshr (bvshl x c1) c1)))  ; nuw
(assert (not (= (bvule (bvshl x c1) c0)
                (bvule x
                       (bvlshr c0 c1)))))
(check-sat)
(get-model)
(pop)

Patch by bryant!

Differential Revision: https://reviews.llvm.org/D25913

llvm-svn: 285729
2016-11-01 19:19:29 +00:00
Krzysztof Parzyszek 654dc11b79 [Hexagon] Rename operand/predicate names for unshifted integers
For example, rename s6Ext to s6_0Ext. The names for shifted integers
include the underscore and this will make the naming consistent. It
also exposed a few duplicates that were removed.

llvm-svn: 285728
2016-11-01 19:02:10 +00:00
Matt Arsenault cb578f84e0 BranchRelaxation: Expand unconditional branches first
It's likely if a conditional branch needs to be expanded, the following
unconditional branch will also need expansion. By expanding the
unconditional branch first, the conditional branch can be simply
inverted to jump over the inserted indirect branch block. If the
conditional branch is expanded first, it results in an additional
branch.

This avoids test regressions in future commits.

llvm-svn: 285722
2016-11-01 18:34:00 +00:00
Sanjay Patel 644d7c3b8a [InstCombine] clean up adjustMinMax(); NFCI
1. Change param names for readability
2. Change pointer param to ref
3. Early exit to reduce indent
4. Change switch to if/else

llvm-svn: 285718
2016-11-01 18:15:03 +00:00
Konstantin Zhuravlyov d971a1123f [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32
This will prevent following regression when enabling i16 support (D18049):

test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll

Differential Revision: https://reviews.llvm.org/D25802

llvm-svn: 285716
2016-11-01 17:49:33 +00:00
Sanjay Patel 7ce658388b [InstCombine] add helper function for adjustMinMax(); NFCI
This is just a cut and paste; clean-up and enhancements to follow.

llvm-svn: 285715
2016-11-01 17:46:08 +00:00
Alex Bradbury b2e5472d85 [RISCV] Add stub backend
This contains just enough for lib/Target/RISCV to compile. Notably a basic 
RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc 
-march=riscv32 myinput.ll and will find it fails due to the lack of 
MCAsmInfo.

See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for 
further discussion

Differential Revision: https://reviews.llvm.org/D23560

llvm-svn: 285712
2016-11-01 17:27:54 +00:00
Tom Stellard 9677b60288 AMDGPU: Fix buildbots broken by r285704
llvm-svn: 285711
2016-11-01 17:20:03 +00:00
Alex Bradbury 1524f62b97 [RISCV] Add RISC-V ELF defines
Add the necessary definitions for RISC-V ELF files, including relocs. Also 
make necessary trivial change to ELFYaml, llvm-objdump, and llvm-readobj in 
order to work with RISC-V ELFs.

Differential Revision: https://reviews.llvm.org/D23557

llvm-svn: 285708
2016-11-01 16:59:37 +00:00
Alex Bradbury b6e784a240 [RISCV] Recognise riscv32 and riscv64 in triple parsing code
This is the first in a series of 10 initial patches that incrementally add an 
MC layer for RISC-V to LLVM. See 
<http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html> for more 
discussion.

Differential Revision: https://reviews.llvm.org/D23557

llvm-svn: 285707
2016-11-01 16:47:54 +00:00
Alex Bradbury 58eba09949 [TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h
As it stands, the OperandMatchResultTy is only included in the generated
header if there is custom operand parsing. However, almost all backends
make use of MatchOperand_Success and friends from OperandMatchResultTy for
e.g. parseRegister. This is a pain when starting an AsmParser for a new
backend that doesn't yet have custom operand parsing. Move the enum to
MCTargetAsmParser.h.

This patch is a prerequisite for D23563

Differential Revision: https://reviews.llvm.org/D23496

llvm-svn: 285705
2016-11-01 16:32:05 +00:00
Tom Stellard 94c21bc088 AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.

The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot.  I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.

Reviewers: bogner, ab, t.p.northover, arsenm

Subscribers: wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25999

llvm-svn: 285704
2016-11-01 16:31:48 +00:00
Simon Pilgrim 6dd8fab443 [InstCombine] Folding of shifts by the sum of positive values
This patch introduces the combine:

(C1 shift (A add C2)) -> ((C1 shift C2) shift A)
iff A and C2 are both positive

If both A and C2 are know to be positive then we can safely split into 2 shifts, permitting the folding of the Inner shift.

Fix for the spec benchmark case mentioned by @nadav on PR15141 (assuming we can prove that the inputs as positive).

Differential Revision: https://reviews.llvm.org/D26000

llvm-svn: 285696
2016-11-01 15:40:30 +00:00
James Molloy 70a3d6df52 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 285690
2016-11-01 13:37:41 +00:00
Valery Pykhtin 8a89d3662a [AMDGPU] Expand vector mulhu/mulhs
Differential revision: https://reviews.llvm.org/D26077

llvm-svn: 285684
2016-11-01 10:26:48 +00:00
Nemanja Ivanovic e70fa63390 [PowerPC] Implement vector shift builtins - llvm portion
This patch corresponds to review https://reviews.llvm.org/D26095.
Committing on behalf of Tony Jiang.

llvm-svn: 285681
2016-11-01 09:42:32 +00:00
Serge Pavlov 6ac8e034f6 Allow resolving response file names relative to including file
If a response file included by construct @file itself includes a response file
and that file is specified by relative file name, current behavior is to resolve
the name relative to the current working directory. The change adds additional
flag to ExpandResponseFiles that may be used to resolve nested response file
names relative to including file. With the new mode a set of related response
files may be kept together and reference each other with short position
independent names.

Differential Revision: https://reviews.llvm.org/D24917

llvm-svn: 285675
2016-11-01 06:53:29 +00:00
Sanjoy Das 9c729067e9 [TBAA] Use wrapper objects instead of raw getOperand s; NFC
This is intended to make the semantic intent clearer.

The wrapper objects are now generic to avoid `const_cast` s.  Since
`const` ness is part of the API of `MDNode::getMostGenericTBAA` (and
therefore I can't make things `const` all the way through without some
code churn outside TypeBasedAliasAnalysis.cpp), this seemed like the
cleanest solution.

llvm-svn: 285665
2016-11-01 02:58:30 +00:00
Sanjoy Das a1a77f3a21 [TBAA] Rename accessors to be more idiomatic; NFC
llvm-svn: 285661
2016-11-01 01:21:57 +00:00
Peter Collingbourne d3a6c70b2d Bitcode: Simplify BitstreamWriter::EnterBlockInfoBlock() interface.
No block info block should need to define local abbreviations, so we can
always use a code width of 2.

Also change all block info block writers to use EnterBlockInfoBlock.

Differential Revision: https://reviews.llvm.org/D26168

llvm-svn: 285660
2016-11-01 01:18:57 +00:00
Matt Arsenault f3dd863031 AMDGPU: Whitespace fixes
llvm-svn: 285659
2016-11-01 00:55:14 +00:00
Sanjay Patel 70c5f02d25 [DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844

The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841 

llvm-svn: 285656
2016-10-31 23:28:45 +00:00
Davide Italiano 51cbe13a3f [Hexagon] Garbage collect dead code.
llvm-svn: 285654
2016-10-31 22:56:56 +00:00
Evgeniy Stepanov 1bd9fc7098 Fix a typo.
Found with PVS-Studio here: http://www.viva64.com/en/b/0446/

llvm-svn: 285652
2016-10-31 22:42:39 +00:00
Saleem Abdulrasool e1aa782bd0 CodeGen: further loosen -O0 CG for WoA division
Generate the slowest possible codepath for noopt CodeGen.  Even trying to be
clever with the negated jump can cause out-of-range jumps.  Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible.  This re-enables the
previous optimization as well as hopefully gives us working code in all cases.

Addresses PR30356!

llvm-svn: 285649
2016-10-31 22:12:37 +00:00
Teresa Johnson 002af9bbce [ThinLTO] Disable importing and other cross-module optis at -O0
Summary:
There is no point to importing at -O0, since we won't inline. We should
also disable other cross-module optimizations.

(Plan to backport this fix to the 3.9 branch to fix PR30774)

Reviewers: pcc

Subscribers: johanengelen, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25918

llvm-svn: 285648
2016-10-31 22:12:21 +00:00
Justin Lebar ed1e312f05 [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass.  We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.

Reviewers: tra

Subscribers: llvm-commits, jholewinski, jingyue, mgorny

Differential Revision: https://reviews.llvm.org/D26165

llvm-svn: 285642
2016-10-31 21:51:42 +00:00
Kevin Enderby d503940e8f More additional error checks for invalid Mach-O files when
the offsets and sizes of an element of the file overlaps with
another element in the Mach-O file.

This shows the approach to this testing for three elements
and contains for tests for their overlap.  Checking for all the
remain elements will be added next.

llvm-svn: 285632
2016-10-31 20:29:48 +00:00
Nemanja Ivanovic 60bdfe5a7c [PPC] add absolute difference altivec instructions and matching intrinsics
This patch corresponds to review https://reviews.llvm.org/D26072.
Committing on behalf of Sean Fertile.

llvm-svn: 285627
2016-10-31 19:47:52 +00:00
Victor Leschuk e1156c2eb0 DebugInfo: make DW_TAG_atomic_type valid
DW_TAG_atomic_type was already included in Dwarf.defs and emitted correctly,
however Verifier didn't recognize it as valid.
Thus we introduce the following changes:

  * Make DW_TAG_atomic_type valid tag for IR and DWARF (enabled only with -gdwarf-5)
  * Add it to related docs
  * Add DebugInfo tests

Differential Revision: https://reviews.llvm.org/D26144

llvm-svn: 285624
2016-10-31 19:09:38 +00:00
Kuba Brecka a28c9e8f09 [asan] Move instrumented null-terminated strings to a special section, LLVM part
On Darwin, simple C null-terminated constant strings normally end up in the __TEXT,__cstring section of the resulting Mach-O binary. When instrumented with ASan, these strings are transformed in a way that they cannot be in __cstring (the linker unifies the content of this section and strips extra NUL bytes, which would break instrumentation), and are put into a generic __const section. This breaks some of the tools that we have: Some tools need to scan all C null-terminated strings in Mach-O binaries, and scanning all the contents of __const has a large performance penalty. This patch instead introduces a special section, __asan_cstring which will now hold the instrumented null-terminated strings.

Differential Revision: https://reviews.llvm.org/D25026

llvm-svn: 285619
2016-10-31 18:51:58 +00:00
Tim Northover 037af52c8b GlobalISel: allow truncating pointer casts on AArch64.
llvm-svn: 285615
2016-10-31 18:31:09 +00:00
Tim Northover cdf23f1d93 GlobalISel: translate stack protector intrinsics
llvm-svn: 285614
2016-10-31 18:30:59 +00:00
Rui Ueyama ddc79225c3 Define DbiStreamBuilder::addSectionMap.
This change enables LLD to construct a Section Map stream in a PDB file.
I do not understand all these fields in the Section Map yet, but it seems
like a copy of a COFF section header in another format.

With this patch, DbiStreamBuilder can emit a Section Map which
llvm-pdbdump can dump.

Differential Revision: https://reviews.llvm.org/D26112

llvm-svn: 285606
2016-10-31 17:38:56 +00:00
Greg Clayton cddab279f6 Modify DWARFFormValue to remember the DWARFUnit that it was decoded with.
Modifying DWARFFormValue to remember the DWARFUnit that it was encoded with can simplify the usage of instances of this class. Previously users would have to try and pass in the same DWARFUnit that was used to decode the form value and there was a possibility that a different DWARFUnit might be supplied to the functions that extract values (strings, CU relative references, addresses) and cause problems. This fixes this potential issue by storing the DWARFUnit inside the DWARFFormValue so that this mistake can't be made. Instances of DWARFFormValue are not stored permanently and are used as temporary values, so the increase in size of an instance of DWARFFormValue isn't a big deal. This makes decoding form values more bullet proof and is a change that will be used by future modifications.

https://reviews.llvm.org/D26052

llvm-svn: 285594
2016-10-31 16:46:02 +00:00
Michael Zuckerman 68a5c53616 [x86][inline-asm][AVX512][llvm][PART-2]
Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions.

Commit on behalf of mharoush

Extending inline assembly support, compatible with GCC as folowing:
"k" constraint hints the compiler to select any of AVX512 k0-k7 registers.
"Yk" constraint is a subset of "k" excluding k0 which is not allowd to be used as a mask.

Reviewer: 1. rnk

Differential Revision: https://reviews.llvm.org/D25062

llvm-svn: 285591
2016-10-31 16:19:58 +00:00
Artem Tamazov 54bfd548aa [AMDGPU][MC][gfx8] Support 20-bit immediate offset in SMEM instructions.
Fixes Bug 30808.
Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced.
Old gfx6/7-specific def renamed to smrd_offset_8 for clarity.
Lit tests updated.

Differential Revision: https://reviews.llvm.org/D26085

llvm-svn: 285590
2016-10-31 16:07:39 +00:00
Krzysztof Parzyszek 22586dcb2a [Hexagon] Don't expand mux instructions with both sources identical
llvm-svn: 285588
2016-10-31 15:45:09 +00:00
Ulrich Weigand 2e5e51b3f3 [SystemZ] Rework processor feature definitions and add -mcpu=archX support
This patch implements two changes:

- Move processor feature definition into a new file SystemZFeatures.td,
  and provide explicit lists of supported and unsupported features for
  each level of the z/Architecture.  This allows specifying unsupported
  features in the scheduler definition files for each processor.

- Add optional aliases for the -mcpu processor names according to the
  level of the z/Architecture, for compatibility with other compilers
  on the platform.  The supported aliases are:
    -mcpu=arch8  equals  -mcpu=z10
    -mcpu=arch9  equals  -mcpu=z196
    -mcpu=arch10 equals  -mcpu=zEC12
    -mcpu=arch11 equals  -mcpu=z13

llvm-svn: 285577
2016-10-31 14:33:29 +00:00
Ulrich Weigand d28be373d4 [SystemZ] Guard LEFR/LFER with FeatureVector
The LEFR/LFER pseudos are aliases for vector instructions and should
therefore be guared by FeatureVector.  If they aren't, the TableGen
scheduler definition checking might complain that there is no data
for those pseudos for pre-z13 machines.

No functional change intended. 

llvm-svn: 285576
2016-10-31 14:28:43 +00:00
Ulrich Weigand d9001301d9 [SystemZ] Correctly diagnose missing features in AsmParser
Currently, when using an instruction that is not supported on the
currently selected architecture, the LLVM assembler is likely to
diagnose an "invalid operand" instead of a "missing feature".

This is because many operands require a custom parser in order to
be processed correctly, and if an instruction is not available
according to the current feature set, the generated parser code
will also not detect the associated custom operand parsers.

Fixed by temporarily enabling all features while parsing operands.
The missing features will then be correctly detected when actually
parsing the instruction itself.

llvm-svn: 285575
2016-10-31 14:25:05 +00:00
Ulrich Weigand ec5d779eb8 [SystemZ] Fix encoding of MVCK and .insn ss
LLVM currently treats the first operand of MVCK as if it were a
regular base+index+displacement address.  However, it is in fact
a base+displacement combined with a length register field.

While the two might look syntactically similar, there are two
semantic differences:
- %r0 is a valid length register, even though it cannot be used
  as an index register.
- In an expression with just a single register like 0(%rX), the
  register is treated as base with normal addresses, while it is
  treated as the length register (with an empty base) for MVCK.

Fixed by adding a new operand parser class BDRAddr and reworking
the assembler parser to distinguish between address + length
register operands and regular addresses.

llvm-svn: 285574
2016-10-31 14:21:36 +00:00
Dorit Nuzman bf2c15b5dc Second attempt at r285517.
llvm-svn: 285568
2016-10-31 13:17:31 +00:00
Jonas Paulsson 6788ddeac9 [SystemZ] Model 2 VBU units (not 1) in SystemZScheduleZ13.td.
NFC.

Review: Ulrich Weigand.
llvm-svn: 285566
2016-10-31 13:05:48 +00:00
Alexey Bataev d07c731d86 Improved cost model for FDIV and FSQRT, by Andrew Tischenko
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.

Differential Revision: https://reviews.llvm.org/D25722

llvm-svn: 285564
2016-10-31 12:10:53 +00:00
Craig Topper d4e580705d [AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles.
llvm-svn: 285546
2016-10-31 05:55:57 +00:00
Sanjoy Das 1707869db5 [SCEV] Try to order n-ary expressions in CompareValueComplexity
llvm-svn: 285535
2016-10-31 03:32:43 +00:00
Sanjoy Das 299e67291c [SCEV] In CompareValueComplexity, order global values by their name
llvm-svn: 285529
2016-10-30 23:52:56 +00:00
Sanjoy Das b4830a84b9 [SCEV] Use auto for consistency with an upcoming change; NFC
llvm-svn: 285528
2016-10-30 23:52:53 +00:00
Sanjay Patel 339a51ac13 [DAG] x | x --> x
llvm-svn: 285522
2016-10-30 18:19:35 +00:00
Sanjay Patel 13aee345ca [DAG] x & x --> x
llvm-svn: 285521
2016-10-30 18:13:30 +00:00
Dorit Nuzman 06903d16af Revert r285517 due to build failures.
llvm-svn: 285518
2016-10-30 14:34:57 +00:00
Dorit Nuzman 3c1c658f24 [LoopVectorize] Make interleaved-accesses analysis less conservative about
possible pointer-wrap-around concerns, in some cases.

Before this patch, collectConstStridedAccesses (part of interleaved-accesses
analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when
examining all candidate pointers. This is too conservative. Instead, this
patch makes collectConstStridedAccesses use an optimistic approach, calling
getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the
candidate interleave groups have been formed, revisits the pointer-wrapping
analysis but only where it matters: namely, in groups that have gaps, and where
the gaps are not at the very end of the group (in which case the loop is
peeled). This second time getPtrStride is called with [Assume=false,
ShouldCheckWrap=true], but this could further be improved to using Assume=true,
once we also add the logic to track that we are not going to meet the scev
runtime checks threshold.

Differential Revision: https://reviews.llvm.org/D25276

llvm-svn: 285517
2016-10-30 12:23:26 +00:00
Craig Topper b7781a95fd [X86] Use intrinsics table for PMADDUBSW and PMADDWD so that we can use the legacy intrinsics to select EVEX encoded instructions when available.
This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type.

llvm-svn: 285515
2016-10-30 06:56:16 +00:00
Teresa Johnson bf28c8fa45 [ThinLTO] Use per-summary flag to prevent exporting locals used in inline asm
Summary:
Instead of using the workaround of suppressing the entire index for
modules that call inline asm that may reference locals, use the
NoRename flag on the summary for any locals in the llvm.used set, and
add a reference edge from any functions containing inline asm.

This avoids issues from having no summaries despite the module defining
global values, which was preventing more aggressive index-based
optimization. It will be followed by a subsequent patch to make a
similar fix for local references in module level asm (to fix PR30610).

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26121

llvm-svn: 285513
2016-10-30 05:40:44 +00:00
Teresa Johnson 3bc8abdffc [ThinLTO] Correctly resolve linkonce when importing aliasee
Summary:
When we have an aliasee that is linkonce, while we can't convert
the non-prevailing copies to available_externally, we still need to
convert the prevailing copy to weak. If a reference to the aliasee
is exported, not converting a copy to weak will result in undefined
references when the linkonce is removed in its original module.

Add a new test and update existing tests.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26076

llvm-svn: 285512
2016-10-30 05:15:23 +00:00
Craig Topper bf9e5a16a4 [X86] Don't use loadv2i64 on SSE version of PMULHRSW. Use memopv2i64 instead.
This bug was introduced in r285501.

llvm-svn: 285510
2016-10-30 00:02:55 +00:00
NAKAMURA Takumi ff76cfefc0 NativeFormatting.cpp: Fix build for mingw. Where would writePadding() be?
llvm-svn: 285509
2016-10-29 23:14:18 +00:00
Teresa Johnson 38d4df714c [ThinLTO] Rename doPromoteLocalToGlobal to shouldPromoteLocalToGlobal (NFC)
Rename as suggested in code review for D26063.

llvm-svn: 285508
2016-10-29 21:52:23 +00:00
Teresa Johnson 1b9c2be8f4 [ThinLTO] Use NoPromote flag in summary during promotion
Summary:
Replace the check of whether a GV has a section with the flag check
in the summary. This is in preparation for using the NoPromote flag
to convey other situations when we can't promote (e.g. locals used in
inline asm).

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26063

llvm-svn: 285507
2016-10-29 21:31:48 +00:00
Peter Collingbourne 310474f576 IR: Remove a no longer needed assert.
This assert was checking for a miscompile in a version of GCC that
we no longer support.

llvm-svn: 285506
2016-10-29 20:57:12 +00:00
Craig Topper defe9ffbb5 [X86] Use intrinsics table for VPMULHRSW intrincis so that the legacy intrinsics can select EVEX encoded instructions when available.
This requires a minor rename of the instructions due to the use of different tablegen classes and how the names are concatenated.

llvm-svn: 285501
2016-10-29 18:41:45 +00:00
Sanjay Patel 36eeb6d6f6 [ValueTracking] recognize more variants of smin/smax
Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202. 
There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
smax pattern and commuted variants.

The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We
can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR.

Differential Revision: https://reviews.llvm.org/D26091

llvm-svn: 285499
2016-10-29 16:21:19 +00:00
Sanjay Patel 978f827d12 [InstCombine] re-use bitcasted compare operands in selects (PR28001)
These mixed bitcast patterns show up with SSE/AVX intrinsics because we bitcast function parameters to <2 x i64>.

The bitcasts obfuscate the expected min/max forms as shown in PR28001:
https://llvm.org/bugs/show_bug.cgi?id=28001#c6

Differential Revision: https://reviews.llvm.org/D25943

llvm-svn: 285495
2016-10-29 15:22:04 +00:00
Simon Pilgrim 75a697a17e [DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander!

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285494
2016-10-29 11:29:39 +00:00
Elena Demikhovsky 519b4ccd70 Fixed FMA + FNEG combine.
Masked form of FMA should be omitted in this optimization.

Differential Revision: https://reviews.llvm.org/D25984

llvm-svn: 285492
2016-10-29 08:44:46 +00:00
Matt Arsenault c88ba36eab AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Matthias Braun 7d78614ae9 AArch64DeadRegisterDefinitionsPass: Cleanup; NFC
- Fix doxygen file comment
- reduce indentation in loop
- Factor out some common subexpressions
- Move independent helper function out of class
- Fix Changed flag (this is not strictly NFC but a bugfix, but the flag
  seems ignored anyway)

llvm-svn: 285488
2016-10-29 01:03:41 +00:00
Rui Ueyama 77be2403f6 Define calculateDbgStreamSize for consistency.
llvm-svn: 285487
2016-10-29 00:56:44 +00:00
Tim Shen 1bab9cfbe5 [APFloat] Remove the redundent function body of uninitialized ctor, which should be done in r285468
llvm-svn: 285486
2016-10-29 00:51:41 +00:00
Zachary Turner 5b2243e884 Resubmit "Add support for advanced number formatting."
This resubmits r284436 and r284437, which were reverted in
r284462 as they were breaking the AArch64 buildbot.

The breakage on AArch64 turned out to be a miscompile which is
still not fixed, but is actively tracked at llvm.org/pr30748.

This resubmission re-writes the code in a way so as to make the
miscompile not happen.

llvm-svn: 285483
2016-10-29 00:27:22 +00:00
Davide Italiano 86168b23cf [DAGCombiner] Fix a crash visiting `AND` nodes.
Instead of asserting that the shift count is != 0 we just bail out
as it's not profitable trying to optimize a node which will be
removed anyway.

Differential Revision:  https://reviews.llvm.org/D26098

llvm-svn: 285480
2016-10-28 23:55:32 +00:00
Tom Stellard 6695ba0440 AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

llvm-svn: 285479
2016-10-28 23:53:48 +00:00
Matt Arsenault 4e9c1e3a79 AMDGPU: Fix instruction flags for s_endpgm
Set isReturn, remove hasSideEffects. Also remove
hasCtrlDep, I'm not really sure what that does.

llvm-svn: 285476
2016-10-28 23:00:38 +00:00
Adrian Prantl 3cd37d0aeb Refactor DW_LNE_* into Dwarf.def
llvm-svn: 285475
2016-10-28 22:57:02 +00:00
Adrian Prantl 79deba6446 Refactor DW_LNS_* into Dwarf.def
llvm-svn: 285474
2016-10-28 22:56:59 +00:00
Adrian Prantl 8580d3f3d3 Refactor DW_APPLE_PROPERTY_* into Dwarf.def
llvm-svn: 285473
2016-10-28 22:56:56 +00:00
Adrian Prantl 44a4461b16 Refactor DW_CFA_* into Dwarf.def
llvm-svn: 285472
2016-10-28 22:56:53 +00:00
Adrian Prantl 23865816d5 Refactor all DW_FORM_* constants into Dwarf.def
llvm-svn: 285470
2016-10-28 22:56:45 +00:00
Tim Shen b4991548c8 [APFloat] Fix memory bugs revealed by MSan
Reviewers: eugenis, hfinkel, kbarton, iteratee, echristo

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D26102

llvm-svn: 285468
2016-10-28 22:45:33 +00:00
Justin Bogner db6b6a7f0c SDAG: Make sure we use an allocatable reg class when we create this vreg
As per the discussion on r280783, if constrainRegClass fails we need
to call getAllocatableClass like we did before that commit.

llvm-svn: 285467
2016-10-28 22:42:54 +00:00
Matt Arsenault 7b6475568d AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

llvm-svn: 285463
2016-10-28 21:55:15 +00:00
Matt Arsenault 4b6a6cc8e9 AMDGPU: Rename glc operand type
While trying to add the glc bit to SMEM instructions on VI
with the new refactoring I ran into some kind of shadowing
problem for the glc operand when using the pseudoinstruction
as a multiclass parameter.

Everywhere that currently uses it defines the operand to have the same
name as its type, i.e. glc:$glc which works. For some reason now it
conflicts, and its up evaluating to the wrong thing. For the
real encoding classes,

let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated
and still visible in the Inst initializer in the expanded td file.
In other cases I got a a different error about an illegal operand
where this was using { 0 } initializer from the bits<1> glc initializer
instead of evaluating it as false in the if.

For consistency all of the operand types should probably
be captialized to avoid conflicting with the variable names
unless somebody has a better idea of how to fix this.

llvm-svn: 285462
2016-10-28 21:55:08 +00:00
Justin Lebar f0a80ba385 [NVPTX] Compute 'rem' using the result of 'div', if possible.
Summary:
In isel, transform

  Num % Den

into

  Num - (Num / Den) * Den

if the result of Num / Den is already available.

Reviewers: tra

Subscribers: hfinkel, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26090

llvm-svn: 285461
2016-10-28 21:44:00 +00:00
Justin Lebar 0ede5fb1bb Don't leave unused divs/rems sitting around in BypassSlowDivision.
Summary:
This "pass" eagerly creates div and rem instructions even when only one
is needed -- it relies on a later pass (machine DCE?) to clean them up.

This is problematic not just from a cleanliness perspective (this pass
is running during CodeGenPrepare, so should leave the IR in a better
state), but it also creates a problem for instruction selection.  If we
always have a div+rem, isel will always select a divrem instruction (if
possible), even when a single div or rem would do.

Specifically, in NVPTX, we want to compute rem from the output of div,
if available.  But if a div is not available, we want to leave the rem
alone.  This transformation is overeager if div is always available.

Because this code runs as part of CodeGenPrepare, it's nontrivial to
write a test for this change.  But this will effectively be tested by
a later patch which adds the aforementioned change to NVPTX isel.

Reviewers: tra

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26088

llvm-svn: 285460
2016-10-28 21:43:54 +00:00
Justin Lebar 468bf73209 Don't claim the udiv created in BypassSlowDivision is exact.
Summary:
In BypassSlowDivision's short-dividend path, we would create e.g.

  udiv exact i32 %a, %b

"exact" here means that we are asserting that %a is a multiple of %b.
But we have no reason to believe this must be true -- this is just a
bug, as far as I can tell.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D26097

llvm-svn: 285459
2016-10-28 21:43:51 +00:00
Matt Arsenault 4eae301995 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

llvm-svn: 285447
2016-10-28 20:31:47 +00:00
Krzysztof Parzyszek 2717175c99 Handle non-~0 lane masks on live-in registers in LivePhysRegs
When LivePhysRegs adds live-in registers, it recognizes ~0 as a special
lane mask indicating the entire register. If the lane mask is not ~0,
it will only add the subregisters that overlap the specified lane mask.

The problem is that if a live-in register does not have subregisters,
and the lane mask is not ~0, it will not be added to the live set.
(The given lane mask may simply be the lane mask of its register class.)

If a register does not have subregisters, add it to the live set if
the lane mask is non-zero.

Differential Revision: https://reviews.llvm.org/D26094

llvm-svn: 285440
2016-10-28 20:06:37 +00:00
Matt Arsenault ef00283425 SpeculativeExecution: Allow speculating more inst types
Partial step towards removing the whitelist and only
using TTI's cost.

llvm-svn: 285438
2016-10-28 20:00:33 +00:00
Matt Arsenault 08906a3c62 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

llvm-svn: 285435
2016-10-28 19:43:31 +00:00
Nemanja Ivanovic e28a0fc72a Implement vector count leading/trailing bytes with zero lsb and vector parity
builtins - llvm portion

This patch corresponds to review https://reviews.llvm.org/D26003.
Committing on behalf of Zaara Syeda.

llvm-svn: 285434
2016-10-28 19:38:24 +00:00
Teresa Johnson 7c31cb1665 [ThinLTO] Use flags from summary when writing variable summary (NFC)
We already read the flags out of the summary when writing the summary
records for functions and aliases, do the same for variables.

This is an NFC change for now since the flags computed on the fly from
the GlobalValue currently will always match those in the summary
already, but once I send a follow-on patch to set the NoRename flag for
locals in the llvm.used set this becomes a necessary change.

llvm-svn: 285433
2016-10-28 19:36:00 +00:00
George Burgess IV 013fd7315f [MemorySSA] Add const to getClobberingMemoryAccess.
Thanks to bryant for the patch!

Differential Revision: https://reviews.llvm.org/D26086

llvm-svn: 285432
2016-10-28 19:22:46 +00:00
Adrian Prantl 7e55f17825 Move the DWARF attribute constants into Dwarf.def and delete 300 lines of silly code.
llvm-svn: 285425
2016-10-28 18:21:39 +00:00
Matthias Braun de8c1b3433 MachineRegisterInfo: Remove unused arg from isConstantPhysReg(); NFC
llvm-svn: 285423
2016-10-28 18:05:09 +00:00
Matthias Braun 35a024fe0f TargetPassConfig: Move addPass of IPRA RegUsageInfoProp down.
TargetPassConfig::addMachinePasses() does some housekeeping first:
Handling the -print-machineinstrs flag and doing an initial printing
"After Instruction Selection". There is no reason for RegUsageInfoProp
to run before those two steps.

llvm-svn: 285422
2016-10-28 18:05:05 +00:00
Adrian Prantl c4fbbcf9ed Import/update constants from the DWARF 5 public review draft document.
https://reviews.llvm.org/D26051

llvm-svn: 285421
2016-10-28 17:59:50 +00:00
Krzysztof Parzyszek 87a47be039 [Hexagon] Maintain kill flags through splitting in expand-condsets
Do not use LiveIntervals to recalculate kills, because that cannot be
done accurately without implicit uses on predicated instructions.

llvm-svn: 285409
2016-10-28 15:50:22 +00:00
Tom Stellard 13068995b9 [Loads] Fix crash in is isDereferenceableAndAlignedPointer()
Summary:
We were trying to add APInt values with different bit sizes after
visiting an addrspacecast instruction which changed the bit width
of the pointer.

Reviewers: majnemer, hfinkel

Subscribers: hfinkel, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24774

llvm-svn: 285407
2016-10-28 15:32:28 +00:00
Simon Pilgrim d9189891fc [SelectionDAG] computeKnownBits - early-out if any BUILD_VECTOR element has no known bits
No need to check the remaining elements - no common known bits are available.

llvm-svn: 285399
2016-10-28 14:07:44 +00:00
Simon Pilgrim 8c043061e5 [SelectionDAG] Tidyup UDIV computeKnownBits implementation
No need to clear KnownOne2/KnownZero2 bits as the next call to computeKnownBits will overwrite them anyway

llvm-svn: 285398
2016-10-28 13:42:23 +00:00
Simon Pilgrim 755cef1ba8 [SelectionDAG] Increment computeKnownBits recursion depth for SMIN/SMAX/UMIN/UMAX like all other ops
llvm-svn: 285397
2016-10-28 13:13:16 +00:00
Igor Laevsky c3ccf5d77b [LCSSA] Perform LCSSA verification only for the current loop nest.
Now LPPassManager will run LCSSA verification only for the top-level loop
which was processed on the current iteration.

Differential Revision: https://reviews.llvm.org/D25873

llvm-svn: 285394
2016-10-28 12:57:20 +00:00
Juergen Ributzka 5cee232be4 Revert "[DAGCombiner] Add vector demanded elements support to computeKnownBits"
This seems to have increased LTO compile time bejond 2x of previous builds.
See http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/10676/

llvm-svn: 285381
2016-10-28 04:01:12 +00:00
Davide Italiano 631cd27f29 [Reassociate] Removing instructions mutates the IR.
Fixes PR 30784. Discussed with Justin, who pointed out that
in the new PassManager infrastructure we can have more fine-grained
control on which analyses we want to preserve, but this is the
best we can do with the current infrastructure.

llvm-svn: 285380
2016-10-28 02:47:09 +00:00
Teresa Johnson 02563cd3a6 [ThinLTO] Create AliasSummary when building index
Summary:
Previously we were creating the alias summary on the fly while writing
the summary to bitcode. This moves the creation of these summaries to
the module summary index builder where we build the rest of the summary
index.

This is going to be necessary for setting the NoRename flag for values
possibly used in inline asm or module level asm.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26049

llvm-svn: 285379
2016-10-28 02:39:38 +00:00
Teresa Johnson 58fbc916a0 [ThinLTO] Rename HasSection to NoRename (NFC)
Summary:
This is in preparation for a change to utilize this flag for symbols
referenced/defined in either inline or module level assembly.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26048

llvm-svn: 285376
2016-10-28 02:24:59 +00:00
Davide Italiano 6231a7e4d1 [IR] Clang-format my previous commit. NFCI.
llvm-svn: 285375
2016-10-28 01:41:56 +00:00
Davide Italiano 30665147f9 [ConstantFold] Get the correct vector type when folding a getelementptr.
Differential Revision:  https://reviews.llvm.org/D26014

llvm-svn: 285371
2016-10-28 00:53:16 +00:00
Tom Stellard aea899e2a0 AMDGPU/SI: Handle hazard with s_rfe_b64
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25638

llvm-svn: 285368
2016-10-27 23:50:21 +00:00
Tom Stellard 04051b5fad AMDGPU/SI: Handle hazard with sgpr lane selects for v_{read,write}lane
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25637

llvm-svn: 285367
2016-10-27 23:42:29 +00:00
Tom Stellard 6b9c1be4ea AMDGPU/SI: Fix unused variable warning on non-debug builds
llvm-svn: 285363
2016-10-27 23:28:03 +00:00
Ekaterina Romanova b7f96d1241 Reverting back r285355: "Update .debug_line section version information to match DWARF version", while I'm investigating a test failure.
llvm-svn: 285362
2016-10-27 23:20:19 +00:00
Tom Stellard b133fbb9a4 AMDGPU/SI: Handle hazard with > 8 byte VMEM stores
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25577

llvm-svn: 285359
2016-10-27 23:05:31 +00:00
Tim Shen 139a58f75e Reapply r285351 "[APFloat] Add DoubleAPFloat mode to APFloat. NFC." with
a workaround for old clang.

llvm-svn: 285358
2016-10-27 22:52:40 +00:00
Ekaterina Romanova 0b82459c6c Update .debug_line section version information to match DWARF version.
In the past the compiler always emitted .debug_line version 2, though some opcodes from DWARF 3 (e.g. DW_LNS_set_prologue_end, DW_LNS_set_epilogue_begin or DW_LNS_set_isa) and from DWARF 4 could be emitted by the compiler. 

This patch changes version information of .debug_line to exactly match the DWARF version. For .debug_line version 4, a new field maximum_operations_per_instruction is emitted. 

Differential Revision: https://reviews.llvm.org/D16697

llvm-svn: 285355
2016-10-27 22:37:25 +00:00
Tim Shen 414b0155c4 Revert "[APFloat] Add DoubleAPFloat mode to APFloat. NFC."
This reverts r285351, since it breaks the build.

llvm-svn: 285354
2016-10-27 21:54:29 +00:00
Kostya Serebryany bcfb0802e2 [libFuzzer] enable use_cmp by default
llvm-svn: 285353
2016-10-27 21:44:37 +00:00
Tim Shen f38e87fa48 [APFloat] Add DoubleAPFloat mode to APFloat. NFC.
Summary:
This patch adds DoubleAPFloat mode to APFloat.

Now, an APFloat with semantics PPCDoubleDouble will have DoubleAPFloat layout
(APFloat.U.Double), which contains two underlying APFloats as
PPCDoubleDoubleImpl and IEEEdouble semantics. Currently the IEEEdouble APFloat
is not used, and the first APFloat behaves exactly the same before this change.

This patch consists of three kinds of logics:
1) Construction and destruction of APFloat. Now the ctors, dtor, assign
   opertors and factory functions construct different underlying layout
   based on the semantics passed in.
2) s/IEEE/getIEEE()/ for normal, lifetime-unrelated computation functions.
   These functions only access Floats[0] in DoubleAPFloat, which is the
   same as today's semantic.
3) A "Double dispatch" function, APFloat::convert. Converting between two
   different layouts requires appropriate logic.

Neither of these change the external behavior.

Reviewers: hfinkel, kbarton, echristo, iteratee

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D25977

llvm-svn: 285351
2016-10-27 21:39:51 +00:00
Peter Collingbourne fc0a99bfda BitcodeReader: Require clients to read the block info block at most once.
This change makes it the client's responsibility to call ReadBlockInfoBlock()
at most once. This is in preparation for a future change that will allow
there to be multiple block info blocks.

See also: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106512.html

Differential Revision: https://reviews.llvm.org/D26016

llvm-svn: 285350
2016-10-27 21:39:28 +00:00
Kyle Butt ab9cca7b0c CodeGen: Handle missed case of block removal during BlockPlacement.
There is a use after free bug in the existing code. Loop layout selects
a preferred exit block, and then lays out the loop. If this block is
removed during layout, it needs to be invalidated to prevent a use after
free.

llvm-svn: 285348
2016-10-27 21:37:20 +00:00
Sanjay Patel c0de9c9e40 [InstCombine] fix foldSPFofSPF() to handle vector splats
llvm-svn: 285345
2016-10-27 21:19:40 +00:00
Kevin Enderby bc5c29a65f Another additional error check for invalid Mach-O files for the
obsolete load commands.

Again the philosophy of the error checking in libObject for
Mach-O files, the idea behind the checking is that we never
will return a Mach-O file out of libObject that contains unknown
things the library code can’t operate on.  So known obsolete
load commands will cause a hard error.

Also to make things clear I have added comments to the
values and structures in Support/Mach-O.h and
Support/MachO.def as to what is obsolete.

As noted in a TODO in the code, there may need to be a
non-default mode to allow some unknown values for well
structured Mach-O files with things like unknown load
load commands.  So things like using an old lldb on a newer
Mach-O file could still provide some limited functionality.

llvm-svn: 285342
2016-10-27 20:59:10 +00:00
Tom Stellard 30d30824b4 AMDGPU/SI: Handle s_setreg hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25528

llvm-svn: 285338
2016-10-27 20:39:09 +00:00
Haicheng Wu 430b3e4893 [LoopUnroll] Check partial unrolling is enabled before initialization. NFC.
Differential Revision: https://reviews.llvm.org/D23891

llvm-svn: 285330
2016-10-27 18:40:02 +00:00
Simon Pilgrim d23219b9ee [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets
llvm-svn: 285329
2016-10-27 18:32:06 +00:00
Sanjay Patel 611f9f92fc [InstCombine] handle simple vector integer constants in IsFreeToInvert
llvm-svn: 285318
2016-10-27 17:30:50 +00:00
Simon Pilgrim 47c1ff7a43 [X86][AVX512DQ] Move v2i64 and v4i64 MUL lowering to tablegen
As suggested by @igorb on D26011

llvm-svn: 285313
2016-10-27 17:07:40 +00:00
Saleem Abdulrasool 075d2e3c59 ARM: ensure that the Windows DBZ check is in range
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:

    cmp r?, #0
    cbz .Ltrap
    b .Lbody
  .Lbody:
    ...
  .Ltrap:
    udf #249 @ __brkdiv0

This works great most of the time.  However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).

Since this is a matter of correctness, possibly pay a small penalty instead.  We
now form this slightly differently:

    cbnz .Lbody
    udf #249 @ __brkdiv0
  .Lbody:
    ...

The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).

The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.

Addresses PR30532!

llvm-svn: 285312
2016-10-27 16:59:22 +00:00
Greg Clayton 6c273763a3 Switch all DWARF variables for tags, attributes and forms over to use the llvm::dwarf enumerations instead of using raw uint16_t values. This allows easier debugging as users can see the values of the enumerations in the variables view that will show the enumeration string instead of just a number.
https://reviews.llvm.org/D26013

llvm-svn: 285309
2016-10-27 16:32:04 +00:00
Dehao Chen b94c09baa0 Add Loop Sink pass to reverse the LICM based of basic block frequency.
Summary: LICM may hoist instructions to preheader speculatively. Before code generation, we need to sink down the hoisted instructions inside to loop if it's beneficial. This pass is a reverse of LICM: looking at instructions in preheader and sinks the instruction to basic blocks inside the loop body if basic block frequency is smaller than the preheader frequency.

Reviewers: hfinkel, davidxl, chandlerc

Subscribers: anna, modocache, mgorny, beanz, reames, dberlin, chandlerc, mcrosier, junbuml, sanjoy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D22778

llvm-svn: 285308
2016-10-27 16:30:08 +00:00
Vasileios Kalintiris cfb005a0ee [mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass.
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.

This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.

llvm-svn: 285305
2016-10-27 15:50:36 +00:00
Simon Pilgrim 820e1326d7 [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

llvm-svn: 285304
2016-10-27 15:27:00 +00:00
Sanjay Patel e372aecb8a [ValueTracking] fix matchSelectPattern to allow vector splat folds of min/max/abs/nabs
llvm-svn: 285303
2016-10-27 15:26:10 +00:00
Bjorn Pettersson 807f732ce8 Fix memory issue in AttrBuilder::removeAttribute uses.
Summary:
Found when running Valgrind.

This removes two unnecessary assignments when using
AttrBuilder::removeAttribute.

AttrBuilder::removeAttribute returns a reference to the object.
As the LHSes were the same as the callees, the assignments
resulted in memcpy calls where dst = src.

Commited on behalf-of: dstenb (David Stenberg)

Reviewers: mkuper, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25460

llvm-svn: 285298
2016-10-27 14:48:09 +00:00
Krzysztof Parzyszek 046da74699 [Hexagon] Do not expand ISD::SELECT for HVX vectors
llvm-svn: 285297
2016-10-27 14:30:16 +00:00
Simon Pilgrim 01e755eab1 [DAGCombiner] Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285296
2016-10-27 14:29:28 +00:00
Alexey Bataev 46c0278e7d [SLP] Fix for PR30626: Compiler crash inside SLP Vectorizer.
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.

Also the test is fixed to make it perform actual testing.

Differential Revision: https://reviews.llvm.org/D25671

llvm-svn: 285286
2016-10-27 12:02:28 +00:00
Sam Parker e7d9505c08 [ARM] Predicate UMAAL selection on hasDSP.
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.

Patch by Vadzim Dambrouski.

Differential Revision: https://reviews.llvm.org/D25890

llvm-svn: 285278
2016-10-27 09:47:10 +00:00
Dylan McKay dd680cc753 [AVR] Generate all of the TableGen files we need
This enables generation of all of the TableGen files that are used
downstream.

llvm-svn: 285274
2016-10-27 08:20:47 +00:00
Nicolai Haehnle 7b0e25b7ad AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.

Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].

The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25829

llvm-svn: 285273
2016-10-27 08:15:07 +00:00
Dylan McKay 00009d4824 [AVR] Compile the disassembler
This also updates references of 'TheAVRTarget' to the new
'getTheAVRTarget()' method.

llvm-svn: 285272
2016-10-27 08:09:15 +00:00
Dylan McKay ec47065795 [AVR] Add AVRISelDAGToDAG.cpp
Summary: This pulls the AVR instruction selector in-tree.

Reviewers: arsenm, kparzysz

Subscribers: llvm-commits, wdng, beanz, japaric, mgorny

Differential Revision: https://reviews.llvm.org/D25278

llvm-svn: 285270
2016-10-27 07:03:47 +00:00
Dylan McKay 6eaa4e4bcc [AVR] Add the machine code emitter
Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, japaric, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D25388

llvm-svn: 285269
2016-10-27 06:56:46 +00:00
Nemanja Ivanovic 32b5fed639 [PowerPC] - No SExt/ZExt needed for count trailing zeros
This patch corresponds to review:
https://reviews.llvm.org/D25896

It just eliminates the redundant ZExt after a count trailing zeros instruction.

llvm-svn: 285267
2016-10-27 05:17:58 +00:00
Kostya Serebryany 94c427c23e [libFuzzer] speculatively trying to fix the Mac build; second attempt
llvm-svn: 285262
2016-10-27 00:36:38 +00:00
Kostya Serebryany 3d945f6247 [libFuzzer] revert 285259 -- hit commit too soon
llvm-svn: 285260
2016-10-27 00:24:34 +00:00
Kostya Serebryany 15cd6b4b10 [libFuzzer] speculatively trying to fix the Mac build
llvm-svn: 285259
2016-10-27 00:22:39 +00:00
Evandro Menezes ca8370396a [AArch64] Create feature set for Samsung Exynos-M2
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.

llvm-svn: 285246
2016-10-26 22:06:20 +00:00
Victor Leschuk a37660c669 DebugInfo: fix incorrect alignment type (NFC)
Change type of some missed DebugInfo-related alignment variables,
that are still uint64_t, to uint32_t.

Original change introduced in r284482.

llvm-svn: 285242
2016-10-26 21:32:29 +00:00
Tim Northover a9cc385664 ARM: don't rely on push/pop reglists being in order when folding SP adjust.
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.

Should fix an issue where we were turning something like:

    push {r8, r4, r7, lr}
    sub sp, #24

into nonsense like:

    push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}

llvm-svn: 285232
2016-10-26 20:01:00 +00:00
Nemanja Ivanovic 275853e777 Do not assume that FP vector operands are never legalized by expanding
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.

This fixes PR 30715.

llvm-svn: 285231
2016-10-26 19:51:35 +00:00
Sanjoy Das 01969218a4 Simplify `x >=u x >> y` and `x >=u x udiv y`
Summary:
Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`.

This is a folloup of rL258422 and
https://github.com/rust-lang/rust/pull/30917 where llvm failed to
optimize away the bounds checking in a binary search.

Patch by Arthur Silva!

Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25941

llvm-svn: 285228
2016-10-26 19:18:43 +00:00
Chad Rosier 4447d7a816 Revert "[AliasSetTracker] Make AST smarter about intrinsics that don't actually affect memory."
This reverts commit r285191.

LICM appears to rely on the Alias Set Tracker hitting lifetime markers to prevent
code from being moved outside of the original scope.

llvm-svn: 285227
2016-10-26 19:18:19 +00:00
Nemanja Ivanovic 0f45998bc6 [PowerPC] Implement vec_insert_exp builtins - llvm portion
This revision corresponds to review: https://reviews.llvm.org/D25957.
Committing on behalf of Zaara Syeda.

llvm-svn: 285225
2016-10-26 19:03:40 +00:00
Kostya Serebryany 2fabecaee3 [libFuzzer] simplify TracePC::HandleTrace even further. Also, when dealing with -exit_on_src_pos, symbolize every PC only once
llvm-svn: 285223
2016-10-26 18:52:04 +00:00
Chad Rosier 0c621fda0d [AArch64] Avoid materializing constant 1 when generating cneg instructions.
Instead of

 cmp w0, #1
 orr w8, wzr, #0x1
 cneg w0, w8, ne

we now generate

 cmp w0, #1
 csinv w0, w0, wzr, eq

PR28965

llvm-svn: 285217
2016-10-26 18:15:32 +00:00
Dan Gohman 68a423bf84 [WebAssembly] Update the README.txt.
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.

llvm-svn: 285215
2016-10-26 17:44:09 +00:00
Nirav Dave e2369aafb9 [MC] Fix comma typo in .loc parsing
llvm-svn: 285214
2016-10-26 17:28:58 +00:00
Robert Lougher 660f2f9560 Reapply: "Remove debug location from common tail when tail-merging"
This reapplies revision 285093.  Original commit message:

The branch folding pass tail merges blocks into a common-tail.  However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly).  This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.

This patch fixes the issue by nulling the debug location for the common-tail.

Differential Revision: https://reviews.llvm.org/D25742

llvm-svn: 285212
2016-10-26 17:01:47 +00:00
Yaxun Liu 94add85adb AMDGPU: Refactor processor definition to use ISA version features
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.

Refactor processor definition to use ISA version features.

Fixed ISA version for stoney.

Based on Laurent Morichetti's patch.

Differential Revision: https://reviews.llvm.org/D25919

llvm-svn: 285210
2016-10-26 16:37:56 +00:00
Dehao Chen e713000eb6 Introduce updateDiscriminator interface to DILocation to make it cleaner assigning discriminators.
Summary: This patch introduces updateDiscriminator to DILocation so that it can be directly called by AddDiscriminator. It also makes it easier to update the discriminator later.

Reviewers: dnovillo, dblaikie, aprantl, echristo

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D25959

llvm-svn: 285207
2016-10-26 15:48:45 +00:00
Matt Arsenault 39787bdcbb Reapply "AMDGPU: Don't use offen if it is 0"
This reverts r283003

llvm-svn: 285203
2016-10-26 15:08:16 +00:00
Matt Arsenault 1110f14b42 AMDGPU: Fix counting si_mask_branch as 4 bytes
llvm-svn: 285202
2016-10-26 14:53:54 +00:00
Matt Arsenault 8fac501602 Fix nondeterministic output in local stack slot alloc pass
This finds all of the references to a frame index in a function, and
sorts by the offset. If multiple instructions use the same offset,
nothing was breaking the tie for sorting.

This avoids the test failures the reverted r282999 introduced.

llvm-svn: 285201
2016-10-26 14:53:50 +00:00
Sanjay Patel 8d7196bfde [InstCombine] clean up commonCastTransforms; NFC
1. Use 'auto' with dyn_cast.
2. Variables start with a capital letter.
3. Use proper punctuation in comments.

llvm-svn: 285200
2016-10-26 14:52:35 +00:00
Tom Stellard 284cf32ab4 LegalizeDAG: Support promoting [US]DIV and [US]REM operations
Summary:
AMDGPU will need this one i16 is added as a legal type.  This is tested by:

test/CodeGen/AMDGPU/sdiv.ll
test/CodeGen/AMDGPU/sdivrem24.ll
test/CodeGen/AMDGPU/udiv.ll
test/CodeGen/AMDGPU/udivrem24.ll

Reviewers: bogner, efriedma

Subscribers: efriedma, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D25699

llvm-svn: 285199
2016-10-26 14:52:25 +00:00
Tom Stellard f8e6eaff6e AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch
Summary:
A single flat memory operations that might access the scratch buffer
can only access MaxPrivateElementSize bytes.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25788

llvm-svn: 285198
2016-10-26 14:38:47 +00:00
Zvi Rackover aa3402b41e [X86] AVX512 fallback for floating-point scalar selects
Summary:
In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches.

Fixes pr30561

Reviewers: igorb, aymanmus, delena

Differential Revision: https://reviews.llvm.org/D25310

llvm-svn: 285196
2016-10-26 14:12:46 +00:00
Chad Rosier 1408628ffa [AliasSetTracker] Make AST smarter about intrinsics that don't actually affect memory.
Differential Revision: https://reviews.llvm.org/D25969

llvm-svn: 285191
2016-10-26 12:42:11 +00:00
Victor Leschuk 3c9899842b DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.

* Emit DW_AT_alignment if alignment is specified for type/object.

Differential Revision: https://reviews.llvm.org/D24425

llvm-svn: 285189
2016-10-26 11:59:03 +00:00
Andrea Di Biagio 9bcb064f19 [IndVarSimplify][DebugLoc] When widening the exit loop condition, correctly reuse the debug location of the original comparison.
When the loop exit condition is canonicalized as a != compaison, reuse the
debug location of the original (non canonical) comparison.

Before this patch, the debug location of the new icmp was obtained from the
loop latch terminator. This patch fixes the issue by correctly setting the
IRBuilder's "current debug location" to the location of the original compare.

Differential Revision: https://reviews.llvm.org/D25953

llvm-svn: 285185
2016-10-26 10:28:32 +00:00
Vassil Vassilev df5042ab61 Revert r285181 "DebugInfo: support for DWARFv5 DW_AT_alignment attribute".
The commit broke the builds.

llvm-svn: 285183
2016-10-26 10:13:47 +00:00
Victor Leschuk e398c6afa9 DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.

* Emit DW_AT_alignment if alignment is specified for type/object.

Differential Revision: https://reviews.llvm.org/D24425

llvm-svn: 285181
2016-10-26 08:55:27 +00:00
Craig Topper 812d3d30ae [AVX-512] Add scalar vfmsub/vfnmsub mask3 intrinsics
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.

Reviewers: igorb, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25933

llvm-svn: 285173
2016-10-26 04:59:58 +00:00
Peter Collingbourne 7b7bac367c Cloning: Also clone global variable attached metadata.
llvm-svn: 285161
2016-10-26 02:57:33 +00:00
Kostya Serebryany 06b8757b57 [libFuzzer] simplify the code in TracePC::HandleTrace a bit more
llvm-svn: 285147
2016-10-26 00:42:52 +00:00
Kostya Serebryany a5b2e54fcb [libFuzzer] simplify the code to print new PCs
llvm-svn: 285145
2016-10-26 00:20:51 +00:00
Evgeniy Stepanov ea6d49d3ee Utility functions for appending to llvm.used/llvm.compiler.used.
llvm-svn: 285143
2016-10-25 23:53:31 +00:00
Kostya Serebryany 275e260258 [libFuzzer] simplify the code in TracePC::HandleTrace
llvm-svn: 285142
2016-10-25 23:52:25 +00:00
Kostya Serebryany 117976818e [libFuzzer] add StandaloneFuzzTargetMain.c and a test for it
llvm-svn: 285135
2016-10-25 22:30:34 +00:00
James Y Knight 2e64b8b79e [Sparc] Don't overlap variable-sized allocas with other stack variables.
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).

It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.

For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.

Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.

llvm-svn: 285131
2016-10-25 22:13:28 +00:00
Bob Haarman 26a87bd030 [codeview] support emitting indirect virtual base class information
Summary:
Fixes PR28281.

MSVC lists indirect virtual base classes in the field list of a class,
using LF_IVBCLASS records. This change makes LLVM emit such records
when processing DW_TAG_inheritance tags with the DIFlagVirtual and
(newly introduced) DIFlagIndirect tags.

Reviewers: rnk, ruiu, zturner

Differential Revision: https://reviews.llvm.org/D25578

llvm-svn: 285130
2016-10-25 22:11:52 +00:00
Simon Pilgrim de86241a09 [DAGCombiner] Enable (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1)) combine for splatted vectors
llvm-svn: 285129
2016-10-25 22:01:09 +00:00
Rong Xu 33308f92eb [PGO] Fix select instruction annotation
Summary:
Select instruction annotation in IR PGO uses the edge count to infer the
branch count. It's currently placed in setInstrumentedCounts() where
no all the BB counts have been computed. This leads to wrong branch weights.
Move the annotation after all BB counts are populated.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25961

llvm-svn: 285128
2016-10-25 21:47:24 +00:00
Simon Pilgrim f534573e8c [DAGCombiner] Enable srem(x.y) -> urem(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285123
2016-10-25 21:20:18 +00:00
Simon Pilgrim 4ebb04510a [DAGCombiner] Enable sdiv(x.y) -> udiv(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285118
2016-10-25 20:56:42 +00:00
Guozhi Wei ae541f6a71 [InstCombine] Resubmit the combine of A->B->A BitCast and fix for pr27996
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.

The problem is with following code

xB = load (type B); 
xA = load (type A); 
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI 
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;

optimizeBitCastFromPhi generates

+zBn = (B)zAn; // A -> B

and expects it will be combined with the following store instruction to another

store zAn 

Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.

optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.

Differential Revision: https://reviews.llvm.org/D23896

llvm-svn: 285116
2016-10-25 20:43:42 +00:00
Robert Lougher 3080d71fc8 revert: "Remove debug location from common tail when tail-merging"
This reverts r285093, as it caused unexpected buildbot failures on
clang-ppc64le-linux, clang-ppc64be-linux, clang-ppc64be-linux-multistage
and clang-ppc64be-linux-lnt.  Failing test ubsan/TestCases/TypeCheck/vptr.cpp.

llvm-svn: 285110
2016-10-25 20:17:58 +00:00
Kostya Serebryany c48c93184a [libFuzzer] when mutating based on CMP traces also try adding +/- 1 to the desired bytes. Add another test for use_cmp
llvm-svn: 285109
2016-10-25 20:15:15 +00:00
Sanjay Patel f3dda13bd2 [InstCombine] Ensure that truncated int types are legal.
Fixes the FIXMEs in D25952 and rL285075.

Patch by bryant!

Differential Revision: https://reviews.llvm.org/D25955

llvm-svn: 285108
2016-10-25 20:11:47 +00:00
Evandro Menezes 7696dc0685 [AArch64] Adjust the cost model for Exynos M1.
Modify the maximum jump table size.

llvm-svn: 285106
2016-10-25 20:05:42 +00:00
Tim Shen 85de51db85 [APFloat] Make APFloat an interface class to the internal IEEEFloat. NFC.
Summary:
The intention is to make APFloat an interface class, so that later I can add a second implementation class DoubleAPFloat to correctly implement PPCDoubleDouble semantic. The interface of IEEEFloat is not public, and can be simplified (currently it's exactly the same as the old APFloat), but that belongs to a separate patch.

DoubleAPFloat should look like:
class DoubleAPFloat {
  const fltSemantics *Semantics;
  std::unique_ptr<APFloat> APFloats;  // Two heap-allocated APFloats.
};

There is no functional change, nor public interface change.

Reviewers: hfinkel, chandlerc, iteratee, echristo, kbarton

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25536

llvm-svn: 285105
2016-10-25 19:55:59 +00:00
Evandro Menezes eb97e3554c Add option to specify minimum number of entries for jump tables
Add an option to allow easier experimentation by target maintainers with the
minimum number of entries to create jump tables.  Also clarify the name of
the other existing option governing the creation of jump tables.

Differential revision: https://reviews.llvm.org/D25883

llvm-svn: 285104
2016-10-25 19:53:51 +00:00
Evandro Menezes 601f4cb9f7 Switch lowering: improve partitioning of jump tables
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases.  The motivation is
that many contemporary processors typically perform case switches fairly
quickly.

Differential revision: https://reviews.llvm.org/D25212

llvm-svn: 285099
2016-10-25 19:11:43 +00:00
Matthew Simpson c62266d680 [LV] Sink scalar operands of predicated instructions
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.

Differential Revision: https://reviews.llvm.org/D25632

llvm-svn: 285097
2016-10-25 18:59:45 +00:00
Michael Ilseman e542804343 Add -strip-nonlinetable-debuginfo capability
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.

The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches.  For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.

The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.

Thanks to Adrian Prantl for stewarding this patch!

llvm-svn: 285094
2016-10-25 18:44:13 +00:00
Robert Lougher e32564774c Remove debug location from common tail when tail-merging
The branch folding pass tail merges blocks into a common-tail.  However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly).  This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.

This patch fixes the issue by nulling the debug location for the common-tail.

Differential Revision: https://reviews.llvm.org/D25742

llvm-svn: 285093
2016-10-25 18:44:07 +00:00
Michael Kuperstein cffedc4a94 Fix 80-char violations. NFC.
llvm-svn: 285092
2016-10-25 18:31:23 +00:00
Dan Gohman f50d964bdb [WebAssembly] Add immediate fields to call_indirect and memory operators.
call_indirect, grow_memory, and current_memory now have immediate
operands in the 0xd binary encoding.

llvm-svn: 285085
2016-10-25 16:55:52 +00:00
Dehao Chen c1472b5092 Move discriminator assignment to where it is used. (NFC)
llvm-svn: 285084
2016-10-25 16:50:27 +00:00
Andrea Di Biagio 824cabd06d [IndVarSimplify][Dwarf] When widening the IV increment, correctly set the debug loc.
When indvars widened an induction variable, the debug location for the loop
increment computation was incorrectly set equal to the debug loc of the loop
latch terminator.

This patch fixes the issue by propagating the correct location from the
original loop increment instruction to the new widened increment.

Differential Revision: https://reviews.llvm.org/D25872

llvm-svn: 285083
2016-10-25 16:45:17 +00:00
Pavel Labath ec534e6dfe Replace TimeValue by TimePoint in LegacyPassManager. NFC.
llvm-svn: 285081
2016-10-25 16:20:07 +00:00
Geoff Berry 91e9a5cc23 [EarlyCSE] Make MemorySSA memory dependency check more aggressive.
Now that MemorySSA keeps track of whether MemoryUses are optimized, use
getClobberingMemoryAccess() to check MemoryUse memory dependencies since
it should no longer be so expensive.

This is a follow-up change to https://reviews.llvm.org/D25881

llvm-svn: 285080
2016-10-25 16:18:47 +00:00
Sanjay Patel e3de152530 fix formatting; NFC
llvm-svn: 285078
2016-10-25 16:12:31 +00:00
Ulrich Weigand 7bdb485e18 [SystemZ] Do not use LOC(G) for volatile loads
It is not safe to use LOAD ON CONDITION to implement access to a memory
location marked "volatile", since the architecture leaves it unspecified
whether or not an access happens if the condition is false.

The current code already appears to care about that:
  def LOC  : CondUnaryRSY<"loc",  0xEBF2, nonvolatile_load, GR32, 4>;

Unfortunately, that "nonvolatile_load" operator is simply ignored
by the CondUnaryRSY class, and there was no test to catch it.

llvm-svn: 285077
2016-10-25 15:39:15 +00:00
Sanjay Patel d59f7f9047 [InstCombine] add test and code comment to show potentially misguided icmp trunc transform
llvm-svn: 285075
2016-10-25 15:16:39 +00:00
Simon Pilgrim 5c3c9707c3 [X86][SSE] Add support for (V)PMOVSX* constant folding
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.

This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.

Differential Revision: https://reviews.llvm.org/D25874

llvm-svn: 285072
2016-10-25 14:29:25 +00:00
Rafael Espindola 20aa1779d0 fix warning
llvm-svn: 285064
2016-10-25 12:28:26 +00:00
Zvi Rackover 124470a202 [DAGCombine] Preserve shuffles when one of the vector operands is constant
Summary:
Do *not* perform combines such as:

    vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X))
    ->
    build_vector(X, C0, C1, C2)

Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).

Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25524

llvm-svn: 285063
2016-10-25 12:14:19 +00:00
Rafael Espindola 7912110ddc Make the LTO comdat api more symbol table friendly.
In an IR symbol table I would expect the comdats to be represented as:

- A table of strings, one for each comdat name.
- Each symbol has an optional index into that table.

The natural api for accessing that would be

InputFile:
ArrayRef<StringRef> getComdatTable() const;

Symbol:
int getComdatIndex() const;

This patch implements an API as close to that as possible.  The
implementation on top of the current IRObjectFile is a bit hackish,
but should map just fine over a symbol table and is very convenient to
use.

llvm-svn: 285061
2016-10-25 12:02:03 +00:00
Benjamin Kramer 7df3043db3 Fix an unused warning in WebAssemblyInstPrinter with NDEBUG.
Patch by Sam McCall!

Differential Revision: https://reviews.llvm.org/D25934

llvm-svn: 285055
2016-10-25 09:08:50 +00:00
Craig Topper 01e4667e02 [AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq.
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.

Reviewers: delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25594

llvm-svn: 285053
2016-10-25 04:00:29 +00:00
Peter Collingbourne 4f3b2df9bb GlobalDCE: Restore a statement accidentally removed in r285048.
llvm-svn: 285052
2016-10-25 02:57:27 +00:00
Matthias Braun c8440dddb2 MachineInstrBundle: Pass iterators to getBundle(Start|End); NFC
This is a function to go backwards in a block to find the first
instruction in a bundle, so iterator is a more natural choice for
parameter/return rather than a reference to a MachineInstruction.

llvm-svn: 285051
2016-10-25 02:55:17 +00:00
Peter Collingbourne ca7664e761 IR: Deduplicate getParent() functions on derived classes of GlobalValue into the base class. NFCI.
llvm-svn: 285050
2016-10-25 02:54:08 +00:00
Kostya Serebryany 3364f90783 [libFuzzer] simplify the code for use_cmp, also use the position hint when available, add a test
llvm-svn: 285049
2016-10-25 02:04:43 +00:00
Peter Collingbourne 7695cb6da8 GlobalDCE: Deduplicate code. NFCI.
llvm-svn: 285048
2016-10-25 01:58:26 +00:00
Dan Gohman 48abaa9c74 [WebAssembly] Reorder load/store operands to match binary encoding.
The p2align operand of a load/store is encoded before the offset
operand; reorder the MachineInstr operands accordingly.

llvm-svn: 285044
2016-10-25 00:17:11 +00:00
Dan Gohman 3acb187d95 [WebAssembly] Implement more WebAssembly binary encoding.
This changes locals from being declared by the emitLocal hook in
WebAssemblyTargetStreamer, rather than with an instruction. After exploring
the infastructure in LLVM more, this seems to make more sense since
declaring locals doesn't use an encoded opcode.

This also adds more 0xd opcodes, type encodings, and miscellaneous
binary encoding bits.

llvm-svn: 285040
2016-10-24 23:27:49 +00:00
Matthias Braun 8b38ffaa98 CodeGen/Passes: Pass MachineFunction as functor arg; NFC
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.

llvm-svn: 285039
2016-10-24 23:23:02 +00:00
Eli Friedman c5b7262073 Fix regression from my recent GlobalsAA fix.
There are two fixes here: one, AnalyzeUsesOfPointer can't return
false until it has checked all the uses of the pointer. Two, if a
global uses another global, we have to assume the address of the
first global escapes.

Fixes https://llvm.org/bugs/show_bug.cgi?id=30707 .

Differential Revision: https://reviews.llvm.org/D25798

llvm-svn: 285034
2016-10-24 21:47:44 +00:00
Simon Pilgrim e3e6585c2d [SelectionDAG] Update ComputeNumSignBits SRA/SHL handlers to accept scalar or vector splats
Use isConstOrConstSplat helper.

Also use APInt instead of getZExtValue directly to avoid out of range issues.

llvm-svn: 285033
2016-10-24 21:47:19 +00:00
Matthias Braun fc371558a0 Use MachineInstr::mop_iterator instead of MIOperands; NFC
(Const)?MIOperands is equivalent to the C++ style
MachineInstr::mop_iterator. Use the latter for consistency except for a
few callers of MIOperands::analyzePhysReg().

llvm-svn: 285029
2016-10-24 21:36:43 +00:00
Kevin Enderby 79d6c63f61 nother additional error check for an invalid Mach-O file
when contained in a Mach-O universal file and the
cputypes in both headers don’t match.

llvm-svn: 285026
2016-10-24 21:15:11 +00:00
Simon Pilgrim d8ec09c74f Use SDValue::getConstantOperandVal() helper. NFCI.
llvm-svn: 285025
2016-10-24 20:56:52 +00:00
Dan Gohman 5d3391f859 [WebAssembly] Fix a broken URL.
llvm-svn: 285017
2016-10-24 20:35:17 +00:00
Dan Gohman 4becc58587 [WebAssembly] Define the `end` opcode value.
CFGStackify differentiates between END_LOOP and END_BLOCK, but wasm
itself doesn't. For now, just use the same opcode for both.

llvm-svn: 285016
2016-10-24 20:32:04 +00:00