Commit Graph

42307 Commits

Author SHA1 Message Date
Simon Pilgrim 027bb453d9 [X86][SSE] Add support for combining ANDNP byte masks with target shuffles
llvm-svn: 293178
2017-01-26 14:31:12 +00:00
Daniil Fukalov b09dac59fc [SCEV] Introduce add operation inlining limit
Inlining in getAddExpr() can cause abnormal computational time in some cases.
New parameter -scev-addops-inline-threshold is intruduced with default value 500.

Reviewers: sanjoy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28812

llvm-svn: 293176
2017-01-26 13:33:17 +00:00
Valery Pykhtin 75d1de903f [AMDGPU] Fix typo in GCNSchedStrategy
Differential revision: https://reviews.llvm.org/D28980

llvm-svn: 293171
2017-01-26 10:51:47 +00:00
Simon Dardis 5b67a4f75f Revert "[mips] N64 static relocation model support"
This reverts commit r293164. There are multiple tests failing.

llvm-svn: 293170
2017-01-26 10:46:07 +00:00
Chandler Carruth 6f4ed077d0 [LV] Fix an issue where forming LCSSA in the place that we did would
change the set of uniform instructions in the loop causing an assert
failure.

The problem is that the legalization checking also builds data
structures mapping various facts about the loop body. The immediate
cause was the set of uniform instructions. If these then change when
LCSSA is formed, the data structures would already have been built and
become stale. The included test case triggered an assert in loop
vectorize that was reduced out of the new PM's pipeline.

The solution is to form LCSSA early enough that no information is cached
across the changes made. The only really obvious position is outside of
the main logic to vectorize the loop. This also has the advantage of
removing one case where forming LCSSA could mutate the loop but we
wouldn't track that as a "Changed" state.

If it is significantly advantageous to do some legalization checking
prior to this, we can do a more careful positioning but it seemed best
to just back off to a safe position first.

llvm-svn: 293168
2017-01-26 10:41:09 +00:00
Simon Dardis 09e65efd09 [mips] N64 static relocation model support
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293164
2017-01-26 10:19:02 +00:00
Diana Picus 278c722e6d [ARM] GlobalISel: Load i1, i8 and i16 args from stack
Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.

When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.

Differential Revision: https://reviews.llvm.org/D27803

llvm-svn: 293163
2017-01-26 09:20:47 +00:00
Alexey Bataev 7a7510ea97 [SLP] Add one more reduction operation for extra argument test to make
it vectorizable.

llvm-svn: 293162
2017-01-26 09:18:41 +00:00
Alexey Bataev 7046a852b3 [SLP] Fixed test for extra arguments in horizontal reductions.
llvm-svn: 293153
2017-01-26 06:19:52 +00:00
Peter Collingbourne 1729133fb1 gold-plugin: Fix test case.
llvm-svn: 293137
2017-01-26 02:15:08 +00:00
Chandler Carruth eab3b90a14 [PM] Simplify the new PM interface to the loop unroller and expose two
factory functions for the two modes the loop unroller is actually used
in in-tree: simplified full-unrolling and the entire thing including
partial unrolling.

I've also wired these up to nice names so you can express both of these
being in a pipeline easily. This is a precursor to actually enabling
these parts of the O2 pipeline.

Differential Revision: https://reviews.llvm.org/D28897

llvm-svn: 293136
2017-01-26 02:13:50 +00:00
Peter Collingbourne 6201d78653 gold-plugin: Simplify naming of object files created with save-temps or obj-path.
Now we never append a number to the file name for task ID 0.

Differential Revision: https://reviews.llvm.org/D29160

llvm-svn: 293132
2017-01-26 02:07:05 +00:00
Matt Arsenault 53f0cc238c AMDGPU: Fold fneg into round instructions
llvm-svn: 293127
2017-01-26 01:25:36 +00:00
Sanjoy Das c38a74d886 [ImplicitNullChecks] Add a test demonstrating a case we don't get today
llvm-svn: 293126
2017-01-26 01:07:33 +00:00
Michael Kuperstein 5dd55e8405 [LoopUnroll] Properly update loopinfo for runtime unrolling by 2
Even when we don't create a remainder loop (that is, when we unroll by 2), we
may duplicate nested loops into the remainder. This is complicated by the fact
the remainder may itself be either inserted into an outer loop, or at the top
level. In the latter case, we may need to create new top-level loops.

Differential Revision: https://reviews.llvm.org/D29156

llvm-svn: 293124
2017-01-26 01:04:11 +00:00
Davide Italiano ccbbc8313f [NewGVN] Skip uses in unreachable blocks.
Otherwise we ask for a domtree node that's not there, and we crash.

Differential Revision:  https://reviews.llvm.org/D29145

llvm-svn: 293122
2017-01-26 00:42:42 +00:00
Adam Nemet 916923e689 [llc] Add -pass-remarks-output
This is the opt/llc counterpart of -fsave-optimization-record to output
optimization remarks in a YAML file.

llvm-svn: 293121
2017-01-26 00:39:51 +00:00
Peter Collingbourne 1df6e858ef LowerTypeTests: Ignore external globals with type metadata.
Thanks to Davide Italiano for finding the problem and providing a test case.

llvm-svn: 293119
2017-01-26 00:32:15 +00:00
Justin Lebar 7e3184c412 [ValueTracking] Implement SignBitMustBeZero correctly for sqrt.
Summary:
Previously we assumed that the result of sqrt(x) always had 0 as its
sign bit.  But sqrt(-0) == -0.

Reviewers: hfinkel, efriedma, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28928

llvm-svn: 293115
2017-01-26 00:10:26 +00:00
Kevin Enderby a31f9dd69a Change the test added in r293099 so it does not have the string "llvm-nm" to fix
the clang-x86-windows-msvc2015 bot as the name is "llvm-nm.EXE" in that case.

llvm-svn: 293114
2017-01-25 23:57:32 +00:00
Adam Nemet 2ada300821 [llc] Add -pass-remarks-with-hotness
Analogous to the code in opt, this enables hotness in opt-remarks.

llvm-svn: 293113
2017-01-25 23:55:59 +00:00
Adam Nemet a964066705 New OptimizationRemarkEmitter pass for MIR
This allows MIR passes to emit optimization remarks with the same level
of functionality that is available to IR passes.

It also hooks up the greedy register allocator to report spills.  This
allows for interesting use cases like increasing interleaving on a loop
until spilling of registers is observed.

I still need to experiment whether reporting every spill scales but this
demonstrates for now that the functionality works from llc
using -pass-remarks*=<pass>.

Differential Revision: https://reviews.llvm.org/D29004

llvm-svn: 293110
2017-01-25 23:20:33 +00:00
Kevin Enderby 31e8530063 Add a warning when the llvm-nm -print-size flag is used on a Mach-O file as
Mach-O files don’t have size information about the symbols in the object file
format unlike ELF.

Also add the part of the fix to llvm-nm that was missed with r290001 so
-arch armv7m works.

rdar://25681018

llvm-svn: 293099
2017-01-25 21:33:38 +00:00
Daniel Jasper 65144c852d Revert "[PPC] Give unaligned memory access lower cost on processor that supports it"
This reverts commit r292680. It is causing significantly worse
performance and test timeouts in our internal builds. I have already
routed reproduction instructions your way.

llvm-svn: 293092
2017-01-25 21:21:08 +00:00
Zachary Turner 840dee30d3 [pdb] Fix failing test
llvm-svn: 293091
2017-01-25 21:21:02 +00:00
Zachary Turner 29da5db7a0 [pdb] Correctly parse the hash adjusters table from TPI stream.
This is not a list of pairs, it is a hash table data structure. We now
correctly parse this out and dump it from llvm-pdbdump.

We still need to understand the conditions that lead to a type
getting an entry in the hash adjuster table.  That will be done
in a followup investigation / patch.

Differential Revision: https://reviews.llvm.org/D29090

llvm-svn: 293090
2017-01-25 21:17:40 +00:00
Tim Northover 470f070b7d SDag: fix how initial loads are formed when splitting vector ops.
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.

llvm-svn: 293088
2017-01-25 20:58:26 +00:00
Matt Arsenault 5d9101941f AMDGPU: Set call_convention bit in kernel_code_t
According to the documentation this is supposed to be -1
if indirect calls are not supported.

llvm-svn: 293081
2017-01-25 20:21:57 +00:00
Serge Rogatch bc2d34394d [XRay][AArch64] More staging for tail call support in XRay on AArch64 - in LLVM
Summary:
This patch prepares more for tail call support in XRay. Until the logging part supports tail calls, this is just staging, so it seems LLVM part is mostly ready with this patch.
Related: https://reviews.llvm.org/D28948 (compiler-rt)

Reviewers: dberris, rengolin

Reviewed By: dberris

Subscribers: llvm-commits, iid_iunknown, aemerson

Differential Revision: https://reviews.llvm.org/D28947

llvm-svn: 293080
2017-01-25 20:21:49 +00:00
Alexey Bataev 1da8ba2adc [SLP] Extra test for functionality with extra args.
llvm-svn: 293076
2017-01-25 17:24:31 +00:00
Chad Rosier 4f724dce42 Revert "Do not verify dominator tree if it has no roots"
This reverts commit r293033, per Danny's comment.  In short, we require
domtrees to have roots at all times.

llvm-svn: 293075
2017-01-25 17:15:48 +00:00
Artur Pilipenko 8fb3d57e67 [Guards] Introduce loop-predication pass
This patch introduces guard based loop predication optimization. The new LoopPredication pass tries to convert loop variant range checks to loop invariant by widening checks across loop iterations. For example, it will convert

  for (i = 0; i < n; i++) {
    guard(i < len);
    ...
  }

to

  for (i = 0; i < n; i++) {
    guard(n - 1 < len);
    ...
  }

After this transformation the condition of the guard is loop invariant, so loop-unswitch can later unswitch the loop by this condition which basically predicates the loop by the widened condition:

  if (n - 1 < len)
    for (i = 0; i < n; i++) {
      ...
    } 
  else
    deoptimize

This patch relies on an NFC change to make ScalarEvolution::isMonotonicPredicate public (revision 293062).

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D29034

llvm-svn: 293064
2017-01-25 16:00:44 +00:00
Artur Pilipenko b85f7a5d99 [InstCombine] Canonicalize guards for NOT OR condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29075

Patch by Maxim Kazantsev.

llvm-svn: 293061
2017-01-25 14:45:12 +00:00
Simon Pilgrim 6f6b279109 [InstCombine][SSE] Add support for PACKSS/PACKUS constant folding
Differential Revision: https://reviews.llvm.org/D28949

llvm-svn: 293060
2017-01-25 14:37:24 +00:00
Artur Pilipenko 4df4c4a4aa [InstCombine] Canonicalize guards for AND condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29074

Patch by Maxim Kazantsev.

llvm-svn: 293058
2017-01-25 14:20:52 +00:00
Artur Pilipenko e812ca00bb [InstCombine] Allow InstrCombine to remove one of adjacent guards if they are equivalent
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: majnemer, apilipenko

Differential Revision: https://reviews.llvm.org/D29071

Patch by Maxim Kazantsev.

llvm-svn: 293056
2017-01-25 14:12:12 +00:00
Alexey Bataev d28ab559a7 [SLP] Improve horizontal vectorization for non-power-of-2 number of
instructions.

If number of instructions in horizontal reduction list is not power of 2
then only PowerOf2Floor(NumberOfInstructions) last elements are actually
vectorized, other instructions remain scalar. Patch tries to vectorize
the remaining elements either.

Differential Revision: https://reviews.llvm.org/D28959

llvm-svn: 293042
2017-01-25 09:54:38 +00:00
whitequark 16f1e5f1ca Mark @llvm.powi.* as safe to speculatively execute.
Floating point intrinsics in LLVM are generally not speculatively
executed, since most of them are defined to behave the same as libm
functions, which set errno.

However, the @llvm.powi.* intrinsics do not correspond to any libm
function, and lacks any defined error handling semantics in LangRef.
It most certainly does not alter errno.

llvm-svn: 293041
2017-01-25 09:32:30 +00:00
Mohammed Agabaria 20caee95e1 [X86] enable memory interleaving for X86\SLM arch.
Differential Revision: https://reviews.llvm.org/D28547

llvm-svn: 293040
2017-01-25 09:14:48 +00:00
Artur Pilipenko 41c0005aa3 [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861

llvm-svn: 293036
2017-01-25 08:53:31 +00:00
Diana Picus d83df5d372 [ARM] GlobalISel: Support i1 add and ABI extensions
Add support for:
* i1 add
* i1 function arguments, if passed through registers
* i1 returns, with ABI signext/zeroext

Differential Revision: https://reviews.llvm.org/D27706

llvm-svn: 293035
2017-01-25 08:47:40 +00:00
Diana Picus 8b6c6bedcb [ARM] GlobalISel: Support i8/i16 ABI extensions
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.

Note that this does not include support for other extensions (i8 to i16), those
will be added later.

Differential Revision: https://reviews.llvm.org/D27705

llvm-svn: 293034
2017-01-25 08:10:40 +00:00
Serge Pavlov 43a7759f4b Do not verify dominator tree if it has no roots
If dominator tree has no roots, the pass that calculates it is
likely to be skipped. It occures, for instance, in the case of
entities with linkage available_externally. Do not run tree
verification in such case.

Differential Revision: https://reviews.llvm.org/D28767

llvm-svn: 293033
2017-01-25 07:58:10 +00:00
Dean Michael Berris d09bf194fa Implemented color coding and Vertex labels in XRay Graph
Summary:
A patch to enable the llvm-xray graph subcommand to color edges and
vertices based on statistics and to annotate vertices with statistics.

Depends on D27243

Reviewers: dblaikie, dberris

Reviewed By: dberris

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28225

llvm-svn: 293031
2017-01-25 07:14:43 +00:00
Coby Tayree 77807d93af [X86]Enable the use of 'mov' with a 64bit GPR and a large immediate
Enable the next form (intel style):
"mov <reg64>, <largeImm>"
which is should be available,
where <largeImm> stands for immediates which exceed the range of a singed 32bit integer

Differential Revision: https://reviews.llvm.org/D28988

llvm-svn: 293030
2017-01-25 07:09:42 +00:00
Matt Arsenault 74a576e7d3 AMDGPU: Check nsz instead of unsafe math
llvm-svn: 293028
2017-01-25 06:27:02 +00:00
Akira Hatanaka 4ec7b20ef6 [SimplifyCFG] Do not sink and merge inline-asm instructions.
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.

For example, SimplifyCFG used to do the following transformation:

(before)
if.then:
  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
  br label %if.end
if.else:
  %1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
  br label %if.end

(after)
  %.sink = select i1 %tobool, i32 6, i32 8
  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)

This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".

rdar://problem/30110806

Differential Revision: https://reviews.llvm.org/D29111

llvm-svn: 293025
2017-01-25 06:21:51 +00:00
Matt Arsenault 732a531506 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024
2017-01-25 06:08:42 +00:00
NAKAMURA Takumi 83bb048d10 Ignore llvm/test/tools/llvm-symbolizer/coff-exports.test on mingw.
FIXME: Demangler could behave along not host but target.
For example, assume host=mingw, target=msc.

llvm-svn: 293021
2017-01-25 05:26:23 +00:00
Matt Arsenault 8a27aee6ae DAGCombiner: Allow negating ConstantFP after legalize
llvm-svn: 293019
2017-01-25 04:54:34 +00:00