Commit Graph

114763 Commits

Author SHA1 Message Date
Max Kazantsev f5ba37182e [IndVarSimplify] Ignore unreachable users of truncs
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.

llvm-svn: 335816
2018-06-28 08:20:03 +00:00
Petar Jovanovic d175aeb881 [DwarfDebug] Remove unused argument (NFC)
Remove unused ByteStreamer argument from function emitDebugLocValue.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D48590

llvm-svn: 335811
2018-06-28 04:50:40 +00:00
Craig Topper ec5d568ac1 [X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded.
This more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K.

I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing.

llvm-svn: 335806
2018-06-28 01:45:44 +00:00
Craig Topper ab70f58891 [X86] Change how we prefer shift by immediate over folding a load into a shift.
BMI2 added new shift by register instructions that have the ability to fold a load.

Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load.

We used to enforce this priority by artificially lowering the complexity of the load pattern.

This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky.

llvm-svn: 335804
2018-06-28 00:47:41 +00:00
Michael J. Spencer 98f5475f44 [CGProfile] Fix unused variable warning.
llvm-svn: 335797
2018-06-28 00:12:04 +00:00
Michael J. Spencer 5bf1ead377 Add support for generating a call graph profile from Branch Frequency Info.
=== Generating the CG Profile ===

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions.  For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335794
2018-06-27 23:58:08 +00:00
Zachary Turner ee8010abe3 Move some code from PDBFileBuilder to MSFBuilder.
The code to emit the pieces of the MSF file were actually in
PDBFileBuilder.  Move this to MSFBuilder so that we can
theoretically emit an MSF without having a PDB file.

llvm-svn: 335789
2018-06-27 21:18:15 +00:00
Benjamin Kramer e214f046af [X86] Make folding table checking threadsafe
This is a benign race, but tsan likes to complain about it. Just make it
happy.

llvm-svn: 335788
2018-06-27 21:01:53 +00:00
Craig Topper 880e34ed45 [X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it.
If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed.

Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode.

llvm-svn: 335787
2018-06-27 20:58:46 +00:00
Sameer AbuAsal 9b65ffb097 [RISCV] Add machine function pass to merge base + offset
Summary:
   In r333455 we added a peephole to fix the corner cases that result
   from separating base + offset lowering of global address.The
   peephole didn't handle some of the cases because it only has a basic
   block view instead of a function level view.

   This patch replaces that logic with a machine function pass. In
   addition to handling the original cases it handles uses of the global
   address across blocks in function and folding an offset from LW\SW
   instruction. This pass won't run for OptNone compilation, so there
   will be a negative impact overall vs the old approach at O0.

Reviewers: asb, apazos, mgrang

Reviewed By: asb

Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones

Differential Revision: https://reviews.llvm.org/D47857

llvm-svn: 335786
2018-06-27 20:51:42 +00:00
Nirav Dave 7c57ae57a8 [DAGCombine] Disable TokenFactor simplifications when optnone.
llvm-svn: 335773
2018-06-27 19:41:25 +00:00
Fangrui Song b0d57a535b [X86] Fix unmatched parenthesis in r335768
llvm-svn: 335769
2018-06-27 19:12:07 +00:00
Craig Topper 6bea2c7f9b [X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result.
The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte.

This should match the behavior of objdump from binutils.

llvm-svn: 335768
2018-06-27 19:03:36 +00:00
Daniel Sanders bdeb880d14 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Sanjay Patel d052de856d [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761
2018-06-27 18:16:40 +00:00
Teresa Johnson 7e7b13d016 [ThinLTO] Print names in function import debug messages when available
Summary:
Rather than just print the GUID, when it is available in the index,
print the global name as well in the function import thin link debug
messages. Names will be available when the combined index is being
built by the same process, e.g. a linker or "llvm-lto2 run".

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48612

llvm-svn: 335760
2018-06-27 18:03:39 +00:00
Jessica Paquette f472f6159a [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.

This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Craig Topper 812fcb35e7 [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index.

Fixes PR37938.

llvm-svn: 335754
2018-06-27 16:47:39 +00:00
Jakub Kuderski 555e41bbf2 [AliasSet] Fix UnknownInstructions printing
Summary:
AliasSet::print uses `I->printAsOperand` to print UnknownInstructions. The problem is that not all UnknownInstructions have names (e.g. call instructions). When such instructions are printed, they appear as `<badref>` in AliasSets, which is very confusing, as the values are perfectly valid.

This patch fixes that by printing UnknownInstructions without a name using `print` instead of `printAsOperand`.

Reviewers: asbirlea, chandlerc, sanjoy, grosser

Reviewed By: asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48609

llvm-svn: 335751
2018-06-27 16:34:30 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Stanislav Mekhanoshin 1a1687f1bb [AMDGPU] Convert rcp to rcp_iflag
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742
2018-06-27 15:33:33 +00:00
Luke Geeson 316327150b [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Ivan A. Kosarev 7231598fce [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733
2018-06-27 13:57:52 +00:00
Simon Pilgrim d3e583a52d [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion
For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs

llvm-svn: 335727
2018-06-27 12:45:31 +00:00
Simon Pilgrim e835f662fa [DAGCombiner] visitSDIV - simplify pow2 handling. NFCI.
Use the builtin constant folding of getNode() etc. instead of doing it manually.

llvm-svn: 335720
2018-06-27 10:51:55 +00:00
Simon Pilgrim dfbcc66adc [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)
Fixes PR37569.

llvm-svn: 335719
2018-06-27 10:21:06 +00:00
Simon Pilgrim 0a566bc0ae [DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569)
llvm-svn: 335717
2018-06-27 09:41:22 +00:00
Luke Geeson 68cb233c0f [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Konstantin Zhuravlyov 30f03b3bc0 AMDGPU/NFC: Fix typo in comment
llvm-svn: 335707
2018-06-27 05:36:03 +00:00
Vedant Kumar f6c0b41fb7 [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:

  (fptrunc (floor (fpext X))) -> (floorf X)

We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).

This was diagnosed by the debugify check added in r335682.

llvm-svn: 335696
2018-06-27 00:47:53 +00:00
Craig Topper 33aba0eb4c [X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group.
Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.

The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.

llvm-svn: 335694
2018-06-27 00:42:24 +00:00
Evgeniy Stepanov 289a7d4c7d Revert "[asan] Instrument comdat globals on COFF targets"
Causes false positive ODR violation reports on __llvm_profile_raw_version.

llvm-svn: 335681
2018-06-26 22:43:48 +00:00
Lang Hames 2f17824463 [ORC] Don't call isa<> on a null value.
This should fix the recent builder failures in the test-global-ctors.ll testcase.

llvm-svn: 335680
2018-06-26 22:43:01 +00:00
Lang Hames 8f9dbb1d64 [ORC] Fix a missing return value.
llvm-svn: 335677
2018-06-26 22:30:42 +00:00
Michael Zolotukhin d3b8bdef01 [JumpThreading] Don't try to rewrite a use if it's already valid.
Summary:
When recording uses we need to rewrite after cloning a loop we need to
check if the use is not dominated by the original def. The initial
assumption was that the cloned basic block will introduce a new path and
thus the original def will only dominate the use if they are in the same
BB, but as the reproducer from PR37745 shows it's not always the case.

This fixes PR37745.

Reviewers: haicheng, Ka-Ka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48111

llvm-svn: 335675
2018-06-26 22:19:48 +00:00
Lang Hames bf7b532cbc [ORC] Add a dependence on MC to LLVMBuild.txt
llvm-svn: 335673
2018-06-26 22:12:02 +00:00
Lang Hames 6a94134b11 [ORC] Add LLJIT and LLLazyJIT, and replace OrcLazyJIT in LLI with LLLazyJIT.
LLJIT is a prefabricated ORC based JIT class that is meant to be the go-to
replacement for MCJIT. Unlike OrcMCJITReplacement (which will continue to be
supported) it is not API or bug-for-bug compatible, but targets the same
use cases: Simple, non-lazy compilation and execution of LLVM IR.

LLLazyJIT extends LLJIT with support for function-at-a-time lazy compilation,
similar to what was provided by LLVM's original (now long deprecated) JIT APIs.

This commit also contains some simple utility classes (CtorDtorRunner2,
LocalCXXRuntimeOverrides2, JITTargetMachineBuilder) to support LLJIT and
LLLazyJIT.

Both of these classes are works in progress. Feedback from JIT clients is very
welcome!

llvm-svn: 335670
2018-06-26 21:35:48 +00:00
Konstantin Zhuravlyov 777477705a AMDGPU: Silence unused warnings in waitcnt insertion pass in release build
Differential Revision: https://reviews.llvm.org/D48607

llvm-svn: 335669
2018-06-26 21:33:38 +00:00
Jessica Paquette 67599c2e1e [X86][AsmParser] Recommit r335658
Recommit of r335658 so that it does not change the behaviour of any
existing error output.

llvm-svn: 335668
2018-06-26 21:30:34 +00:00
Vedant Kumar 1cb63dc2d5 Rename skipDebugInfo -> skipDebugIntrinsics, NFC
This addresses post-commit feedback about the name 'skipDebugInfo' being
misleading. This name could be interpreted as meaning 'a function that
skips instructions with debug locations'.

The new name, 'skipDebugIntrinsics', makes it clear that this function
only skips debug info intrinsics.

Thanks to Adrian Prantl for pointing this out!

llvm-svn: 335667
2018-06-26 21:16:59 +00:00
Lang Hames 2795a0a06e [ORC] Reset AsynchronousSymbolQuery's NotifySymbolsResolved callback on error.
AsynchronousSymbolQuery::canStillFail checks the value of the callback to
prevent sending it redundant error notifications, so we need to reset it after
running it.

llvm-svn: 335664
2018-06-26 20:59:50 +00:00
Lang Hames 831c575829 [ORC] Move the VSOList typedef out of VSO.
llvm-svn: 335663
2018-06-26 20:59:49 +00:00
Lang Hames ec8f5c8e5a [ORC] Fix a FIXME by moving MangleAndInterner to Core.h.
llvm-svn: 335661
2018-06-26 20:59:46 +00:00
Jessica Paquette 0a80af0761 Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode"
This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b.

It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly.

llvm-svn: 335660
2018-06-26 20:57:19 +00:00
Jessica Paquette 0e40d4bfc3 [X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode
Right now, when we use RIP-relative instructions in 32-bit mode, we'll just
assert and crash.

This adds an error message which tells the user that they can't do that in
32-bit mode, so that we don't crash (and also can see the issue outside of
assert builds).

llvm-svn: 335658
2018-06-26 20:33:46 +00:00
Stanislav Mekhanoshin dacda79ee6 [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic
This intrinsic selects v_mad_f32 regardless of fp32 denorm support.

Differential Revision: https://reviews.llvm.org/D48573

llvm-svn: 335654
2018-06-26 20:04:19 +00:00
Sanjay Patel fb9c440ba5 [DAGCombiner] use isBitwiseNot to simplify code; NFC
llvm-svn: 335652
2018-06-26 19:46:56 +00:00
Matt Arsenault 8c4a35237a AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650
2018-06-26 19:10:00 +00:00
Matt Arsenault 7e991d30c0 ConstantFold: Don't fold global address vs. null for addrspace != 0
Not sure why this logic seems to be repeated in 2 different places,
one called by the other.

On AMDGPU addrspace(3) globals start allocating at 0, so these
checks will be incorrect (not that real code actually tries
to compare these addresses)

llvm-svn: 335649
2018-06-26 18:55:43 +00:00
Vedant Kumar 78ff0f1b83 Use a variable to appease a no-asserts bot, NFC
Failure URL:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/22836

llvm-svn: 335648
2018-06-26 18:55:26 +00:00
Tim Shen b32823cbe9 [ConstantRange] Add support of mul in makeGuaranteedNoWrapRegion.
Summary: This is trying to add support for r334428.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48399

llvm-svn: 335646
2018-06-26 18:54:10 +00:00
Matt Arsenault 2c1a570aab LoopUnroll: Allow analyzing intrinsic call costs
I'm not sure why the code here is skipping calls since
TTI does try to do something for general calls, but it
at least should allow intrinsics.

Skip intrinsics that should not be omitted as calls, which
is by far the most common case on AMDGPU.

llvm-svn: 335645
2018-06-26 18:51:17 +00:00
Vedant Kumar c85ca4cdab [Local] Add a convenient insertReplacementDbgValues overload, NFC
Add an overload for the common case where the replacement dbg.values
have the same DIExpressions as the originals.

llvm-svn: 335643
2018-06-26 18:44:53 +00:00
Vedant Kumar de46f65bbd [Local] Sink salvageDI's early exit into helper functions, NFC
salvageDebugInfo() performs a check that allows it to exit early without
doing a DenseMap lookup. It's a bit neater and marginally more useful to
sink this early exit into the findDbg{Addr,Users,Values} helpers.

llvm-svn: 335642
2018-06-26 18:44:52 +00:00
Brendon Cahoon b7169c435a [Hexagon] Add a "generic" cpu
Add the generic processor for Hexagon so that it can be used
with 3rd party programs that create a back-end with the
"generic" CPU. This patch also enables the JIT for Hexagon.

Differential Revision: https://reviews.llvm.org/D48571

llvm-svn: 335641
2018-06-26 18:44:05 +00:00
Simon Pilgrim 7f55af37f4 [DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119)
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.

llvm-svn: 335637
2018-06-26 17:46:51 +00:00
Sanjay Patel ad0bfb844d [InstSimplify] fold shifts by sext bool
https://rise4fun.com/Alive/c3Y

llvm-svn: 335633
2018-06-26 17:31:38 +00:00
Sanjay Patel 9adea01c9f [InstCombine] simplify code for urem fold; NFCI
llvm-svn: 335623
2018-06-26 16:39:29 +00:00
Sanjay Patel 3575f0c0b3 [InstCombine] fold urem with sext bool divisor
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616

...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.

Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y

llvm-svn: 335622
2018-06-26 16:30:00 +00:00
Simon Pilgrim bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Simon Pilgrim 133b1cdf08 [DAGCombiner] Pull out VT bitwidth in visitSDIV. NFCI.
llvm-svn: 335617
2018-06-26 15:39:16 +00:00
Sanjay Patel 2b7e31095d [InstSimplify] fold srem with sext bool divisor
llvm-svn: 335616
2018-06-26 15:32:54 +00:00
Krzysztof Parzyszek 9f199ebec0 Silence "unused variable" warning in LiveIntervals.cpp after r335607
llvm-svn: 335610
2018-06-26 14:55:04 +00:00
Krzysztof Parzyszek 70f027022c Account for undef values from predecessors in extendSegmentsToUses
It is legal for a PHI node not to have a live value in a predecessor
as long as the end of the predecessor is jointly dominated by an undef
value.

llvm-svn: 335607
2018-06-26 14:37:16 +00:00
Simon Pilgrim aa2bf2be31 [TargetLowering] isVectorClearMaskLegal - use ArrayRef<int> instead of const SmallVectorImpl<int>&
This is more generic and matches isShuffleMaskLegal.

Differential Revision: https://reviews.llvm.org/D48591

llvm-svn: 335605
2018-06-26 14:15:31 +00:00
Than McIntosh 3190993a02 [X86,ARM] Retain split-stack prolog check for sibling calls
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).

This fixes PR37807.

Reviewers: cherry, javed.absar

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48444

llvm-svn: 335604
2018-06-26 14:11:30 +00:00
Simon Pilgrim cfe2f9d4d2 Fix spelling mistakes in comments. NFCI.
llvm-svn: 335603
2018-06-26 14:06:23 +00:00
Teresa Johnson 63ee0e73e4 [ThinLTO] Parse module summary index from assembly
Summary:
Adds assembly parsing support for the module summary index (follow on
to r333335 which added the assembly writing support).

I added support to llvm-as to invoke the index parsing, so that it can
create either a bitcode file with a Module and a per-module index, or
a combined index without a Module.

I will send follow on patches soon to do the following:
- add support to tools such as llvm-lto2 to parse the per-module indexes
from assembly instead of bitcode when testing the thin link.
- verification support.

Depends on D47844 and D47842.

Reviewers: pcc, dexonsmith, mehdi_amini

Subscribers: inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D47905

llvm-svn: 335602
2018-06-26 13:56:49 +00:00
Sanjay Patel 7c45debaea [InstCombine] fold udiv with sext bool divisor
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.

llvm-svn: 335597
2018-06-26 12:41:15 +00:00
Tim Northover b73efb85ba ARM: correctly decode VFP instructions following unpredictable t2IT
When the condition code for an IT instruction is "AL" we get strange "15"
predicates on subsequent instructions. These are dealt with for most
instructions by treating them as "ARMCC::AL", but VFP takes a different path
which didn't have this code.

llvm-svn: 335594
2018-06-26 11:39:20 +00:00
Tim Northover bf54858115 ARM: diagnose unpredictable IT instructions
IT instructions are allowed to have the 'AL' predicate, but it must never
result in an 'NV' predicated instruction. Essentially this means that all
branches must be 't' rather than 'e' if the predicate is 'AL'.

This patch adds a diagnostic for this during assembly (error because parsing
hits an assertion if allowed to continue) and an annotation during disassembly.

llvm-svn: 335593
2018-06-26 11:38:41 +00:00
Simon Pilgrim bfaa09220b [X86] Just use ArrayRef instead of SmallVectorImpl in a few static method arguments. NFCI.
llvm-svn: 335590
2018-06-26 10:45:41 +00:00
Florian Hahn 4a69b0bb36 [IPSCCP] Change dead blocks to unreachable after visiting all executable blocks.
changeToUnreachable may remove PHI nodes from executable blocks we found values
for and we would fail to replace them. By changing dead blocks to unreachable after
we replaced constants in all executable blocks, we ensure such PHI nodes are replaced
by their known value before.

Fixes PR37780.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48421

llvm-svn: 335588
2018-06-26 10:15:02 +00:00
Simon Pilgrim dcf5bd271f Fix MSVC "signed/unsigned mismatch" warning. NFCI.
llvm-svn: 335587
2018-06-26 10:02:12 +00:00
Simon Pilgrim 9b3b0fe763 Fix MSVC "not all control paths return a value" warnings. NFCI.
llvm-svn: 335584
2018-06-26 09:31:18 +00:00
Bjorn Pettersson 550517bcab Improve ConvertDebugDeclareToDebugValue
Summary:
This is a follow-up to r334830 and r335031.

In the valueCoversEntireFragment check we now also handle
the situation when there is a variable length array (VLA)
involved, and the length of the array has been reduced to
a constant.

The ConvertDebugDeclareToDebugValue functions that are related
to PHI nodes and load instructions now avoid inserting dbg.value
intrinsics when the value does not, for certain, cover the
variable/fragment that should be described.
In r334830 we assumed that the value always covered the entire
var/fragment and we had assertions in the code to show that
assumption. However, those asserts failed when compiling code
with VLAs, so we removed the asserts in r335031. Now when we
know that the valueCoversEntireFragment check can fail also for
PHI/Load instructions we avoid to insert the faulty dbg.value
intrinsic in such situations. Compared to the Store instruction
scenario we simply drop the dbg.value here (as the variable does
not change its value due to PHI/Load, so an earlier dbg.value
describing the variable should still be valid).

Reviewers: aprantl, vsk, efriedma

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48547

llvm-svn: 335580
2018-06-26 06:17:00 +00:00
Gil Rapaport da2e2caa6c [InstCombine] (A + 1) + (B ^ -1) --> A - B
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.

Differential Revision: https://reviews.llvm.org/D48535

llvm-svn: 335579
2018-06-26 05:31:18 +00:00
Craig Topper 08dae1682d [X86] Don't use getScalarShiftAmountTy to get the immediate type for target specific VSHLDQ/VSRLDQ nodes.
These opcodes have a fixed type of i8 for their immediate and shouldn't have anything to do with the scalar shift amount used by target independent shift nodes.

llvm-svn: 335578
2018-06-26 04:53:42 +00:00
Dan Gohman 910ba33d0c [WebAssembly] Fix lowering of varargs functions with non-legal fixed arguments.
CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments
before legalization. The ISD::OutputArg "Outs" array holds legalized
arguments, so when indexing into it to find the non-fixed arguemn, we need
to use the number of arguments after legalization.

Fixes PR37934.

llvm-svn: 335576
2018-06-26 03:18:38 +00:00
Craig Topper c42ed4e3c4 [X86] Use XOR for SUB (C, X) during isel if will help fold an immediate
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.

This seems like the safest way to do this trick. And we consider doing it as a DAG combine later.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48557

llvm-svn: 335575
2018-06-26 03:11:15 +00:00
Dan Gohman fd2f7aeb12 [WebAssembly] Fix a typo in a comment.
llvm-svn: 335574
2018-06-26 03:03:41 +00:00
Teresa Johnson 519055336d [ThinLTO] Add string saver onto index for value names
Summary:
Adds a string saver to the ModuleSummaryIndex so it can store value
names in the case of adding a ValueInfo for a GUID when we don't
have the name stored in a Module string table. This is motivated
by the upcoming summary parser patch, where we will read value names
from the summary entry and want to store them, even when a Module
is not available.

Currently this allows us to store the name in the legacy bitcode case,
and I have added a test to show that.

Reviewers: pcc, dexonsmith

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D47842

llvm-svn: 335570
2018-06-26 02:29:08 +00:00
Craig Topper 689e363ff2 [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction.
This recommits r335562 and 335563 as a single commit.

The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects.

By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away.

llvm-svn: 335568
2018-06-26 01:37:02 +00:00
Teresa Johnson 9766fd64fb [ThinLTO] Add per-module indexes to combined index consistently
Summary:
Without this change we only add module paths to the combined index when
there is a module hash or at least one global value. Make this more
consistent by adding the module to the index whenever there is a summary
section, and it is a per-module summary (had a MODULE_CODE_SOURCE_FILENAME
record).

Since we will no longer add module paths lazily, add a new interface to get
the module info from the index that asserts it is already added.

Fixes PR37899.

Reviewers: Vlad, pcc

Subscribers: mehdi_amini, inglorion, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48511

llvm-svn: 335567
2018-06-26 01:32:58 +00:00
Craig Topper 6f4fdfa9af Revert r335562 and 335563 "[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction."
These were supposed to have been squashed to a single commit.

llvm-svn: 335566
2018-06-26 01:31:53 +00:00
Lang Hames ce72161ddf [ORC] Add a symbolAliases function to the Core APIs.
symbolAliases can be used to define symbol aliases within a VSO.

llvm-svn: 335565
2018-06-26 01:22:29 +00:00
Craig Topper 9b4322ce31 foo
llvm-svn: 335562
2018-06-26 00:43:34 +00:00
Teresa Johnson 7bea1aad6a [ThinLTO] Compute GUID directly from GV when building per-module index
Summary:
I discovered when writing the summary parsing support that the
per-module index builder and writer are computing the GUID from the
value name alone (ignoring the linkage type). This was ok since those
GUID were not emitted in the bitcode, and there are never multiple
conflicting names in a single module.

However, I don't see a reason for making the GUID computation different
for the per-module case. It also makes things simpler on the parsing
side to have the GUID computation consistent. So this patch changes the
summary analysis phase and the per-module summary writer to compute the
GUID using the facility on the GlobalValue.

Reviewers: pcc, dexonsmith

Subscribers: llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D47844

llvm-svn: 335560
2018-06-26 00:20:49 +00:00
Eric Christopher b7a52bb28a Add a warning if someone attempts to add extra section flags to sections
with well defined semantics like .rodata.

llvm-svn: 335558
2018-06-25 23:53:54 +00:00
Tim Shen 802c31cc28 [APInt] Add helpers for rounding u/sdivs.
Reviewers: sanjoy, craig.topper

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48498

llvm-svn: 335557
2018-06-25 23:49:20 +00:00
Chandler Carruth 1652996fd6 [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
unswitching of switches.

This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.

Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.

This also surfaced another problem with nontrivial unswitching before
which is that we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.

While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.

The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.

Differential Revision: https://reviews.llvm.org/D47683

llvm-svn: 335553
2018-06-25 23:32:54 +00:00
Sanjay Patel 38a86d3136 [InstCombine] cleanup udiv folds; NFCI
This removes a "UDivFoldAction" in favor of a simple constant
matcher. In theory, the existing code could do more matching,
but I don't see any evidence or need for it. I've left a TODO
about using ValueTracking in case we see any regressions.

llvm-svn: 335545
2018-06-25 22:50:26 +00:00
Benjamin Kramer 1649774816 [Instrumentation] Remove unused include
It's also a layering violation.

llvm-svn: 335528
2018-06-25 21:43:09 +00:00
Sanjay Patel 6a96d90acd [InstCombine] fold sdiv with sext bool divisor
llvm-svn: 335527
2018-06-25 21:39:41 +00:00
Florian Hahn b10b141a79 Revert r335513: [SCEVExp] Advance found insertion point
llvm-svn: 335522
2018-06-25 20:55:26 +00:00
Craig Topper 27847868b7 [LoopIdiomRecognize] Fix a couple places where it appears we were unintenionally making copies of DebugLoc.
llvm-svn: 335521
2018-06-25 20:45:45 +00:00
Craig Topper 913abc8b58 [X86] Simplify intrinsic table binary search to not require a temporary struct.
std::lower_bound doesn't require the thing to search for to be the same type as the table entries. We just need to define an appropriate comparison function that can take an table entry and an intrinsic number.

llvm-svn: 335518
2018-06-25 20:27:46 +00:00
Craig Topper 614f192471 [X86] Add comment about the sorting of the memory folding tables added in r335501.
llvm-svn: 335517
2018-06-25 20:11:16 +00:00
Lei Huang 5d109ee3d4 [PowerPC] Fix incorrectly encoded wait instruction
Encoding for the wait instruction was wrong. Fix according to ISA 3.0.

Differential Revision: https://reviews.llvm.org/D48550

llvm-svn: 335514
2018-06-25 19:28:27 +00:00
Florian Hahn 5947c17fd4 [SCEVExp] Advance found insertion point until we find a non-dbg instruction.
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.

Reviewers: vsk, aprantl, sanjoy, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D47874

llvm-svn: 335513
2018-06-25 19:17:29 +00:00
Sanjay Patel 1e911fa746 [InstSimplify] fold div/rem of zexted bool
I was looking at an unrelated fold and noticed that
we don't have this simplification (because the other
fold would break existing tests).

Name: zext udiv
  %z = zext i1 %x to i32
  %r = udiv i32 %y, %z
=>
  %r = %y

Name: zext urem
  %z = zext i1 %x to i32
  %r = urem i32 %y, %z
=>
  %r = 0

Name: zext sdiv
  %z = zext i1 %x to i32
  %r = sdiv i32 %y, %z
=>
  %r = %y

Name: zext srem
  %z = zext i1 %x to i32
  %r = srem i32 %y, %z
=>
  %r = 0

https://rise4fun.com/Alive/LZ9

llvm-svn: 335512
2018-06-25 18:51:21 +00:00
Kamil Rytarowski a8448ad098 Handle NetBSD specific path in findDebugBinary()
Summary:
The NetBSD Operating System installs debuginfo
files into /usr/libdata/debug, rather than other path
like in some other popular distribution.

This change makes llvm-symbolizer functional with
the basesystem executables.

Reviewers: joerg, vitalybuka

Reviewed By: vitalybuka

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48525

llvm-svn: 335511
2018-06-25 18:49:13 +00:00
Reid Kleckner 88fee5fdbc Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335508
2018-06-25 18:16:27 +00:00
Craig Topper 3cc6cb1d35 [X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search.
With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed.

We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it.

Differential Revision: https://reviews.llvm.org/D48527

llvm-svn: 335501
2018-06-25 17:26:56 +00:00
Craig Topper b9cb88a4b0 [X86] Allow base and index for gather instructions to appear in other order for Intel syntax.
llvm-svn: 335500
2018-06-25 17:26:51 +00:00
Vedant Kumar b725c69f12 [SelectionDAG] Remove debug locations from ConstantSD(FP)Nodes
This removes debug locations from ConstantSDNode and ConstantSDFPNode.

When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:

  http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html

I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.

Differential Revision: https://reviews.llvm.org/D48468

llvm-svn: 335497
2018-06-25 17:06:18 +00:00
Alexander Richardson 85e200e934 Add Triple::isMIPS()/isMIPS32()/isMIPS64(). NFC
There are quite a few if statements that enumerate all these cases. It gets
even worse in our fork of LLVM where we also have a Triple::cheri (which
is mips64 + CHERI instructions) and we had to update all if statements that
check for Triple::mips64 to also handle Triple::cheri. This patch helps to
reduce our diff to upstream and should also make some checks more readable.

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D48548

llvm-svn: 335493
2018-06-25 16:49:20 +00:00
Matt Arsenault b1cc4f52ff AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr
Note a normal select test is not currently possible because this
relies on input registers tracked in SIMachineFunctionInfo which
are not currently serializable in MIR, but this does work end-to-end
from the IR.

llvm-svn: 335490
2018-06-25 16:17:48 +00:00
Matt Arsenault 921f7a27cc StackSlotColoring: Decide colors per stack ID
I thought I fixed this in r308673, but that fix was
very broken. The assumption that any frame index can be used
in place of another was more widespread than I realized.
Even when stack slot sharing was disabled, this was still
replacing frame index uses with a different ID with a different
stack slot.

Really fix this by doing the coloring per-stack ID, so all of
the coloring logically done in a separate namespace. This is a lot
simpler than trying to figure out how to change the color if
the stack ID is different.

llvm-svn: 335488
2018-06-25 16:05:55 +00:00
Matt Arsenault 2811a20f77 AMDGPU: Remove commented out code
llvm-svn: 335486
2018-06-25 15:42:20 +00:00
Matt Arsenault b3feccd7fa AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers
llvm-svn: 335485
2018-06-25 15:42:12 +00:00
Wei Mi e555127435 [SampleFDO] Add an option to turn on/off warning about samples unused.
If a function has sample to use, but cannot use them because of no debug
information, currently a warning will be issued to inform the missing
opportunity.

This warning assumes the binary generating the profile and the binary using
the profile are similar enough. It is not always the case. Sometimes even
if the binaries are not quite similar, we may still get some benefit by
using sampleFDO. In those cases, we may still want to apply sampleFDO but
not want to see a lot of such warnings pop up.

The patch adds an option for the warning.

Differential Revision: https://reviews.llvm.org/D48510

llvm-svn: 335484
2018-06-25 15:40:31 +00:00
David Green 8699492304 [DA] Delinearise AddRecs if we can prove they don't wrap
We can prove that some delinearized subscripts do not wrap around to become
negative by the fact that they are from inbound geps of load/store locations.
This helps improve the delinearisation in cases where we can't prove that they
are non-negative from SCEV alone.

Differential Revision: https://reviews.llvm.org/D48481

llvm-svn: 335481
2018-06-25 15:13:26 +00:00
Matt Arsenault 73eeb42e50 AMDGPU: Respect align argument parameter
This should avoid relying on the pointee type
to get the alignment, particularly since pointee
types are supposed to be removed at some point.

Also fixes not getting the alignment for unsized types.

llvm-svn: 335478
2018-06-25 14:29:04 +00:00
Artur Pilipenko ac6e6864e8 SafepointIRVerifier should ignore dead blocks and dead edges
Not only should SafepointIRVerifier ignore unreachable blocks (as suggested in https://reviews.llvm.org/D47011) but it also has to ignore dead blocks.

In @test2 (see the new tests):

  br i1 true, label %right, label %left
left:
  ...
right:
  ...
merge:
  %val = phi i8 addrspace(1)* [ ..., %left ], [ ..., %right ]
  use %val
both left and right branches are reachable.
If they collide then SafepointIRVerifier reports an error.

Because of the foldable branch condition GVN finds the left branch dead and removes the phi node entry that merges values from right and left. Then the use comes from the right branch. This results in no collision.

So, SafepointIRVerifier ends up in different results depending on either GVN is run or not.

To solve this issue this patch adds Dead Block detection to SafepointIRVerifier which can ignore dead blocks while validating IR. The Dead Block detection algorithm is taken from GVN but modified to not split critical edges. That is needed to keep CFG unchanged by SafepointIRVerifier.

Patch by Yevgeny Rouban.

Reviewed By: anna, apilipenko, DaniilSuchkov

Differential Revision: https://reviews.llvm.org/D47441

llvm-svn: 335473
2018-06-25 13:51:11 +00:00
Krzysztof Parzyszek 4581f37e7c Improve handling of COPY instructions with identical value numbers
Testcases provided by Tim Renouf.

Differential Revision: https://reviews.llvm.org/D48102

llvm-svn: 335472
2018-06-25 13:46:41 +00:00
Artur Pilipenko ddc7f391d2 Revert change 335077 "[InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT"
This change caused widespread assertion failures in our downstream testing:
lib/CodeGen/LiveInterval.cpp:409: bool llvm::LiveRange::overlapsFrom(const llvm::LiveRange&, llvm::LiveRange::const_iterator) const: Assertion `!empty() && "empty range"' failed.

llvm-svn: 335462
2018-06-25 12:58:13 +00:00
Simon Pilgrim 79e474bf46 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning (again). NFCI.
llvm-svn: 335457
2018-06-25 11:46:24 +00:00
Simon Pilgrim 3a0e13f347 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI.
llvm-svn: 335454
2018-06-25 11:38:27 +00:00
Simon Pilgrim 5b6b500687 Fix -Wparentheses gcc warning. NFCI.
llvm-svn: 335451
2018-06-25 11:19:05 +00:00
Craig Topper facea6b4a6 [X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands.
We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted.

The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices.

With that abort case pushed earlier, we can remove all the abort checks and replace with asserts.

llvm-svn: 335446
2018-06-25 06:05:37 +00:00
George Burgess IV 97ec62455d [MSSA] Add domination number verifier; NFC
It's easy for domination numbers to get out-of-date, and this is no more
costly than any of the other verifiers we already have, so it seems nice
to have.

A stage3 build with this Works On My Machine, so this hasn't caught any
bugs... yet. :)

llvm-svn: 335444
2018-06-25 05:30:36 +00:00
Heejin Ahn 04c4894911 [WebAssembly] Add WebAssemblyException information analysis
Summary:
A WebAssemblyException object contains BBs that belong to a 'catch' part
of the try-catch-end structure. Because CFGSort requires all the BBs
within a catch part to be sorted together as it does for loops, this
pass calculates the nesting structure of catch part of exceptions in a
function. Now this assumes the use of Windows EH instructions.

Reviewers: dschuff, majnemer

Subscribers: jfb, mgorny, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D44134

llvm-svn: 335439
2018-06-25 01:20:21 +00:00
Heejin Ahn 4934f76b58 [WebAssembly] Add WebAssemblyLateEHPrepare pass
Summary:
Add WebAssemblyLateEHPrepare pass that does several small jobs for
exception handling. This runs before CFGSort, and is different from
WasmEHPrepare pass that runs before ISel, even though the names are
similar.

Reviewers: dschuff, majnemer

Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D46803

llvm-svn: 335438
2018-06-25 01:07:11 +00:00
Craig Topper 3b18bdc46d [X86] Simplify some code by using isOneConstant. NFC
llvm-svn: 335437
2018-06-25 01:01:47 +00:00
Craig Topper 4331d6218d [X86] Remove the changes to combineScalarToVector made in r335037.
They appear to be untested other than the test case for p37879.ll and I believe we should be using SimplifyDemandedElts here to handle these cases.

llvm-svn: 335436
2018-06-25 00:21:53 +00:00
Craig Topper ecf7c5b75f [X86] Reduce the number of patterns needed for masked scalar ceil/floor isel.
The scalar to vector on the mask register should not be part of the patterns.

llvm-svn: 335435
2018-06-25 00:05:09 +00:00
Brad Smith df1f50579f [mips][ias] Enable IAS by default for OpenBSD / FreeBSD mips64/mips64el.
Reviewers: atanasyan

Differential Review: https://reviews.llvm.org/D31557

llvm-svn: 335434
2018-06-24 15:44:47 +00:00
Sanjay Patel 962ee178fa [DAGCombiner] eliminate setcc bool math when input is low-bit of some value
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
    %c.0282 = and i32 %c.0282.in, 268435455
    %a16 = lshr i64 32508, %x
    %a17 = and i64 %a16, 1
    %tobool = icmp eq i64 %a17, 0
    %. = select i1 %tobool, i32 1, i32 2
    %.286 = select i1 %tobool, i32 27, i32 26
    %shr97 = lshr i32 %c.0282, %.
    %shl98 = shl i32 %c.0282.in, %.286
    %or99 = or i32 %shr97, %shl98
    %shr100 = lshr i32 %d.0280, %.
    %shl101 = shl i32 %d.0280, %.286
    %or102 = or i32 %shr100, %shl101
    store i32 %or99, i32* %ptr0
    store i32 %or102, i32* %ptr1
    ret void
}

...but I'm trying to kill the setcc bool math sooner rather than later.

By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, 
we can create a universally good fold because we always eliminate the condition code 
intermediate value.

Here are Alive proofs for these (currently instcombine folds the 'add' variants, but 
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp

Name: sub of zext cmp mask
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %z = zext i1 %c to i32
  %r = sub i32 C1, %z
  =>
  %optional_cast = zext i8 %a to i32
  %r = add i32 %optional_cast, C1-1

Name: add of zext cmp mask
  %a = and i32 %x, 1
  %c = icmp eq i32 %a, 0
  %z = zext i1 %c to i8
  %r = add i8 %z, C1
  =>
  %optional_cast = trunc i32 %a to i8
  %r = sub i8 C1+1, %optional_cast

All of the tests look like improvements or neutral to me. But it is possible that x86 
test+set+bitop is better than what we now show here. I suspect we could do better by 
adding another fold for the 'sub' variants.

We start with select-of-constant in IR in the larger motivating test, so that's why I 
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1

Name: true const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C2, i64 C1
  =>
  %z = zext i8 %a to i64
  %r = sub i64 C2, %z

Name: false const is bigger
Pre: C2 == (C1 + 1)
  %a = and i8 %x, 1
  %c = icmp eq i8 %a, 0
  %r = select i1 %c, i64 C1, i64 C2
  =>
  %z = zext i8 %a to i64
  %r = add i64 C1, %z

Differential Revision: https://reviews.llvm.org/D48466

llvm-svn: 335433
2018-06-24 14:37:30 +00:00
Craig Topper 03523f6741 [X86] Regroup some isel patterns. NFC
For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together.

llvm-svn: 335429
2018-06-24 06:56:49 +00:00
Craig Topper 19772c89c7 [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions.
llvm-svn: 335428
2018-06-24 06:29:50 +00:00
Brad Smith 8c17d5921d Add OpenBSD support to the Threading code
llvm-svn: 335426
2018-06-23 22:02:59 +00:00
Duncan P. N. Exon Smith f4c82cffc6 ADT: Use EBO to shrink SmallVector size 1
SmallVectorStorage is empty when its size is 1; use inheritance so that
the empty base class optimization kicks in.

llvm-svn: 335421
2018-06-23 18:39:44 +00:00
Jonas Devlieghere c02bf01f64 [TableGen] Use WithColor for printing errors/warnings
Use the WithColor helper from support to print errors and warnings.

llvm-svn: 335415
2018-06-23 16:48:03 +00:00
Craig Topper d8d64a56b5 [X86] Make %eiz usage in 64-bit mode, force a 0x67 address size prefix. Fix some test CHECK lines.
llvm-svn: 335414
2018-06-23 06:15:04 +00:00
Craig Topper 2545529034 [X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address.
llvm-svn: 335413
2018-06-23 06:03:48 +00:00
Craig Topper 68d64e3859 [X86][AsmParser] Improve base/index register checks.
-Ensure EIP isn't used with an index reigster.
-Ensure EIP isn't used as index register.
-Ensure base register isn't a vector register.
-Ensure eiz/riz usage matches the size of their base register.

llvm-svn: 335412
2018-06-23 05:53:00 +00:00
Stanislav Mekhanoshin d8c9374797 Fix invariant fdiv hoisting in LICM
FDiv is replaced with multiplication by reciprocal and invariant
reciprocal is hoisted out of the loop, while multiplication remains
even if invariant.

Switch checks for all invariant operands and only invariant
denominator to fix the issue.

Differential Revision: https://reviews.llvm.org/D48447

llvm-svn: 335411
2018-06-23 04:01:28 +00:00
Reid Kleckner fd7c9ab971 [AMDGPU] Update includes for intrinsic changes :(
llvm-svn: 335409
2018-06-23 03:05:39 +00:00
Lang Hames d716a26e8b [ORC] Fix formatting and list pending queries in VSO::dump.
llvm-svn: 335408
2018-06-23 02:22:10 +00:00
Reid Kleckner f5890e4e43 [IR] Split Intrinsics.inc into enums and implementations
Implements PR34259

Intrinsics.h is a very popular header. Most LLVM TUs care about things
like dbg_value, but they don't care how they are implemented. After I
split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU
from scanning 1.7 MB of source that gets pre-processed away.

It also means we can modify intrinsic properties without triggering a
full rebuild, but that's probably less of a win.

I think the next best thing to do would be to split out the target
intrinsics into their own header. Very, very few TUs care about
target-specific intrinsics. It's very hard to split up the target
independent intrinsics like llvm.expect, assume, and dbg.value, though.

llvm-svn: 335407
2018-06-23 02:02:38 +00:00
Craig Topper abdbb2c67a [X86][AsmParser] Rework that allows (%dx) to be used in place of %dx with in/out instructions.
Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds.

This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions.

llvm-svn: 335403
2018-06-23 00:03:20 +00:00
Reid Kleckner 330f65b3e8 [RuntimeDyld] Implement the ELF PIC large code model relocations
Prerequisite for https://reviews.llvm.org/D47211 which improves our ELF
large PIC codegen.

llvm-svn: 335402
2018-06-22 23:53:22 +00:00
Eli Friedman 203eaaf5ba [LoopReroll] Rewrite induction variable rewriting.
This gets rid of a bunch of weird special cases; instead, just use SCEV
rewriting for everything.  In addition to being simpler, this fixes a
bug where we would use the wrong stride in certain edge cases.

The one bit I'm not quite sure about is the trip count handling,
specifically the FIXME about overflow.  In general, I think we need to
widen the exit condition, but that's probably not profitable if the new
type isn't legal, so we probably need a check somewhere.  That said, I
don't think I'm making the existing problem any worse.

As a followup to this, a bunch of IV-related code in root-finding could
be cleaned up; with SCEV-based rewriting, there isn't any reason to
assume a loop will have exactly one or two PHI nodes.

Differential Revision: https://reviews.llvm.org/D45191

llvm-svn: 335400
2018-06-22 22:58:55 +00:00
George Burgess IV 2cbf9730b0 [MSSA] Remove incorrect comment + `auto`ify dyn_cast results; NFC
llvm-svn: 335399
2018-06-22 22:34:07 +00:00
Craig Topper 10e2f73793 [X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking.
This allows us to check these:
-16-bit addressing doesn't support scale so we should error if we find one there.
-Multiplying ESP/RSP by a scale even if the scale is 1 should be an error because ESP/RSP can't be an index.

llvm-svn: 335398
2018-06-22 22:28:39 +00:00
Craig Topper 1d707539e4 [X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP].
By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register.

There's still a slight bug that we allow [EAX+ESP*1]. The existence of the multiply even though its with 1 should force ESP to the index register and trigger an error, but it doesn't currently.

llvm-svn: 335394
2018-06-22 21:57:24 +00:00
Tobias Edler von Koch 7609cb83e6 Re-land "[LTO] Enable module summary emission by default for regular LTO"
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).

See https://reviews.llvm.org/D34156 for details on the original change.

This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.

llvm-svn: 335385
2018-06-22 20:23:21 +00:00
Craig Topper 9bc2c059c3 [X86] Don't accept (%si,%bp) 16-bit address expressions.
The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx.

This makes us compatible with gas.

We do still need to support both orders with Intel syntax which uses [bp+si] and [si+bp]

llvm-svn: 335384
2018-06-22 20:20:38 +00:00
Craig Topper c26c62e0e5 [X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement.
(%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement.

llvm-svn: 335379
2018-06-22 19:42:21 +00:00
Craig Topper cd18bb523c [X86][AsmParser] Check for invalid 16-bit base register in Intel syntax.
llvm-svn: 335373
2018-06-22 17:50:40 +00:00
Craig Topper 22d1db122a [X86] Don't allow ESP/RSP to be used as an index register in assembly.
Fixes PR37892

llvm-svn: 335370
2018-06-22 17:15:58 +00:00
Alina Sbirlea bee50036d3 [LoopUnswitch]Fix comparison for DomTree updates.
Summary:
In LoopUnswitch when replacing a branch Parent -> Succ with a conditional
branch Parent -> True & Parent->False, the DomTree updates should insert an edge for
each of True/False if True/False are different than Succ, and delete Parent->Succ edge
if both are different. The comparison with Succ appears to be incorect,
it's comparing with Parent instead.
There is no test failing either before or after this change, but it seems to me this is
the right way to do the update.

Reviewers: chandlerc, kuhar

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D48457

llvm-svn: 335369
2018-06-22 17:14:35 +00:00
Krzysztof Parzyszek 358a916aa8 Initialize LiveRegs once in BranchFolder::mergeCommonTails
llvm-svn: 335365
2018-06-22 16:38:38 +00:00
Simon Pilgrim 9d3ef8ee2b [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

llvm-svn: 335364
2018-06-22 16:37:34 +00:00
Simon Pilgrim 213cb1b82d [SLPVectorizer] reorderAltShuffleOperands should just take InstructionsState. NFCI.
All calls were extracting the InstructionsState Opcode/AltOpcode values so we might as well pass it directly

llvm-svn: 335359
2018-06-22 16:10:26 +00:00
Paul Robinson 11539b0969 [DWARFv5] Allow ".loc 0" to refer to the root file.
DWARF v5 explicitly represents file #0 in the line table.  Prior
versions did not, so ".loc 0" is still an error in those cases.

Differential Revision: https://reviews.llvm.org/D48452

llvm-svn: 335350
2018-06-22 14:16:11 +00:00
Simon Pilgrim 1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Sanjay Patel a52963b404 [InstCombine] rearrange shuffle-of-binops logic; NFC
The commutative matcher makes things more complicated
here, and I'm planning an enhancement where this 
form is more readable.

llvm-svn: 335343
2018-06-22 12:46:16 +00:00
George Rimar dcf59c5480 Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text"
With compilation fix.

Original commit message:

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335336
2018-06-22 10:53:47 +00:00
Simon Pilgrim e0a6eb1f4f [IR] Use Instruction::isBinaryOp helper instead of raw enum range tests. NFCI.
llvm-svn: 335335
2018-06-22 10:48:02 +00:00
George Rimar 6d448da1be Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text"
It broke bots.

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551

llvm-svn: 335333
2018-06-22 10:27:33 +00:00
George Rimar e14485a0c6 [MC] - Add .stack_size sections into groups and link them with .text
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335332
2018-06-22 10:10:53 +00:00
Sjoerd Meijer 1043dffbd3 Recommit of r335326, with the test fixed that I missed.
llvm-svn: 335331
2018-06-22 10:03:03 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Sjoerd Meijer 7ee5b090de Reverting r335326 while I look at the test failure
llvm-svn: 335328
2018-06-22 09:17:08 +00:00
Eugene Leviant 6d711ca168 Revert r335324 due to a builtbot failure
llvm-svn: 335327
2018-06-22 08:57:01 +00:00
Sjoerd Meijer 8d2f1565b7 [ARM] ARMv6m and v8m.baseline strict align
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because it has no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.

Differential Revision: https://reviews.llvm.org/D48437

llvm-svn: 335326
2018-06-22 08:48:13 +00:00
Matt Arsenault 3f8e7a3dbc AMDGPU: Add patterns for i32/i64 local atomic load/store
Not sure why the 32/64 split is needed in the atomic_load
store hierarchies. The regular PatFrags do this, but we don't
do it for the existing handling for global.

llvm-svn: 335325
2018-06-22 08:39:52 +00:00
Eugene Leviant ea19c9473c [Evaluator] Improve evaluation of call instruction
Differential revision: https://reviews.llvm.org/D46584

llvm-svn: 335324
2018-06-22 08:29:36 +00:00
Mikhail Dvoretckii 0963562083 [X86] Changing the check for valid inputs in combineScalarToVector
Changing the logic of scalar mask folding to check for valid input types rather
than against invalid ones, making it more robust and fixing PR37879.

Differential Revision: https://reviews.llvm.org/D48366

llvm-svn: 335323
2018-06-22 08:28:05 +00:00
Chandler Carruth aa5f4d2e23 Revert r335306 (and r335314) - the Call Graph Profile pass.
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.

llvm-svn: 335320
2018-06-22 05:33:57 +00:00
Tom Stellard 6af7307650 AMDGPU/GlobalISel: Default to using TableGen'd instruction selector
Summary:
We can select all instructions that are marked as legal in a full piglit run,
so now is a good time to make the TableGen'd instruction selector default
for all opcodes.  This is NFC for a full piglit run, which is why there are
no tests.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48198

llvm-svn: 335319
2018-06-22 03:04:35 +00:00
Tom Stellard 26fac0f8e1 AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48196

llvm-svn: 335318
2018-06-22 02:54:57 +00:00
Chandler Carruth fe70b29cf7 [LegacyPM] Fix PR37888 by teaching the legacy loop pass manager how to
clear out deleted loops from the current queue beyond just the current
loop.

This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/

This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.

Differential Revision: https://reviews.llvm.org/D48470

llvm-svn: 335317
2018-06-22 02:43:41 +00:00
Tom Stellard 9a6535718e AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48195

llvm-svn: 335316
2018-06-22 02:34:29 +00:00
Tom Stellard 7712ee8891 AMDGPU/GlobalISel: Implement select() for COPY
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46151

llvm-svn: 335315
2018-06-22 00:44:29 +00:00
Sanjay Patel 4784e1506e [InstCombine] fix shuffle-of-binops bug
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in 
the other, so we have to check for that possibility and
bail out.

llvm-svn: 335312
2018-06-21 23:56:59 +00:00
Tom Stellard 3f1c6fe156 AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46150

llvm-svn: 335307
2018-06-21 23:38:20 +00:00
Michael J. Spencer fc93dd8e18 [Instrumentation] Add Call Graph Profile pass
This patch adds support for generating a call graph profile from Branch Frequency Info.

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:

!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335306
2018-06-21 23:31:10 +00:00
Reid Kleckner 2ef486690c [X86] Fix 32-bit mingw comdat names, only add one underscore
llvm-svn: 335304
2018-06-21 23:06:33 +00:00
Reid Kleckner 3a2fd1c2f3 Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
MCJIT can't handle R_X86_64_GOT64 yet.

llvm-svn: 335300
2018-06-21 22:19:05 +00:00
Reid Kleckner 3286a6c896 [X86] Commit some comments that weren't in the medium code model patch
llvm-svn: 335298
2018-06-21 21:57:44 +00:00
Reid Kleckner 247fe6aeab [X86] Implement more of x86-64 large and medium PIC code models
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

Reviewers: chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47211

llvm-svn: 335297
2018-06-21 21:55:08 +00:00
Matthew Voss 30648ab233 [GVN] Avoid casting a vector of size less than 8 bits to i8
Summary:
A reprise of D25849.

This crash was found through fuzzing some time ago and was documented in PR28879.

No check for load size has been added due to the following tests:
  - Transforms/GVN/invariant.group.ll
  - Transforms/GVN/pr10820.ll

These tests expect load sizes that are not a multiple of eight.

Thanks to @davide for the original patch.

Reviewers: nlopes, davide, RKSimon, reames, efriedma

Reviewed By: efriedma

Subscribers: davide, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D48330

llvm-svn: 335294
2018-06-21 21:43:20 +00:00
Tim Shen 63f244c4f4 [SCEV] Re-apply r335197 (with Polly fixes).
Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).

I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output.

All LLVM files are already reviewed in D48338.

Reviewers: jdoerfert, bollu, efriedma

Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia

Differential Revision: https://reviews.llvm.org/D48453

llvm-svn: 335292
2018-06-21 21:29:54 +00:00
Konstantin Zhuravlyov e004b3d97b AMDGPU: Remove ability to reserve VGPRs for debugger
Differential Revision: https://reviews.llvm.org/D48234

llvm-svn: 335288
2018-06-21 20:28:19 +00:00
Reid Kleckner 13c9ee684c [mingw] Fix GCC ABI compatibility for comdat things
Summary:
GCC and the binutils COFF linker do comdats differently from MSVC.
If we want to be ABI compatible, we have to do what they do, which is to
emit unique section names like ".text$_Z3foov" instead of short section
names like ".text". Otherwise, the binutils linker gets confused and
reports multiple definition errors when two object files from GCC and
Clang containing the same inline function are linked together.

The best description of the issue is probably at
https://github.com/Alexpux/MINGW-packages/issues/1677, we don't seem to
have a good one in our tracker.

I fixed up the .pdata and .xdata sections needed everywhere other than
32-bit x86. GCC doesn't use associative comdats for those, it appears to
rely on the section name.

Reviewers: smeenai, compnerd, mstorsjo, martell, mati865

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D48402

llvm-svn: 335286
2018-06-21 20:27:38 +00:00
Sanjay Patel a76b70069d [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have a common variable operand used in a pair of binops with vector constants 
that are vector selected together, then we can constant shuffle the constant vectors 
to eliminate the shuffle instruction.

This has some tricky parts that are hopefully addressed in the tests and their 
respective comments:

  1. If the shuffle mask contains an undef element, then that lane of the result is 
     undef:
     http://llvm.org/docs/LangRef.html#shufflevector-instruction

     Therefore, we can replace the constant in that lane with an undef value except 
     for div/rem. With div/rem, an undef in the divisor would cause the whole op to 
     be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.

  2. Intersect the wrapping and FMF of the original binops for the new binop. There 
     should be no extra poison or fast-math potential in the new binop that wasn't 
     possible in the original code.

  3. Disregard other uses. Given that we're eliminating uses (shortening the 
     dependency chain), I think that's always the right IR canonicalization. But 
     I purposely chose the udiv test to demonstrate the scenario where both 
     intermediate values have other uses because that seems likely worse for 
     codegen with an expensive math op. This seems like a very rare possibility to 
     me, so I don't think it requires a backend patch first.

Differential Revision: https://reviews.llvm.org/D48401

llvm-svn: 335283
2018-06-21 20:15:09 +00:00
Scott Linder 1e8c2c705d [AMDGPU] Update assembler for HSA Code Object v3
Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
  values).
* Provide path for backwards compatibility, even with underlying descriptor
  changes.

Differential Revision: https://reviews.llvm.org/D47736

llvm-svn: 335281
2018-06-21 19:38:56 +00:00
Francis Visoiu Mistrih ac599b6951 Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions."
This reverts commit r335206.

As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.

llvm-svn: 335272
2018-06-21 19:18:36 +00:00
Simon Dardis 3505045b42 [mips] Modify comment to test new email address (NFC).
llvm-svn: 335269
2018-06-21 18:52:32 +00:00
Scott Linder 5792dd0f39 [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.

Differential Revision: https://reviews.llvm.org/D48391

llvm-svn: 335268
2018-06-21 18:48:48 +00:00
Konstantin Zhuravlyov 766c77efd7 AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z
and everything that comes with it from implementation
and v3 header files.

Leave definition in v2 header files for backwards
compatibility.

Differential Revision: https://reviews.llvm.org/D48191

llvm-svn: 335267
2018-06-21 18:36:04 +00:00
Matt Davis d041f21810 [DebugInfo] Ignore DBG_VALUE instructions in PostRA Machine Sink
Summary:
The logic for handling the sinking of COPY instructions was generating
different code when building with debug flags.

The original code did not take into consideration debug instructions.  This
resulted in the registers in the DBG_VALUE instructions being treated as used,
and prevented the COPY from being sunk.  This patch avoids analyzing debug
instructions when trying to sink COPY instructions.

This patch also creates a routine from the code in MachineSinking::SinkInstruction to
perform the logic of sinking an instruction along with its debug instructions.
This functionality is used in multiple places, including the code for sinking COPY instrs.


Reviewers: junbuml, javed.absar, MatzeB, bjope

Reviewed By: bjope

Subscribers: aprantl, probinson, thegameg, jonpa, bjope, vsk, kristof.beyls, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D45637

llvm-svn: 335264
2018-06-21 17:59:52 +00:00
Sanjay Patel 3244537a3c [InstCombine] use constant pattern matchers with icmp+sext
The previous code worked with vectors, but it failed when the
vector constants contained undef elements. 
The matchers handle those cases.

llvm-svn: 335262
2018-06-21 17:51:44 +00:00
Sanjay Patel 7b0fc75f73 [InstCombine] simplify binops before trying other folds
This is outwardly NFC from what I can tell, but it should be more efficient 
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).

This should make it easier to refactor duplicated code that runs for all binops.

llvm-svn: 335258
2018-06-21 17:06:36 +00:00
Paul Robinson 547adcaaac [DWARF] Warn on and ignore ".file 0" for DWARF v4 and earlier.
This had been messing with the directory table for prior versions, and
also could induce a crash when generating asm output.

llvm-svn: 335254
2018-06-21 16:42:03 +00:00
Sirish Pande b60acb9e48 Revert "[AArch64] Coalesce Copy Zero during instruction selection"
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessmizes MachineSinking by introducing a
copy, such that mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only mov instruction. This works well
if we have only one fall through branch as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

llvm-svn: 335251
2018-06-21 16:05:24 +00:00
Stanislav Mekhanoshin 22ee191c3e DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1"
Allowed folding for "and/or" binops with non-constant operand if
arguments of select are 0/-1 values.

Normally this code with "and" opcode does not get to a DAG combiner
and simplified yet in the InstCombine. However AMDGPU produces it
during lowering and InstCombine has no chance to optimize it out.

In turn the same pattern with "or" opcode can reach DAG.

Differential Revision: https://reviews.llvm.org/D48301

llvm-svn: 335250
2018-06-21 16:02:05 +00:00
David Green 21a2973cc4 [ARM] Enable useAA() for the in-order Cortex-R52
This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with the codegen on in-order cpu's,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabled it for
the R52 where we have a schedule to make use of it.

Differential Revision: https://reviews.llvm.org/D48074

llvm-svn: 335249
2018-06-21 15:48:29 +00:00
Sanjay Patel 3e5c051a06 [InstCombine] make div/rem vector constant utility function; NFCI
This was originally in D48401 and will be used there.

llvm-svn: 335242
2018-06-21 14:59:35 +00:00
Sameer AbuAsal e01e711c64 [RISCV] Tail calls don't need to save return address
Summary:
 When expanding the PseudoTail in expandFunctionCall() we were using X6
 to save the return address. Since this is a tail call the return
 address is not needed, this patch replaces it with X0 to be ignored.

 This matches the behaviour listed in the ISA V2.2 document page 110.
 tail offset -----> jalr x0, x6, offset

 GCC exhibits the same behavior.

Reviewers: apazos, asb, mgrang

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01

Differential Revision: https://reviews.llvm.org/D48343

llvm-svn: 335239
2018-06-21 14:37:09 +00:00
Mikhail Dvoretckii 22c82af5c8 [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w]
This should help in lowering the following four intrinsics:
 _mm256_cvtepi32_epi8
 _mm256_cvtepi64_epi16
 _mm256_cvtepi64_epi8
 _mm512_cvtepi64_epi8

Differential Revision: https://reviews.llvm.org/D46957

llvm-svn: 335238
2018-06-21 14:16:45 +00:00
Krzysztof Parzyszek 6a3229301f [CodeGen] Avoid handling DBG_VALUE in LiveRegUnits::stepBackward
Patch by Jesper Antonsson.

Differential Revision: https://reviews.llvm.org/D48420

llvm-svn: 335233
2018-06-21 13:38:43 +00:00
Nicolai Haehnle 15745ba5c1 AMDGPU: Remove redundant MIMG instruction variants
Summary:
For sample and gather ops, we can accurately determine the set of
vaddr-size instruction variants that are required. This reduces
the size of instruction tables by ~5%.

The number of machine instruction opcodes is reduced from 10002
to 9476.

Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48168

llvm-svn: 335232
2018-06-21 13:37:55 +00:00
Nicolai Haehnle db6911a6f9 AMDGPU: Remove old-style image intrinsics
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

llvm-svn: 335231
2018-06-21 13:37:45 +00:00
Nicolai Haehnle b29ee70122 InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.

Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix

There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:

- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16

The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).

Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.

Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

Reviewers: arsenm, rampitec, majnemer

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48165

llvm-svn: 335230
2018-06-21 13:37:31 +00:00
Nicolai Haehnle 7a9c03f484 AMDGPU: Select MIMG instructions manually in SITargetLowering
Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.

Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.

This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.

Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6

Reviewers: arsenm, rampitec, rtaylor, tstellar

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48017

llvm-svn: 335228
2018-06-21 13:36:57 +00:00
Nicolai Haehnle 0ab200b6c9 AMDGPU: Refactor MIMG instruction TableGen using generic tables
Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.

This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.

Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48016

llvm-svn: 335227
2018-06-21 13:36:44 +00:00
Nicolai Haehnle e741d7e0fd AMDGPU: Use generic tables instead of SearchableTable
Summary:

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48014

Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641
llvm-svn: 335226
2018-06-21 13:36:33 +00:00
Nicolai Haehnle 2367f03565 AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM)
Summary:
This will allows us to provide rich metadata about the instructions
in tables that are accessible by custom C++ code.

Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48011

llvm-svn: 335224
2018-06-21 13:36:13 +00:00
Nicolai Haehnle b3a9b68513 AMDGPU: Add implicit def of SCC to kill and indirect pseudos
Summary:
Kill instructions sometimes do use SCC in unusual circumstances, when
v_cmpx cannot be used due to the operands that are involved.

Additionally, even if SCC was never defined by the expansion, kill pseudos
could previously occur between an s_cmp and an s_cbranch_scc, which breaks
the SCC liveness tracking when the pseudo is expanded to split the basic
block. While it would be possible to explicitly mark the SCC as live-in for
the successor basic block, it's simpler to just mark the pseudo as using SCC,
so that such a sequence is never emitted by instruction selection in the
first place.

A similar issue affects indirect source/dest pseudos in principle, although
I haven't been able to come up with a test case where it actually matters
(this affects instruction selection, so a MIR test can't be used).

Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always
Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47761

llvm-svn: 335223
2018-06-21 13:36:08 +00:00
Nicolai Haehnle f267431901 AMDGPU: Turn D16 for MIMG instructions into a regular operand
Summary:
This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.

We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instructions loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.

We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.

The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which choose the correct variant for
other MIMG instruction is extended to cover gather4 as well.

As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).

While we're at it, delete a whole bunch of dead legacy TableGen code.

Change-Id: I89b02c2841c06f95e662541433e597f5d4553978

Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor

Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47434

llvm-svn: 335222
2018-06-21 13:36:01 +00:00
Nicolai Haehnle 7d69e0f37d TableGen: Allow foreach in multiclass to depend on template args
Summary:
This also allows inner foreach loops to have a list that depends on
the iteration variable of an outer foreach loop. The test cases show
some very simple examples of how this can be used.

This was perhaps the last remaining major non-orthogonality in the
TableGen frontend.

Change-Id: I79b92d41a5c0e7c03cc8af4000c5e1bda5ef464d

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47431

llvm-svn: 335221
2018-06-21 13:35:44 +00:00
David Green d143c65de3 [DA] Enable -da-delinearize by default
This enables da-delinearize in Dependence Analysis for delinearizing array
accesses into multiple dimensions. This can help to increase the power of
Dependence analysis on multi-dimensional arrays and prevent having to fall
back to the slower and less accurate MIV tests. It adds static checks on the
bounds of the arrays to ensure that one dimension doesn't overflow into
another, and brings our code in line with our tests.

Differential Revision: https://reviews.llvm.org/D45872

llvm-svn: 335217
2018-06-21 11:53:16 +00:00
Simon Pilgrim 2a9cde026c [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. 

llvm-svn: 335216
2018-06-21 11:37:13 +00:00
Mikael Holmen 42f7bc96dd [DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property
Summary:
In some cases, these operands lacked the IsDebug property, which is meant to signal that
they should not affect codegen. This patch adds a check for this property in the
MachineVerifier and adds it where it was missing.

This includes refactorings to use MachineInstrBuilder construction functions instead of
manually setting up the intrinsic everywhere.

Patch by: JesperAntonsson

Reviewers: aprantl, rnk, echristo, javed.absar

Reviewed By: aprantl

Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D48319

llvm-svn: 335214
2018-06-21 10:03:34 +00:00
David Green a465188500 [DAGCombine] Fix alignment for offset loads/stores
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.

This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.

Differential Revision: https://reviews.llvm.org/D48029

llvm-svn: 335210
2018-06-21 08:30:07 +00:00
Eric Christopher 12e221d9b9 Remove FIXME comment about WIP.
This is the only line other than the function signature remaining
of the original patch.

llvm-svn: 335208
2018-06-21 07:15:19 +00:00
Eric Christopher 76cfef46f3 Add some explanatory text to the associated symbol support.
llvm-svn: 335207
2018-06-21 07:15:14 +00:00
Florian Hahn d36aa1f763 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r335150 should resolve the issues with the clang-with-thin-lto-ubuntu
and clang-with-lto-ubuntu builders.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 335206
2018-06-21 07:15:08 +00:00
Mikael Holmen 57b33f6aac [DebugInfo] Keep DBG_VALUE undef in LiveDebugVariables
Summary:
Fixes PR36579.

For cases where we had e.g.

 DBG_VALUE 42
 [...]
 DBG_VALUE undef

LiveDebugVariables would discard all undef DBG_VALUEs and then it would
look like the variable had the value 42 throughout the rest of the
function, which is incorrect.

With this patch we don't remove all undef DBG_VALUEs in LiveDebugVariables
so they will be kept after register allocation just like other DBG_VALUEs
which will yield more correct debug information.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: bjope, Ka-Ka, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48277

llvm-svn: 335205
2018-06-21 07:02:46 +00:00
Chandler Carruth d1dab0c3c0 [PM/LoopUnswitch] Add partial non-trivial unswitching for invariant
conditions feeding a chain of `and`s or `or`s for a branch.

Much like with full non-trivial unswitching, we rely on the pass manager
to handle iterating until all of the profitable unswitches have been
done. This is to allow other more profitable unswitches to fire on any
of the cloned, simpler versions of the loop if viable.

Threading the partial unswiching through the non-trivial unswitching
logic motivated some minor refactorings. If those are too disruptive to
make it reasonable to review this patch, I can separate them out, but
it'll be somewhat timeconsuming so I wanted to send it for initial
review as-is. Feel free to tell me whether it warrants pulling apart.

I've tried to re-use (and factor out) logic form the partial trivial
unswitching, but not as much could be shared as I had haped. Still, this
wasn't as bad as I naively expected.

Some basic testing is added, but I probably need more. Suggestions for
things you'd like to see tested more than welcome. One thing I'd like to
do is add some testing that when we schedule this with loop-instsimplify
it effectively cleans up the cruft created.

Last but not least, this uncovered a bug that has been in loop cloning
the entire time for non-trivial unswitching. Specifically, we didn't
correctly add the outer-most cloned loop to the list of cloned loops.
This meant that LCSSA wouldn't be updated for it hypothetically, and
more significantly that we would never visit it in the loop pass
manager. I noticed this while checking loop-instsimplify by hand. I'll
try to separate this bugfix out into its own patch with a more focused
test. But it is just one line, so shouldn't significantly confuse the
review here.

After this patch, the only missing "feature" in this unswitch I'm aware
of us non-trivial unswitching of switches. I'll try implementing *full*
non-trivial unswitching of switches (which is at least a sound thing to
implement), but *partial* non-trivial unswitching of switches is
something I don't see any sound and principled way to implement. I also
have no interesting test cases for the latter, so I'm not really
worried. The rest of the things that need to be ported are bug-fixes and
more narrow / targeted support for specific issues.

Differential Revision: https://reviews.llvm.org/D47522

llvm-svn: 335203
2018-06-21 06:14:03 +00:00
Michael Zolotukhin 336d75cc73 ProvenanceAnalysis: Store WeakTrackingVH instead of Value* in UnderlyingValue Cache.
Summary:
Since the value stored in the cache might be deleted or replaced with
something else, we need to use tracking ValueHandlers instead of plain
Value pointers. It was discovered in one of internal builds, and
unfortunately there is no small reproducer for the issue.

The cache was introduced in rL327328.

Reviewers: ahatanak, pete

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48407

llvm-svn: 335201
2018-06-21 05:14:00 +00:00
Craig Topper 296526bf46 [X86] Remove masking from 512-bit floating max/min intrinsics. Use select instruction instead.
llvm-svn: 335199
2018-06-21 05:00:56 +00:00
Tim Shen 433b9761ce Revert "[SCEV] Improve zext(A /u B) and zext(A % B)"
This reverts commit r335197, as some bots are not happy.

llvm-svn: 335198
2018-06-21 02:15:32 +00:00
Tim Shen 5af61e0a28 [SCEV] Improve zext(A /u B) and zext(A % B)
Summary:
Try to match udiv and urem patterns, and sink zext down to the leaves.

I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48338

llvm-svn: 335197
2018-06-21 01:49:07 +00:00
Wolfgang Pieb 61d8c8d9b3 [DWARF] Improved error reporting for range lists.
Errors found processing the DW_AT_ranges attribute are propagated by lower level 
routines and reported by their callers.

Reviewer: JDevlieghere

Differential Revision: https://reviews.llvm.org/D48344

llvm-svn: 335188
2018-06-20 22:56:37 +00:00
Simon Dardis 0f111dd704 [mips] Add microMIPS specific addressing patterns.
These are identical but use microMIPS instructions instead of MIPS instructions.

Also, flatten the 'let AdditionalPredicates = [InMicroMips]' by using the
ISA_MICROMIPS adjective. Add tests for constant materialization.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48275

llvm-svn: 335185
2018-06-20 22:40:12 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Alina Sbirlea 201d02c75c [MemorySSA] Verify Phi incoming blocks are block predecessors.
Summary: Make the MemorySSA verify also check that all Phi incoming blocks are block predecessors.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D48333

llvm-svn: 335174
2018-06-20 21:06:13 +00:00
Craig Topper c2696d577b [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel
I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code.

I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.

Differential Revision: https://reviews.llvm.org/D43608

llvm-svn: 335173
2018-06-20 21:05:02 +00:00
Simon Pilgrim 3d1c8c97b8 [SLPVectorizer] Provide InstructionsState down the BoUpSLP vectorization call tree
As described in D48359, this patch pushes InstructionsState down the BoUpSLP call hierarchy instead of the corresponding raw OpValue. This makes it easier to track the alternate opcode etc. and avoids us having to call getAltOpcode which makes it difficult to support more than one alternate opcode.

Differential Revision: https://reviews.llvm.org/D48382

llvm-svn: 335170
2018-06-20 20:54:52 +00:00
Stanislav Mekhanoshin 20279dc025 Allow binop C1, (select cc, CF, CT) -> select folding
Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.

Differential Revision: https://reviews.llvm.org/D48223

llvm-svn: 335167
2018-06-20 20:24:20 +00:00
Simon Dardis 6021424c10 [mips] Correct predicates for loads, bit manipulation instructions and some pseudos
Additionally, correct the definition of the rdhwr instruction.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48216

llvm-svn: 335162
2018-06-20 19:59:58 +00:00
Matt Arsenault 5a4ec8127f AMDGPU: Fix scalar_to_vector for v4i16/v4f16
llvm-svn: 335161
2018-06-20 19:45:48 +00:00
Matt Arsenault 3d06668ad4 AMDGPU: Fix missing C++ mode comment
llvm-svn: 335160
2018-06-20 19:45:40 +00:00
Sanjay Patel 3597588493 [IR] add/use isIntDivRem convenience function
There are more existing potential users of this,
but I've limited this patch to the first couple
that I found to minimize typo risk.

llvm-svn: 335157
2018-06-20 19:02:17 +00:00
Chandler Carruth 4da3331d3d [PM/LoopUnswitch] Support partial trivial unswitching.
The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and just unswitching that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a a branch and unswitch them to test them entirely
outside the loop.

As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.

This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The `LIV-loop-condition` test
case is not unswitched by the old unswitch pass, but is with this pass.

Thanks to Sanjoy and Fedor for the review!

Differential Revision: https://reviews.llvm.org/D46706

llvm-svn: 335156
2018-06-20 18:57:07 +00:00
Alex Bradbury fafdebcfcb [RISCV] Accept fmv.s.x and fmv.x.s as mnemonic aliases for fmv.w.x and fmv.x.w
These instructions were renamed in version 2.2 of the user-level ISA spec, but 
the old name should also be accepted by standard tools.

llvm-svn: 335154
2018-06-20 18:42:25 +00:00
Vedant Kumar 4e93f3dcf8 [Local] Generalize insertReplacementDbgValues, NFC
This utility should operate on Values, not Instructions. While I'm here,
I've also made it possible to skip emitting replacement dbg.values for
certain debug users (by having RewriteExpr return nullptr).

llvm-svn: 335152
2018-06-20 18:40:14 +00:00
Florian Hahn 5ac2629823 [PredicateInfo] Order instructions in different BBs by DFSNumIn.
Using OrderedInstructions::dominates as comparator for instructions in
BBs without dominance relation can cause a non-deterministic order
between such instructions. That in turn can cause us to materialize
copies in a non-deterministic order. While this does not effect
correctness, it causes some minor non-determinism in the final generated
code, because values have slightly different labels.

Without this patch, running -print-predicateinfo on a reasonably large
module produces slightly different output on each run.

This patch uses the dominator trees DFSInNum to order instruction from
different BBs, which should enforce a deterministic ordering and
guarantee that dominated instructions come after the instructions that
dominate them.

Reviewers: dberlin, efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48230

llvm-svn: 335150
2018-06-20 17:42:01 +00:00
Paul Robinson 8e3e374e5f [DWARF] Don't keep a ref to possibly stack allocated data.
llvm-svn: 335146
2018-06-20 17:08:46 +00:00
Vlad Tsyrklevich 34b9112216 IRMover: Account for matching types present across modules
Summary:
Due to uniqueing of DICompositeTypes, it's possible for a type from one
module to be loaded into another earlier module without being renamed.
Then when the defining module is being IRMoved, the type can be used as
a Mapping destination before being loaded, such that when it's requested
using TypeMapTy::get() it will fail with an assertion that the type is a
source type when it's actually a type in both the source and
destination modules. Correctly handle that case by allowing a non-opaque
non-literal struct type be present in both modules.

Fix for PR37684.

Reviewers: pcc, tejohnson

Reviewed By: pcc, tejohnson

Subscribers: tobiasvk, mehdi_amini, steven_wu, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47898

llvm-svn: 335145
2018-06-20 16:50:56 +00:00
Vedant Kumar 6fa24b0b7f [Local] Add a utility to insert replacement dbg.values, NFC
The purpose of this utility is to make it easier for optimizations to
insert replacement dbg.values for instructions they are deleting. This
is useful in situations where salvageDebugInfo is inapplicable, say,
because the new dbg.value cannot refer to an operand of the dying value.

The utility is called insertReplacementDbgValues.

It assumes that the instruction 'From' is going to be deleted, and
inserts replacement dbg.values for each debug user of 'From'. The
newly-inserted dbg.values refer to 'To' instead of 'From'. Each
replacement dbg.value has the same location and variable as the debug
user it replaces, has a DIExpression determined by the result of
'RewriteExpr' applied to an old debug user of 'From', and is placed
before 'InsertBefore'.

This should simplify future patches, like D48331.

llvm-svn: 335144
2018-06-20 16:50:25 +00:00
Paul Robinson 37fbb92652 Remove a redundant initialization. NFC
llvm-svn: 335143
2018-06-20 16:12:03 +00:00
Simon Pilgrim 292651a5b7 [SLPVectorizer] Move isOneOf after InstructionsState type. NFCI.
A future patch will have isOneOf use InstructionsState.

llvm-svn: 335142
2018-06-20 16:11:00 +00:00
Bjorn Pettersson 7bf676662a [DAG] Don't map a TableId to itself in the ReplacedValues map
Summary:
Found some regressions (infinite loop in DAGTypeLegalizer::RemapId)
after r334880. This patch makes sure that we do map a TableId to
itself.

Reviewers: niravd

Reviewed By: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48364

llvm-svn: 335141
2018-06-20 16:06:09 +00:00
Nirav Dave cd558887d3 [DAG] Fix and-mask folding when narrowing loads.
Summary:
Check that and masks are strictly smaller than implicit mask from
narrowed load.

Fixes PR37820.

Reviewers: samparker, RKSimon, nemanjai

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48335

llvm-svn: 335137
2018-06-20 15:36:29 +00:00
Sam Clegg 52564675c4 [WebAssembly] Update know failures for the wasm waterfall
Summary:
The waterfall no longer builds .s files and no longers uses
the wasm-o when it builds object files.

Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48371

llvm-svn: 335135
2018-06-20 15:17:12 +00:00
Simon Pilgrim 0c9f8dcde7 [SLPVectorizer] Use InstructionsState to record AltOpcode
This is part of a move towards generalizing the alternate opcode mechanism and not just supporting (F)Add/(F)Sub counterparts.

The patch embeds the AltOpcode in the InstructionsState instead of calling getAltOpcode so often.

I'm hoping to eventually remove all uses of getAltOpcode and handle alternate opcode selection entirely within getSameOpcode, that will require us to use InstructionsState throughout the BoUpSLP call hierarchy (similar to some of the changes in D28907), which I will begin in future patches.

Differential Revision: https://reviews.llvm.org/D48359

llvm-svn: 335134
2018-06-20 15:13:40 +00:00
Simon Pilgrim 2e2f20a949 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

llvm-svn: 335130
2018-06-20 14:26:28 +00:00
Sanjay Patel 0c57de4c21 [InstSimplify] Fix missed optimization in simplifyUnsignedRangeCheck()
For both operands are unsigned, the following optimizations are valid, and missing:

   1. X > Y && X != 0 --> X > Y
   2. X > Y || X != 0 --> X != 0
   3. X <= Y || X != 0 --> true
   4. X <= Y || X == 0 --> X <= Y
   5. X > Y && X == 0 --> false

unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y, but I found we haven't done it right now.
besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; }
Has been folded to x < y, so there may be a bug.

Patch by: Li Jia He!

Differential Revision: https://reviews.llvm.org/D47922

llvm-svn: 335129
2018-06-20 14:22:49 +00:00
Alex Bradbury 79d2b50ca8 [RISCV] Add InstAlias definitions for fgt.{s|d}, fge.{s|d}
These are produced by GCC and supported by GAS, but not currently contained in 
the pseudoinstruction listing in the RISC-V ISA manual.

llvm-svn: 335127
2018-06-20 14:03:02 +00:00
Krzysztof Parzyszek d8b780dcd6 [Hexagon] Remove 'T' from HasVNN predicates, NFC
Patch by Sumanth Gundapaneni.

llvm-svn: 335124
2018-06-20 13:56:09 +00:00
Simon Dardis eae99120b0 [mips] Fix the predicates of some DSP instructions from AdditionalPredicates to ASEPredicate
Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48166

llvm-svn: 335122
2018-06-20 13:29:57 +00:00
Sanjay Patel 825a4faa8d [InstCombine] ignore debuginfo when removing redundant assumes (PR37726)
This is similar to:
rL335083

Fixes::
https://bugs.llvm.org/show_bug.cgi?id=37726

llvm-svn: 335121
2018-06-20 13:22:26 +00:00
Alex Bradbury 18b9bd7d6c [RISCV] Add InstAlias definitions for sgt and sgtu
These are produced by GCC and supported by GAS, but not currently contained in 
the pseudoinstruction listing in the RISC-V ISA manual.

llvm-svn: 335120
2018-06-20 12:54:02 +00:00
Tim Northover 644a819534 ARM: convert ORR instructions to ADD where possible on Thumb.
Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a
3-address encoding and a wider range of immediates. So, particularly when
optimizing for code size (but it doesn't make things worse elsewhere) it's
beneficial to select an OR operation to an ADD if we know overflow won't occur.

This is made even better by LLVM's penchant for putting operations in canonical
form by converting the other way.

llvm-svn: 335119
2018-06-20 12:09:44 +00:00
Tim Northover 70666e7765 [AArch64] Implement FLT_ROUNDS macro.
Very similar to ARM implementation, just maps to an MRS.

Should fix PR25191.

Patch by Michael Brase.

llvm-svn: 335118
2018-06-20 12:09:01 +00:00
Andrea Di Biagio 2145b13fc9 [llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register.
This patch teaches llvm-mca how to identify register writes that implicitly zero
the upper portion of a super-register.

On X86-64, a general purpose register is implemented in hardware as a 64-bit
register. Quoting the Intel 64 Software Developer's Manual: "an update to the
lower 32 bits of a 64 bit integer register is architecturally defined to zero
extend the upper 32 bits".  Also, a write to an XMM register performed by an AVX
instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.

This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis
interface to help identify instructions that implicitly clear the upper portion
of a super-register.  The rest of the patch teaches llvm-mca how to use that new
method to obtain the information, and update the register dependencies
accordingly.

I compared the kernels from tests clear-super-register-1.s and
clear-super-register-2.s against the output from perf on btver2.  Previously
there was a large discrepancy between the estimated IPC and the measured IPC.
Now the differences are mostly in the noise.

Differential Revision: https://reviews.llvm.org/D48225

llvm-svn: 335113
2018-06-20 10:08:11 +00:00
Simon Pilgrim b7ac037797 [SLPVectorizer] Split Tree/Reduction cost calls to simplify debugging. NFCI.
llvm-svn: 335110
2018-06-20 09:39:01 +00:00
Roman Lebedev 42a1ff11fb [NFC][SCEV] Add tests related to bit masking (PR37793)
Summary:
Related to https://bugs.llvm.org/show_bug.cgi?id=37793, https://reviews.llvm.org/D46760#1127287

We'd like to do this canonicalization https://rise4fun.com/Alive/Gmc
But it is currently restricted by rL155136 / rL155362, which says:
```
    // This is a constant shift of a constant shift. Be careful about hiding
    // shl instructions behind bit masks. They are used to represent multiplies
    // by a constant, and it is important that simple arithmetic expressions
    // are still recognizable by scalar evolution.
    //
    // The transforms applied to shl are very similar to the transforms applied
    // to mul by constant. We can be more aggressive about optimizing right
    // shifts.
    //
    // Combinations of right and left shifts will still be optimized in
    // DAGCombine where scalar evolution no longer applies.
```

I think these tests show that for *constants*, SCEV has no issues with that canonicalization.

Reviewers: mkazantsev, spatel, efriedma, sanjoy

Reviewed By: mkazantsev

Subscribers: sanjoy, javed.absar, llvm-commits, stoklund, bixia

Differential Revision: https://reviews.llvm.org/D48229

llvm-svn: 335101
2018-06-20 07:54:11 +00:00
Roman Lebedev d23b6831de [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.
Summary:
First off: i do not have any access to that processor,
so this is purely theoretical, no benchmarks.

I have been looking into b**d**ver2 scheduling profile, and while cross-referencing
the existing b**t**ver2, znver1 profiles, and the reference docs
(`Software Optimization Guide for AMD Family {15,16,17}h Processors`),
i have noticed that only b**t**ver2 scheduling profile specifies these.

Also, there is no mca test coverage.

Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb

Reviewed By: GGanesh

Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D47676

llvm-svn: 335099
2018-06-20 07:01:14 +00:00
Clement Courbet 7b9913fb9f [X86] Add sched class WriteLAHFSAHF and fix values.
Summary:
I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB.
Atom is from Agner and SLM is a guess.
I've left AMD processors alone.

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48079

llvm-svn: 335097
2018-06-20 06:13:39 +00:00
Hiroshi Inoue c73b6d6bf7 [NFC] fix trivial typos in comments
llvm-svn: 335096
2018-06-20 05:29:26 +00:00
Craig Topper ddd88a559f [DAGCombiner] Add some comments to some true/false arguments to make it obvious what they are. NFC
llvm-svn: 335095
2018-06-20 04:32:07 +00:00
Craig Topper 59da74370b [SelectionDAG] Don't crash on inline assembly errors when the inline assembly return type is a struct.
Summary:
If we get an error building the SelectionDAG for inline assembly we try to continue and still build the DAG.

But if the return type for the inline assembly is a struct we end up crashing because we try to create an UNDEF node with a struct type which isn't valid.

Instead we need to create an UNDEF for each element of the struct and join them with merge_values.

This patch relies on single operand merge_values being handled gracefully by getMergeValues. If the return type is void there will be no VTs returned by ComputeValueVTs and now we just return instead of calling setValue. Hopefully that's ok, I assumed nothing would need to look up the mapped value for void node.

Fixes PR37359

Reviewers: rengolin, rovka, echristo, efriedma, bogner

Reviewed By: efriedma

Subscribers: craig.topper, llvm-commits

Differential Revision: https://reviews.llvm.org/D46560

llvm-svn: 335093
2018-06-20 04:32:05 +00:00
Craig Topper d22ad8568c [X86] Use binary search of the EVEX->VEX static tables instead of populating two DenseMaps for lookups
Summary:
After r335018, the static tables are guaranteed sorted by the EVEX opcode to convert. We can use this to do a binary search and remove the need for any secondary data structures.

Right now one table is 736 entries and the other is 482 entries. It might make sense to merge the two tables as a follow up. The effort it takes to select the table is probably similar to the extra binary search step it would require for a larger table.

I haven't done any measurements to see if this has any effect on compile time, but I don't imagine that EVEX->VEX conversion is a place we spend a lot of time.

Reviewers: RKSimon, spatel, chandlerc

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48312

llvm-svn: 335092
2018-06-20 04:32:04 +00:00
Vlad Tsyrklevich 98724e582e Revert r334980 and 334983
This reverts commits r334980 and r334983 because they were causing build
timeouts on the x86_64-linux-ubsan bot.

llvm-svn: 335085
2018-06-20 00:02:32 +00:00
Vedant Kumar f01827f2d1 [IR] Introduce helpers to skip debug instructions (NFC)
This patch introduces two helpers to make it easier to ignore debug
intrinsics:

- Instruction::getNextNonDebugInstruction()

This is just like Instruction::getNextNode(), except that it skips debug
info.

- skipDebugInfo(BasicBlock::iterator)

A free function which advances a BasicBlock iterator past any debug
info. This is a no-op when the iterator already points to a non-debug
instruction.

Part of: llvm.org/PR37728
Related to: https://reviews.llvm.org/D47874

Differential Revision: https://reviews.llvm.org/D48305

llvm-svn: 335083
2018-06-19 23:42:17 +00:00
Philip Reames 8befa295ea [InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT
This patch covers up a fairly fundemental issue around remat and register allocation which shows up with psuedo instructions with more vreg uses than there are physical registers.  This patch essentially just disables remat for STATEPOINTs which are the only case we've seen so far, but long term we need a better fix.

For STATEPOINTs specifically, this is a strict improvement.  It unblocks progress towards enabling a currently off-by-default mode which integrates deopt bundle operand lowering with register allocator spilling so that we end up with smaller stack sizes and more optimally placed spills.  Assming no other issues turn up during my next round of integration testing - which based on experience so far, is admittedly unlikely - we might finally be able to enable something I've been working towards in small bits and pieces for years now.  :)

For psuedo ops in general, there are a couple of ideas for a "proper fix" discussed on the bug, but I'm far enough outside my knowledge area to not be able to see any of them through to a successful conclusion.  If anyone wants to help out here, please do.  

Differential Revision: https://reviews.llvm.org/D41098

llvm-svn: 335077
2018-06-19 21:19:59 +00:00
Jessica Paquette 32de26d432 [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.

This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

llvm-svn: 335076
2018-06-19 21:14:48 +00:00
Heejin Ahn 891a747266 [WebAssembly] Fix liveness tracking info after drop insertion
Summary:
This fixes liveness tracking information after `drop` instruction
insertion in ExplicitLocals pass.

When a drop instruction is inserted to drop a dead register operand, the
original operand should be marked not dead anymore because it is now
used by the new drop instruction. And the operand to the new drop
instruction should be marked killed instead. This bug caused some
programs to fail when `llc` is run with `-verify-machineinstrs` option.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48253

llvm-svn: 335074
2018-06-19 20:30:42 +00:00
Sanjay Patel 2ca3360b11 [IR] move shuffle mask queries from TTI to ShuffleVectorInst
The optimizer is getting smarter (eg, D47986) about differentiating shuffles 
based on its mask values, so we should make queries on the mask constant 
operand generally available to avoid code duplication.

We'll probably use this soon in the vectorizers and instcombine (D48023 and 
https://bugs.llvm.org/show_bug.cgi?id=37806).

We might clean up TTI a bit more once all of its current 'SK_*' options are 
covered.

Differential Revision: https://reviews.llvm.org/D48236

llvm-svn: 335067
2018-06-19 18:44:00 +00:00
Matt Davis a245c765a8 [MIRParser] Update a diagnostic message to use the correct register sigil. NFC
Summary:
Patch r323922 changed the sigil for physical registers to '$',  instead of '%'.
An error message was missed during this change, and reports the wrong sigil.
This patch corrects that diagnostic and the tests that check that error string.


Reviewers: zer0, bjope

Reviewed By: bjope

Subscribers: bjope, thegameg, plotfi, llvm-commits

Differential Revision: https://reviews.llvm.org/D48086

llvm-svn: 335066
2018-06-19 18:39:40 +00:00
Krzysztof Parzyszek 03aa8f3a24 [Hexagon] Fix the value of HexagonII::TypeCVI_FIRST
This value is the first vector instruction type in numerical order. The
previous value was incorrect, leaving TypeCVI_GATHER outside of the range
for vector instructions. This caused vector .new instructions to be
incorrectly encoded in the presence of gather.

llvm-svn: 335065
2018-06-19 18:09:54 +00:00
Craig Topper 0b7936737b [X86] Initialize FMA3Info directly in its constructor instead of relying on std::call_once
FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object.

So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff.

Differential Revision: https://reviews.llvm.org/D48194

llvm-svn: 335064
2018-06-19 18:06:52 +00:00
Craig Topper 7ffa976993 [X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc.
Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE.

llvm-svn: 335062
2018-06-19 17:51:42 +00:00
Krzysztof Parzyszek 5c2944c4f2 [Hexagon] Enforce restrictions on packetizing cache instructions
llvm-svn: 335061
2018-06-19 17:26:20 +00:00
Simon Dardis af38a8fed6 [mips] Mark microMIPS64 as being unsupported.
There are no provided instruction definitions for this architecture.

Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D48320

llvm-svn: 335057
2018-06-19 16:05:44 +00:00
Simon Dardis 9e80c340f7 [mips] Fix the predicates of some aliases
Previously, some aliases were marked as not being available for microMIPS32R6,
but this was overridden at the top level.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D48321

llvm-svn: 335053
2018-06-19 15:25:01 +00:00
Simon Pilgrim 0461393660 [SLPVectorizer] Remove default OperandValueKind arguments from getArithmeticInstrCost calls (NFC)
The getArithmeticInstrCost calls for shuffle vectors entry costs specify TargetTransformInfo::OperandValueKind arguments, but are just using the method's default values. This seems to be a copy + paste issue and doesn't affect the costs in anyway. The TargetTransformInfo::OperandValueProperties default arguments are already not being used.

Noticed while working on D47985.

Differential Revision: https://reviews.llvm.org/D48008

llvm-svn: 335045
2018-06-19 13:40:00 +00:00
Strahinja Petrovic bb2b00bb80 [PowerPC] Fix label address calculation for ppc32
This patch fixes calculating address of label on ppc32 (for -fPIC).

Differential Revision: https://reviews.llvm.org/D46582

llvm-svn: 335043
2018-06-19 13:07:40 +00:00
Mikhail Dvoretckii 8393f90717 [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.

Differential Revision: https://reviews.llvm.org/D48067

llvm-svn: 335039
2018-06-19 10:49:12 +00:00
Mikhail Dvoretckii b1ce7765be [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.

Differential Revision: https://reviews.llvm.org/D45203

llvm-svn: 335037
2018-06-19 10:37:52 +00:00
David Green e6a9c24878 [LoopSimplifyCFG] Invalidate SCEV in LoopSimplifyCFG
LoopSimplifyCFG, being a loop pass, needs to preserve scalar
evolution. This invalidates SE for the loops altered during
block merging.

Differential Revision: https://reviews.llvm.org/D48258

llvm-svn: 335036
2018-06-19 09:43:36 +00:00
Simon Pilgrim c966f7213e [SLPVectorizer] Pull out AltOpcode determination from reorderAltShuffleOperands.
Minor step towards making the alternate opcode system work with a wider range of opcode pairs.

llvm-svn: 335032
2018-06-19 09:16:06 +00:00
Bjorn Pettersson 2015a39955 Remove valueCoversEntireFragment asserts in ConvertDebugDeclareToDebugValue
This is a fixup for r334830 causing problems in polly-aosp buildbot.

Focus in r334830 was to fix a problem seen with
ConvertDebugDeclareToDebugValue involving store instructions.
It also added some asserts to find out of similar problems
existed for the ConvertDebugDeclareToDebugValue functions
involving load and phi instructions. One of those asserts seems
to blow in the polly-aosp buildbot, so I'll revert the asserts
while debugging.

llvm-svn: 335031
2018-06-19 08:41:34 +00:00
Florian Hahn d8fcf0de31 [LoopInterchange] Move PHI handling to adjustLoopBranches.
This patch moves the logic to handle reduction PHI nodes to the end of
adjustLoopBranches. Reduction PHI nodes in the outer loop header can be
moved to the inner loop header and reduction PHI nodes from the inner loop
header can be moved to the outer loop header. In the latter situation,
we have to deal with 1 kind of PHI nodes:

    PHI nodes that are part of inner loop-only reductions.

We can replace the PHI node with the value coming from outside
the inner loop.

Reviewers: mcrosier, efriedma, karthikthecool

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D46198

llvm-svn: 335027
2018-06-19 08:03:24 +00:00
Mikhail Dvoretckii bd8ed2dbaa Test commit.
llvm-svn: 335026
2018-06-19 07:55:10 +00:00
QingShan Zhang 9f0fe9a3f8 If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction when we are loading a floating,
and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction.

Differential Revision: https://reviews.llvm.org/D47568
Reviewed By: Nemanjai

llvm-svn: 335024
2018-06-19 06:54:51 +00:00
Max Kazantsev 37da4333a8 [SimplifyIndVars] Eliminate redundant truncs
This patch adds logic to deal with the following constructions:

  %iv = phi i64 ...
  %trunc = trunc i64 %iv to i32
  %cmp = icmp <pred> i32 %trunc, %invariant

Replacing it with
  %iv = phi i64 ...
  %cmp = icmp <pred> i64 %iv, sext/zext(%invariant)

In case if it is legal. Specifically, if `%iv` has signed comparison users, it is
required that `sext(trunc(%iv)) == %iv`, and if it has unsigned comparison
uses then we require `zext(trunc(%iv)) == %iv`. The current implementation
bails if `%trunc` has other uses than `icmp`, but in theory we can handle more
cases here (e.g. if the user of trunc is bitcast).

Differential Revision: https://reviews.llvm.org/D47928
Reviewed By: reames

llvm-svn: 335020
2018-06-19 04:48:34 +00:00
Craig Topper c2965214ef [X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter.
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information.

Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.

Finally, remove the manual table from the emitter.

This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.

llvm-svn: 335018
2018-06-19 04:24:44 +00:00
Craig Topper 0a5e90cc2a [X86] Add a new VEX_WPrefix encoding to tag EVEX instruction that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0.
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.

The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.

This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.

This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.

llvm-svn: 335017
2018-06-19 04:24:42 +00:00
Sanjoy Das 6e9b355cc9 Revert "[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags"
This reverts r334428.  It incorrectly marks some multiplications as nuw.  Tim
Shen is working on a proper fix.

Original commit message:

[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.

Summary:
Previously we would add them for adds, but not multiplies.

llvm-svn: 335016
2018-06-19 04:09:44 +00:00
Craig Topper 46c0b368d6 [X86] Simplify the TSFlags checking code in EvexToVexInstPass. NFCI
The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit.

llvm-svn: 335015
2018-06-19 03:17:46 +00:00
Heejin Ahn 817811caae [WebAssembly] Add more utility functions
Summary:
Added more utility functions that will be used in EH-related passes Also
changed `LoopBottom` function to `getBottom` and uses templates to be
able to handle other classes as well, which will be used in CFGSort
later.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48262

llvm-svn: 335006
2018-06-19 00:32:03 +00:00
Heejin Ahn 33c3fce592 [WebAssembly] Add WasmEHFuncInfo for unwind destination information
Summary:
Add WasmEHFuncInfo and routines to calculate and fill in this struct to
keep track of unwind destination information. This will be used in
other EH related passes.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48263

llvm-svn: 335005
2018-06-19 00:26:39 +00:00
Heejin Ahn 6279d71227 [WebAssembly] Make rethrow instruction take a target BB argument
Summary:
This patch changes the rethrow instruction to take a BB argument in LLVM
backend, like `br` and `br_if`s. This BB is a target catch BB the
rethrow instruction unwinds to. This BB argument will be converted to an
relative depth immediate at the end of CFGStackify pass, as in the same
way of branches.

RETHROW_TO_CALLER is a codegen-only instruction that should be used when
a rethrow instruction does not have an unwind destination BB, i.e., it
should rethrow to its caller function.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D48260

llvm-svn: 334998
2018-06-18 23:54:29 +00:00
Michael Berg 7b993d762f Utilize new SDNode flag functionality to expand current support for fadd
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.

Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D47909

llvm-svn: 334996
2018-06-18 23:44:59 +00:00
Craig Topper a7b7f2f4d8 [X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass.
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?

llvm-svn: 334994
2018-06-18 23:20:57 +00:00
Xin Tong 54b4227f32 Revert "Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor"
This reverts commit f976cf4cca0794267f28b54e468007fd476d37d9.

I am reverting this because it causes break in a few bots and its going
to take me sometime to look at this.

llvm-svn: 334993
2018-06-18 23:20:08 +00:00
Xin Tong bfd8cfcb8d Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor
Summary:
Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor

This is a missing small optimization in MergeBlockIntoPredecessor.

This helps with one simplifycfg test which expects this case to be handled.

Reviewers: davide, spatel, brzycki, asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48284

llvm-svn: 334992
2018-06-18 22:59:13 +00:00
Eric Christopher 88bbad24c0 Tidy comment language and explanation.
llvm-svn: 334990
2018-06-18 22:21:19 +00:00
Eric Christopher b1faaf3069 Pull non-lazy stub table emission into a separate function alongside
the individual stub creation to increase readability a bit in the
non-object file format specific function.

llvm-svn: 334989
2018-06-18 22:21:18 +00:00
Eric Christopher b6d0b99f3b Add return statements to make it clear that all of these are mutually exclusive conditions.
else if would have worked just as well, but this keeps the original readability a bit more clear.

llvm-svn: 334988
2018-06-18 22:21:13 +00:00
Wouter van Oortmerssen 48dac3109e [WebAssembly] Modified tablegen defs to have 2 parallel instuction sets.
Summary:
One for register based, much like the existing definitions,
and one for stack based (suffix _S).

This allows us to use registers in most of LLVM (which works better),
and stack based in MC (which results in a simpler and more readable
assembler / disassembler).

Tried to keep this change as small as possible while passing tests,
follow-up commit will:
- Add reg->stack conversion in MI.
- Fix asm/disasm in MC to be stack based.
- Fix emitter to be stack based.

tests passing:
llvm-lit -v `find test -name WebAssembly`

test/CodeGen/WebAssembly
test/MC/WebAssembly
test/MC/Disassembler/WebAssembly
test/DebugInfo/WebAssembly
test/CodeGen/MIR/WebAssembly
test/tools/llvm-objdump/WebAssembly

Reviewers: dschuff, sbc100, jgravelle-google, sunfish

Subscribers: aheejin, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48183

llvm-svn: 334985
2018-06-18 21:22:44 +00:00
Michael Berg 932ba20af8 refactor of visitFADD for AllowNewConst cases
Summary: Refactoring for all constant cases which require AllowNewConst and some staging for future fmf usage.

Reviewers: spatel, hfinkel, wristow

Reviewed By: spatel

Subscribers: nhaehnle

Differential Revision: https://reviews.llvm.org/D48289

llvm-svn: 334984
2018-06-18 21:12:21 +00:00
Sander de Smalen 067eee1c13 [AArch64][SVE] Asm: Fix predicate pattern diagnostics.
This patch uses the DiagnosticPredicate for SVE predicate patterns
to improve their diagnostics, now giving a 'invalid operand' diagnostic
if the type is not an immediate or one of the expected pattern
labels.

Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48220

llvm-svn: 334983
2018-06-18 21:03:02 +00:00
Sander de Smalen 7ac9e193ec [AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions.
The variants added by this patch are:
- SQINC     signed increment, e.g. sqinc x0, w0, all, mul #4
- SQDEC     signed decrement, e.g. sqdec x0, w0, all, mul #4
- UQINC   unsigned increment, e.g. uqinc w0, all, mul #4
- UQDEC   unsigned decrement, e.g. uqdec w0, all, mul #4
 
This patch includes asmparser changes to parse a GPR64 as a GPR32 in
order to satisfy the constraint check:
  x0 == GPR64(w0)
in:
  sqinc x0, w0, all, mul #4
         ^___^ (must match)

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47716

llvm-svn: 334980
2018-06-18 20:50:33 +00:00
Wouter van Oortmerssen 78c62966c2 [WebAssembly] Cleaned up register accessors in WebAssemblyMachineFunctionInfo.h
Tested: llvm-lit -v `find test -name WebAssembly`

(This is a commit access "test commit" :)

llvm-svn: 334979
2018-06-18 20:45:49 +00:00
Craig Topper 17bd84c12c [X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source.
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.

llvm-svn: 334971
2018-06-18 18:47:07 +00:00
Michael Berg cafe947445 [NFC] make MIFlag accessor functions consistant with usage model
llvm-svn: 334970
2018-06-18 18:37:48 +00:00
Florian Hahn 3385caaafd [VPlan] Add VPInstruction to VPRecipe transformation.
This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction based VPlan re-using the
existing infrastructure.

Reviewers: dcaballe, hsaito, mssimpso, hfinkel, rengolin, mkuper, javed.absar, sguggill

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D46827

llvm-svn: 334969
2018-06-18 18:28:49 +00:00
Lang Hames 68c9b8d6a1 [ORC] Add an initial implementation of a replacement CompileOnDemandLayer.
CompileOnDemandLayer2 is a replacement for CompileOnDemandLayer built on the ORC
Core APIs. Functions in added modules are extracted and compiled lazily.
CompileOnDemandLayer2 supports multithreaded JIT'd code, and compilation on
multiple threads.

llvm-svn: 334967
2018-06-18 18:01:43 +00:00
Lang Hames 2e96114caf [ORC] Keep weak flag on VSO symbol tables during materialization, but treat
materializing weak symbols as strong.

This removes some elaborate flag tweaking and plays nicer with RuntimeDyld,
which relies of weak/common flags to determine whether it should emit a given
weak definition. (Switching to strong up-front makes it appear as if there is
already an overriding definition, which would require an extra back-channel to
override).

llvm-svn: 334966
2018-06-18 18:01:41 +00:00
Krzysztof Parzyszek 546017322f Shrink interval after moving copy in removePartialRedundancy
llvm-svn: 334963
2018-06-18 17:16:39 +00:00
Nirav Dave b35f9e1459 Fix typoed cast to avoid assertion in MCFragment::dump.
llvm-svn: 334959
2018-06-18 16:26:11 +00:00
Simon Pilgrim 5b962b2fc3 [SLPVectorizer] Tidyup isShuffle helper
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.

Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication, I've added a TODO for now but D48236 should help us here.

Differential Revision: https://reviews.llvm.org/D48023

llvm-svn: 334958
2018-06-18 16:25:01 +00:00
Florian Hahn 63cbcf98a5 [VPlanRecipeBase] Add eraseFromParent().
Reviewers: dcaballe, hsaito, mkuper, hfinkel

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48081

llvm-svn: 334951
2018-06-18 15:18:48 +00:00
Sander de Smalen 13684d8400 [AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions.
Summary:
The variants added by this patch are:
- SQINC  (signed increment)
- UQINC  (unsigned increment)
- SQDEC  (signed decrement)
- UQDEC  (unsigned decrement)

For example:
  uqincw  x0, all, mul #4

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Differential Revision: https://reviews.llvm.org/D47715

llvm-svn: 334948
2018-06-18 14:47:52 +00:00
Simon Pilgrim 9173c97ce4 [X86][BtVer2] Flag AVX2+ scheduler classes as unsupported
Jaguar only supports up to AVX1

Differential Revision: https://reviews.llvm.org/D48274

llvm-svn: 334947
2018-06-18 14:31:14 +00:00
Florian Hahn 3bcff3662c [VPlan] Fix sanitizer problem with insertBefore.
llvm-svn: 334943
2018-06-18 13:51:28 +00:00
Simon Pilgrim 99a5832016 [SLPVectorizer] Avoid calling const VL.size() repeatedly in for-loop. NFCI.
llvm-svn: 334934
2018-06-18 11:35:36 +00:00
Florian Hahn 7591e4e94a [VPlanRecipeBase] Add insertBefore helper.
Reviewers: dcaballe, mkuper, hfinkel, hsaito, mssimpso

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48080

llvm-svn: 334933
2018-06-18 11:34:17 +00:00
Sander de Smalen d521c4353e [AArch64][SVE] Asm: Support for vector element compares.
This patch adds instructions for comparing elements from two vectors, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.s

and also adds support for comparing to a 64-bit wide element vector, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.d

The patch also contains aliases for certain comparisons, e.g.:
  cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
  cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
  cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
  cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s

llvm-svn: 334931
2018-06-18 10:59:19 +00:00
Clement Courbet 0d9da88d18 [X86] Fix NOOP sched overrides on BDW/HSW/SKL.
Summary: Noop certainly does not use resources.

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits, gchatelet

Differential Revision: https://reviews.llvm.org/D48028

llvm-svn: 334927
2018-06-18 06:48:22 +00:00
Craig Topper f0ab7bd196 [X86] Create X86InstrFMA3Group objects fully in a static table instead of on the heap. NFCI
Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.

Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.

This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.

This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.

llvm-svn: 334925
2018-06-18 06:32:22 +00:00
Craig Topper 16fdde5e63 [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.

Also remove the vpextrw.s EVEX alias. That's not something gas implements.

llvm-svn: 334922
2018-06-18 05:00:50 +00:00
Craig Topper 916d0cf649 [X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases.
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.

Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported.

llvm-svn: 334920
2018-06-18 01:28:05 +00:00
Lang Hames 0705ee8dc2 [ORC] Remove redundant condition
llvm-svn: 334918
2018-06-17 23:54:58 +00:00
Lang Hames a5247cc5c7 [ORC] Only notify queries that they are resolved/ready when the query state
changes.

This guards against redundant notifications.

llvm-svn: 334916
2018-06-17 18:59:01 +00:00
Craig Topper 9fe45d846e [X86] Add all the FMA instructions direclty to the load folding table instead of proxying through X86InstrFMA3Info.
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.

llvm-svn: 334915
2018-06-17 18:00:16 +00:00
Lang Hames cd018a4467 [ORC] Suppress an unused variable warning for a debug-mode only use.
llvm-svn: 334911
2018-06-17 17:18:12 +00:00
Lang Hames df5776b1dc [ORC] Erase empty dependence sets when adding new symbol dependencies.
llvm-svn: 334910
2018-06-17 16:59:53 +00:00
Lang Hames 11adecfb2c [ORC] In MaterializationResponsibility, only maintain the Materializing flag on
symbols in debug mode.

The MaterializationResponsibility class hijacks the Materializing flag to track
symbols that have not yet been resolved in order to guard against redundant
resolution. Since this is an API contract check and only enforced in debug mode
there is no reason to maintain the flag state in release mode.

llvm-svn: 334909
2018-06-17 16:59:52 +00:00
Craig Topper b0e986f88e [X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling.
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.

This fixed at least fma-scalar-memfold.ll to succed without the peephole pass.

llvm-svn: 334908
2018-06-17 16:29:46 +00:00
Sander de Smalen 279b7e74e7 [AArch64][SVE] Asm: Support for bitwise operations on predicate vectors.
This patch adds support for instructions performing bitwise operations
on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and
their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS.

This patch also adds several aliases:

  orr  p0.b, p1/z, p1.b, p1.b  => mov  p0.b, p1.b
  orrs p0.b, p1/z, p1.b, p1.b  => movs p0.b, p1.b

  and  p0.b, p1/z, p2.b, p2.b  => mov  p0.b, p1/z, p2.b
  ands p0.b, p1/z, p2.b, p2.b  => movs p0.b, p1/z, p2.b

  eor  p0.b, p1/z, p2.b, p1.b  => not  p0.b, p1/z, p2.b
  eors p0.b, p1/z, p2.b, p1.b  => nots p0.b, p1/z, p2.b

llvm-svn: 334906
2018-06-17 10:48:21 +00:00
Sander de Smalen 2c25b4cd36 [AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions.
Support for SVE's predicated select instructions to select elements
from either vector, both in a data-vector and a predicate-vector
variant.

llvm-svn: 334905
2018-06-17 10:11:04 +00:00
Jonas Hahnfeld c7410ed47a [NVPTX] Ignore target-cpu and -features for inlining
We don't want to prevent inlining because of target-cpu and -features
attributes that were added to newer versions of LLVM/Clang: There are
no incompatible functions in PTX, ptxas will throw errors in such cases.

Differential Revision: https://reviews.llvm.org/D47691

llvm-svn: 334904
2018-06-17 09:55:20 +00:00
Heejin Ahn 9786946731 [WebAssembly] Simple comment fix. NFC.
llvm-svn: 334899
2018-06-17 00:37:56 +00:00
Craig Topper 29f22d7baa [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

llvm-svn: 334898
2018-06-16 23:25:50 +00:00
Craig Topper c435632862 [X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser.
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.

llvm-svn: 334897
2018-06-16 23:25:48 +00:00
Craig Topper 74412c7d59 [X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions.
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.

VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.

llvm-svn: 334896
2018-06-16 23:25:47 +00:00
Michael Zolotukhin 158a7c3323 CorrelatedValuePropagation: Preserve DT.
Summary:
We only modify CFG in a couple of places, and we can preserve DT there
with a little effort.

Reviewers: davide, vsk

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48059

llvm-svn: 334895
2018-06-16 18:57:31 +00:00
Benjamin Kramer 1193bbf6b7 Fix namespaces. No functionality change.
llvm-svn: 334890
2018-06-16 13:37:52 +00:00
Stanislav Mekhanoshin 3b11794dbf [AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | cc
This is the common case in the BE when we serialize condition and then
rematerialize it. Use either original or inverted condition.

Differential Revision: https://reviews.llvm.org/D48246

llvm-svn: 334882
2018-06-16 03:46:59 +00:00