Commit Graph

49568 Commits

Author SHA1 Message Date
Vlad Tsyrklevich 0b40f21134 Revert "[DAGCombine] Move AND nodes to multiple load leaves"
This reverts commit r319773. It was causing some buildbots to hang, e.g.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/5589

llvm-svn: 319867
2017-12-06 01:16:08 +00:00
Derek Schuff 3d5b86205c [WebAssembly] Fix test breakage from r319810
llvm-svn: 319865
2017-12-06 01:02:44 +00:00
Zachary Turner 330f48f2dc Regex out the local hash comparison test.
Since the local hash is a different number of bytes depending
on host architecture, we don't have a consistent value.  I
will need to re-do this test for both x86 and x64.  For now
it accepts any value for the local hash.

llvm-svn: 319864
2017-12-06 00:58:12 +00:00
Zachary Turner 376d437776 Teach llvm-pdbutil to dump types from object files.
llvm-svn: 319859
2017-12-05 23:58:18 +00:00
Xinliang David Li 45c819063a Revert r319794: [PGO] detect infinite loop and form MST properly: memory leak problem
llvm-svn: 319841
2017-12-05 21:54:01 +00:00
Anna Thomas 7df1a92543 [SafepointIRVerifier] Allow deriving pointers from unrelocated base
Summary:
This patch allows to use derived pointers (GEPs/bitcasts) of unrelocated
base pointers. We care only about the uses of these derived pointers.

It is acheived by two changes:
1. When we have enough information to say if the pointer is unrelocated at some
point or not, we walk all BBs to remove from their Contributions all valid defs
of unrelocated pointers (GEP with unrelocated base or bitcast of unrelocated
pointer).
2. When it comes to verification we just ignore instructions that were removed
at stage 1.

Patch by Daniil Suchkov!

Reviewers: anna, reames, apilipenko, mkazantsev

Reviewed By: anna, mkazantsev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40289

llvm-svn: 319838
2017-12-05 21:39:37 +00:00
Simon Pilgrim d495301414 [X86][AVX512] Tag BLENDM instruction scheduler classes
llvm-svn: 319833
2017-12-05 21:05:25 +00:00
Simon Pilgrim b69dae42e3 [X86][AVX512] Tag GATHER/SCATTER instruction scheduler classes
NOTE: At the moment these use the WriteLoad/WriteStore classes, which severely underestimates the costs. This needs to be reviewed.
llvm-svn: 319829
2017-12-05 20:47:11 +00:00
Paul Robinson 795ab0d94d [DWARFv5] Emit v5 line table header.
Differential Revision: https://reviews.llvm.org/D40741

llvm-svn: 319827
2017-12-05 20:35:00 +00:00
Matt Arsenault 8ae38bc0bd AMDGPU: Fix SDWA crash on inline asm
This was only searching for explicit defs,
and asserting for any implicit or variadic
instruction defs, like inline asm.

llvm-svn: 319826
2017-12-05 20:32:01 +00:00
Hans Wennborg 5df9f0878b Re-commit r319490 "XOR the frame pointer with the stack cookie when protecting the stack"
The patch originally broke Chromium (crbug.com/791714) due to its failing to
specify that the new pseudo instructions clobber EFLAGS. This commit fixes
that.

> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622

llvm-svn: 319824
2017-12-05 20:22:20 +00:00
Ulrich Weigand 5bfed6cb7c [SystemZ] Validate shifted compare value in adjustForTestUnderMask
When folding a shift into a test-under-mask comparison, make sure that
there is no loss of precision when creating the shifted comparison
value.  This usually never happens, except for certain always-true
comparisons in unoptimized code.

Fixes PR35529.

llvm-svn: 319818
2017-12-05 19:42:07 +00:00
Simon Pilgrim 833c260a4b [X86][AVX512] Tag VPTRUNC/VPMOVSX/VPMOVZX instruction scheduler classes
llvm-svn: 319815
2017-12-05 19:21:28 +00:00
Rafael Espindola 74de7e0791 Simplify test.
It can use attrib instead of icacls.

llvm-svn: 319809
2017-12-05 18:26:23 +00:00
Matt Arsenault 7f0a527300 AMDGPU: Fix infinite loop with dbg_value
Surprisingly SIOptimizeExecMaskingPreRA can infinite loop
in some case with DBG_VALUE. Most tests using dbg_value are
run at -O0, so don't run this pass. This seems to only
happen when the value argument is undef.

llvm-svn: 319808
2017-12-05 18:23:17 +00:00
Joel Galenson ea0bafda8a [CVP] Remove some {s|u}sub.with.overflow checks.
This uses ConstantRange::makeGuaranteedNoWrapRegion's newly-added handling for subtraction to allow CVP to remove some subtraction overflow checks.

Differential Revision: https://reviews.llvm.org/D40039

llvm-svn: 319807
2017-12-05 18:14:24 +00:00
Simon Pilgrim 65f805fe30 [X86][X87] Tag FCMOV instruction scheduler classes
llvm-svn: 319804
2017-12-05 18:01:26 +00:00
Dan Gohman c2c997718d [WebAssembly] Implement WASM_STACK_POINTER.
Use the .stack_pointer directive to implement WASM_STACK_POINTER for
specifying a global variable to be the stack pointer.

llvm-svn: 319797
2017-12-05 17:23:43 +00:00
Dan Gohman f7172f4ab0 [WebAssembly] Don't emit .import_global for the wasm target.
.import_global is used by the ELF-based target and not needed by the wasm
target.

llvm-svn: 319796
2017-12-05 17:21:57 +00:00
Xinliang David Li cc35bc9efc [PGO] detect infinite loop and form MST properly
Differential Revision: http://reviews.llvm.org/D40702

llvm-svn: 319794
2017-12-05 17:19:41 +00:00
Rafael Espindola 20569e96e9 Delete temp file if rename fails.
Without this when lld failed to replace the output file it would leave
the temporary behind. The problem is that the existing logic is

- cancel the delete flag
- rename

We have to cancel first to avoid renaming and then crashing and
deleting the old version. What is missing then is deleting the
temporary file if the rename fails.

This can be an issue on both unix and windows, but I am not sure how
to cause the rename to fail reliably on unix. I think it can be done
on ZFS since it has an ACL system similar to what windows uses, but
adding support for checking that in llvm-lit is probably not worth it.

llvm-svn: 319786
2017-12-05 16:40:56 +00:00
Alexey Bataev d4683e6ef1 [InstCombine] Additional test for PR35354, NFC.
llvm-svn: 319783
2017-12-05 16:15:55 +00:00
Jina Nahias 51c1a627c2 [x86][AVX512] Lowering kunpack intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D39719), implements the lowering of X86 kunpack intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D39720

Change-Id: I4088d9428478f9457f6afddc90bd3d66b3daf0a1
llvm-svn: 319778
2017-12-05 15:42:56 +00:00
Bjorn Pettersson 5abbad7999 Add REQUIRES asserts in combine_loads_from_build_pair.ll
A fixup of r319771, that was causing buildbot failures.

llvm-svn: 319775
2017-12-05 15:26:01 +00:00
Sam Parker 0a436a9d62 [DAGCombine] Move AND nodes to multiple load leaves
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.

Differential Revision: https://reviews.llvm.org/D39604

llvm-svn: 319773
2017-12-05 15:13:47 +00:00
Bjorn Pettersson 823b299fbc [DAGCombine] Handle big endian correctly in CombineConsecutiveLoads
Summary:
Found out, at code inspection, that there was a fault in
DAGCombiner::CombineConsecutiveLoads for big-endian targets.

A BUILD_PAIR is always having the least significant bits of
the composite value in element 0. So when we are doing the checks
for consecutive loads, for big endian targets, we should check
if the load to elt 1 is at the lower address and the load
to elt 0 is at the higher address.

Normally this bug only resulted in missed oppurtunities for
doing the load combine. I guess that in some rare situation it
could lead to faulty combines, but I've not seen that happen.

Note that this patch actually will trigger load combine for
some big endian regression tests.
One example is test/CodeGen/PowerPC/anon_aggr.ll where we now get
  t76: i64,ch = load<LD8[FixedStack-9]
instead of
  t37: i32,ch = load<LD4[FixedStack-10]>
  t35: i32,ch = load<LD4[FixedStack-9]>
  t41: i64 = build_pair t37, t35
before legalization. Then the legalization will split the LD8
into two loads, so the end result is the same. That should
verify that the transfomation is correct now.

Reviewers: niravd, hfinkel

Reviewed By: niravd

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D40444

llvm-svn: 319771
2017-12-05 14:50:05 +00:00
Simon Pilgrim 71660c61e6 [X86][AVX512] Add missing scalar CMPSS/CMPSD logic scheduler classes
llvm-svn: 319770
2017-12-05 14:34:42 +00:00
Mikael Holmen 0a3e98062f Bail out of a SimplifyCFG switch table opt at undef values.
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef.

The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.

Patch by JesperAntonsson.

Reviewers: spatel, eeckstein, hans

Reviewed By: hans

Subscribers: uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D40639

llvm-svn: 319768
2017-12-05 14:14:00 +00:00
Simon Pilgrim b9b46394e3 [X86][AVX512] Cleanup bit logic scheduler classes
llvm-svn: 319767
2017-12-05 14:04:23 +00:00
Simon Pilgrim fd3a2632e5 [X86][AVX512] Tag scalar CVT and CMP instruction scheduler classes
llvm-svn: 319765
2017-12-05 13:49:44 +00:00
Igor Laevsky cec8f47e77 [InstCombine] Don't crash on out of bounds shifts
Differential Revision: https://reviews.llvm.org/D40649

llvm-svn: 319761
2017-12-05 12:18:15 +00:00
Simon Pilgrim aa91155960 [X86][AVX512] Tag VPCMP/VPCMPU instruction scheduler classes
Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.

llvm-svn: 319760
2017-12-05 12:14:36 +00:00
Simon Pilgrim a2b5862641 [X86][AVX512] Cleanup VPCMP scheduler classes
Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.

llvm-svn: 319758
2017-12-05 12:02:22 +00:00
Jonas Paulsson b5b91cd402 [SystemZ] set 'guessInstructionProperties = 0' and set flags as needed.
This has proven a healthy exercise, as many cases of incorrect instruction
flags were corrected in the process. As part of this, IntrWriteMem was added
to several SystemZ instrinsics.

Furthermore, a bug was exposed in TwoAddress with this change (as incorrect
hasSideEffects flags were removed and instructions could now be sunk), and
the test case for that bugfix (r319646) is included here as
test/CodeGen/SystemZ/twoaddr-sink.ll.

One temporary test regression (one extra copy) which will hopefully go away
in upcoming patches for similar cases:
test/CodeGen/SystemZ/vec-trunc-to-i1.ll

Review: Ulrich Weigand.
https://reviews.llvm.org/D40437

llvm-svn: 319756
2017-12-05 11:24:39 +00:00
Jonas Paulsson 86c40db49d [Regalloc] Generate and store multiple regalloc hints.
MachineRegisterInfo used to allow just one regalloc hint per virtual
register. This patch extends this to a vector of regalloc hints, which is
filled in by common code with sorted copy hints. Such hints will make for
more ID copies that can be removed.

NB! This improvement is currently (and hopefully temporarily) *disabled* by
default, except for SystemZ. The only reason for this is the big impact this
has on tests, which has unfortunately proven unmanageable. It was a long
while since all the tests were updated and just waiting for review (which
didn't happen), but now targets have to enable this themselves
instead. Several targets could get a head-start by downloading the tests
updates from the Phabricator review. Thanks to those who helped, and sorry
you now have to do this step yourselves.

This should be an improvement generally for any target!

The target may still create its own hint, in which case this has highest
priority and is stored first in the vector. If it has target-type, it will
not be recomputed, as per the previous behaviour.

The temporary hook enableMultipleCopyHints() will be removed as soon as all
targets return true.

Review: Quentin Colombet, Ulrich Weigand.
https://reviews.llvm.org/D38128

llvm-svn: 319754
2017-12-05 10:52:24 +00:00
Guy Blank f3cefdd350 [X86] Fix a bug in handling GRXX subclasses in Domain Reassignment pass
When trying to determine the correct Mask register class corresponding
to a GPR register class, not all register classes were handled.
This caused an assertion to be raised on some scenarios.

Differential Revision:
https://reviews.llvm.org/D40290

llvm-svn: 319745
2017-12-05 09:08:24 +00:00
Craig Topper a404ce955a [X86] Use vector widening to support sign extend from i1 when the dest type is not 512-bits and vlx is not enabled.
Previously we used a wider element type and truncated. But its more efficient to keep the element type and drop unused elements.

If BWI isn't supported and we have a i16 or i8 type, we'll extend it to be i32 and still use a truncate.

llvm-svn: 319740
2017-12-05 06:37:21 +00:00
Daniel Sanders 3c1c4c0ee0 Revert r319691: [globalisel][tablegen] Split atomic load/store into separate opcode and enable for AArch64.
Some concerns were raised with the direction. Revert while we discuss it and look into an alternative

llvm-svn: 319739
2017-12-05 05:52:07 +00:00
Craig Topper e1ba2450c2 [X86] Fix a crash if avx512bw and xop are both enabled when the IR contrains a v32i8 bitreverse.
llvm-svn: 319737
2017-12-05 04:47:12 +00:00
Matt Arsenault 9a60c3ea36 AMDGPU: Fix crash when scheduling DBG_VALUE
This calls handleMove with a DBG_VALUE instruction,
which isn't tracked by LiveIntervals. I'm not sure
this is the correct place to fix this. The generic
scheduler seems to have more deliberate region
selection that skips dbg_value.

The test is also really hard to reduce. I haven't been able
to figure out what exactly causes this particular case to
try moving the dbg_value.

llvm-svn: 319732
2017-12-05 03:09:23 +00:00
Craig Topper 276c770e57 [X86] Use vector widening to support zero extend from i1 when the dest type is not 512-bits and vlx is not enabled.
Previously we used a wider element type and truncated. But its more efficient to keep the element type and drop unused elements.

If BWI isn't supported and we have a i16 or i8 type, we'll extend it to be i32 and still use a truncate.

llvm-svn: 319728
2017-12-05 01:45:46 +00:00
Craig Topper 913b42b0e1 [X86] Don't use kunpck for vXi1 concat_vectors if the upper bits are undef.
This can be efficiently selected by a COPY_TO_REGCLASS without the need for an extra instruction.

llvm-svn: 319726
2017-12-05 01:28:06 +00:00
Craig Topper 6302012442 [X86] Use getZeroVector and remove an unnecessary creation of an APInt before calling getConstant. NFCI
The getConstant function can take care of creating the APInt internally.

getZeroVector will take care of using the correct type for the build vector to avoid re-lowering.

The test change here is because execution domain constraints apparently pass through undef inputs of a zeroing xor. So the different ordering of register allocation here caused the dependency to change.

llvm-svn: 319725
2017-12-05 01:28:04 +00:00
Jan Vesely 39aeab4f30 AMDGPU/EG: Add a new FeatureFMA and use it to selectively enable FMA instruction
Only used by pre-GCN targets
v2: fix predicate setting for FMA_Common

Differential Revision: https://reviews.llvm.org/D40692

llvm-svn: 319712
2017-12-04 23:07:28 +00:00
Jan Vesely d1c9b61e2b AMDGPU: Disable fp64 support on pre GCN asics
It's not implemented.
Passing +fp64-fp16-denormal feature enables fp64 even on asics that don't support it

v2: fix hasFP64 query

Differential Revision: https://reviews.llvm.org/D39931

llvm-svn: 319709
2017-12-04 22:57:29 +00:00
Hans Wennborg 361d4392cf Revert r319490 "XOR the frame pointer with the stack cookie when protecting the stack"
This broke the Chromium build (crbug.com/791714). Reverting while investigating.

> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622
>
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319490 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-svn: 319706
2017-12-04 22:21:15 +00:00
Matt Arsenault 68f0505263 AMDGPU: Fix creating invalid copy when adjusting dmask
Move the entire optimization to one place. Before it was possible
to adjust dmask without changing the register class of the output
instruction, since they were done in separate places. Fix all
lane sizes and move all of the optimization into the DAG folding.

llvm-svn: 319705
2017-12-04 22:18:27 +00:00
Paul Robinson 68ba772cc0 Re-submit r289925 (Update .debug_line section version to match DWARF version)
Set the .debug_line version to match the requested DWARF version,
except with a maximum of v4 because we don't support v5 yet.

Previously Chromium had issues with this patch; see PR31407.  Chromium
tool issues have been addressed, so hopefully this will go through
this time.

Patch by Katya Romanova!

Differential Revision: https://reviews.llvm.org/D38002

llvm-svn: 319699
2017-12-04 21:27:46 +00:00
Daniel Sanders c83977fc21 [globalisel][tablegen] Tests for r319691
I forgot to 'svn add' the test files.

llvm-svn: 319698
2017-12-04 21:14:34 +00:00
Hans Wennborg 7e61f24962 DAG: Match truncated rotation (PR35487)
If the truncation has been pushed past the or-node, look through it and
truncate afterwards.

Differential revision: https://reviews.llvm.org/D40792

llvm-svn: 319692
2017-12-04 20:39:57 +00:00
Daniel Sanders 04e4f47e93 [globalisel][tablegen] Split atomic load/store into separate opcode and enable for AArch64.
This patch splits atomics out of the generic G_LOAD/G_STORE and into their own
G_ATOMIC_LOAD/G_ATOMIC_STORE. This is a pragmatic decision rather than a
necessary one. Atomic load/store has little in implementation in common with
non-atomic load/store. They tend to be handled very differently throughout the
backend. It also has the nice side-effect of slightly improving the common-case
performance at ISel since there's no longer a need for an atomicity check in the
matcher table.

All targets have been updated to remove the atomic load/store check from the
G_LOAD/G_STORE path. AArch64 has also been updated to mark
G_ATOMIC_LOAD/G_ATOMIC_STORE legal.

There is one issue with this patch though which also affects the extending loads
and truncating stores. The rules only match when an appropriate G_ANYEXT is
present in the MIR. For example,
  (G_ATOMIC_STORE (G_TRUNC:s16 (G_ANYEXT:s32 (G_ATOMIC_LOAD:s16 X))))
will match but:
  (G_ATOMIC_STORE (G_ATOMIC_LOAD:s16 X))
will not. This shouldn't be a problem at the moment, but as we get better at
eliminating extends/truncates we'll likely start failing to match in some
cases. The current plan is to fix this in a patch that changes the
representation of extending-load/truncating-store to allow the MMO to describe
a different type to the operation.

llvm-svn: 319691
2017-12-04 20:39:32 +00:00
Matthias Braun de4c0ae205 Add missing triple args to tests
llvm-svn: 319686
2017-12-04 20:08:28 +00:00
Haicheng Wu 234eabaf07 [ConstantFold] Support vector index when factoring out GEP index into preceding dimensions
Follow-up of r316824. This patch supports the vector type for both current and
previous index when factoring out the current one into the previous one.

Differential Revision: https://reviews.llvm.org/D39556

llvm-svn: 319683
2017-12-04 19:56:33 +00:00
Sanjoy Das aa92cae14e [BypassSlowDivision] Improve our handling of divisions by constants
(This reapplies r314253.  r314253 was reverted on r314482 because of a
correctness regression on P100, but that regression was identified to be
something else.)

Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow .  This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.

Reviewers: jlebar

Subscribers: jholewinski, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38265

llvm-svn: 319677
2017-12-04 19:21:58 +00:00
Matthias Braun 7eae251bae MachineVerifier: undef phi arg doesn't need to be live-out from predecessor
Differential Revision: https://reviews.llvm.org/D40756

llvm-svn: 319674
2017-12-04 18:57:48 +00:00
Francis Visoiu Mistrih 25528d6de7 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Pablo Barrio 2b4385846c Fix function pointer tail calls in armv8-M.base
Summary:
The compiler fails with the following error message:

fatal error: error in backend: ran out of registers during
register allocation

Tail call optimization for Armv8-M.base fails to meet all the required
constraints when handling calls to function pointers where the
arguments take up r0-r3. This is because the pointer to the
function to be called can only be stored in r0-r3, but these are
all occupied by arguments. This patch makes sure that tail call
optimization does not try to handle this type of calls.

Reviewers: chill, MatzeB, olista01, rengolin, efriedma

Reviewed By: olista01, efriedma

Subscribers: efriedma, aemerson, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D40706

llvm-svn: 319664
2017-12-04 16:55:49 +00:00
Sam Kolton 5f7f32c382 [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole.
Summary:

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D37817

llvm-svn: 319662
2017-12-04 16:22:32 +00:00
Sam Parker 987b2c9966 [ARM] CodeGen test
Add another and + load DAG combine test.

llvm-svn: 319660
2017-12-04 15:14:59 +00:00
Anna Thomas 7b360434ff [Loop Predication] Teach LP about reverse loops
Summary:
Currently, we only support predication for forward loops with step
of 1.  This patch enables loop predication for reverse or
countdownLoops, which satisfy the following conditions:
   1. The step of the IV is -1.
   2. The loop has a singe latch as B(X) = X <pred>
latchLimit with pred as s> or u>
   3. The IV of the guard is the decrement
IV of the latch condition (Guard is: G(X) = X-1 u< guardLimit).

This patch was downstream for a while and is the last series of patches
that's from our LP implementation downstream.

Reviewers: apilipenko, mkazantsev, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40353

llvm-svn: 319659
2017-12-04 15:11:48 +00:00
Jonas Hahnfeld 5db24d7c22 [NVPTX] Assign valid global names
PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The
existing pass already ensured this for globals and this patch adds
the cleanup for functions with local linkage.

However, there was a different problem in the case of collisions
of the adjusted name: The ValueSymbolTable then automatically
appended ".N" with increasing Ns to get a unique name while helping
the ABI demangling. Special case this behavior to omit the dots and
append N directly. This will always give us legal names according
to the PTX requirements.

Differential Revision: https://reviews.llvm.org/D40573

llvm-svn: 319657
2017-12-04 14:19:33 +00:00
Oliver Stannard 7ab60605f8 Revert r319649 - [Asm, ARM] Add fallback diag for multiple invalid operands
This is causing a failure in the llvm-clang-x86_64-expensive-checks-win
buildbot, and I can't reproduce it locally, so reverting until I can work out
what is wrong.

llvm-svn: 319654
2017-12-04 13:42:22 +00:00
Oliver Stannard 7cd4db94f8 [Asm, ARM] Add fallback diag for multiple invalid operands
This adds a "invalid operands for instruction" diagnostic for
instructions where there is an instruction encoding with the correct
mnemonic and which is available for this target, but where multiple
operands do not match those which were provided. This makes it clear
that there is some combination of operands that is valid for the current
target, which the default diagnostic of "invalid instruction" does not.

Since this is a very general error, we only emit it if we don't have a
more specific error.

Differential revision: https://reviews.llvm.org/D36747

llvm-svn: 319649
2017-12-04 12:02:32 +00:00
Martin Storsjo eca862de07 [AArch64] Allow using emulated tls on platforms other than ELF
This matches how it is done on X86.

This allows using emulated tls on windows; in MinGW environments,
native tls isn't supported at the moment.

Set the right Data*bitsDirective for windows to match the existing
tests for other platforms. Make parts of the existing tests a regex,
to allow matching .section .rdata for windows, to avoid having to
duplicate the rest of the tests for windows.

Differential Revision: https://reviews.llvm.org/D40770

llvm-svn: 319644
2017-12-04 09:09:04 +00:00
Martin Storsjo c85cc41801 [ARM] Allow using emulated tls on platforms other than ELF
This matches how it is done on X86.

This allows using emulated tls on windows; in MinGW environments,
native tls isn't supported at the moment.

Differential Revision: https://reviews.llvm.org/D40769

llvm-svn: 319643
2017-12-04 09:08:55 +00:00
Craig Topper 4520d4f8ad [X86] Allow VPMAXUQ/VPMAXSQ/VPMINUQ/VPMINSQ to be used with 128/256 bit vectors when AVX512 is enabled.
These instructions can be used by widening to 512-bits and extracting back to 128/256. We do similar to several other instructions already.

llvm-svn: 319641
2017-12-04 07:21:01 +00:00
Craig Topper 67217d7eb4 [SelectionDAG] Teach computeKnownBits some improvements to ISD::SRL with a non-splat constant shift amount.
If we have a non-splat constant shift amount, the minimum shift amount can be used to infer the number of zero upper bits of the result. There's probably a lot more that we can do here, but this
fixes a case where I wanted to infer the sign bit as zero when all the shift amounts are non-zero.

llvm-svn: 319639
2017-12-04 05:38:42 +00:00
Simon Pilgrim 465a88bb92 [X86][AVX512] Tag packed F2I/I2F/F2F conversion instructions scheduler class
llvm-svn: 319636
2017-12-03 21:16:12 +00:00
Simon Pilgrim 9291053463 [X86][AVX512] Regenerate schedule tests.
llvm-svn: 319635
2017-12-03 21:07:36 +00:00
Yaxun Liu 30e4608cca CodeGen: Fix SelectionDAGISel::LowerArguments for sret addr space
SelectionDAGISel::LowerArguments assumes sret addr space is 0, which is
not true for amdgcn---amdgiz target.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D40255

llvm-svn: 319630
2017-12-03 03:31:45 +00:00
Sam Clegg a2b35dac03 Reland "[WebAssembly] Add visibility flag to Wasm symbol flags""
Original change was rL319488.

This was reverted rL319602 due to a gcc 7.1 warning.

Differential Revision: https://reviews.llvm.org/D40772

llvm-svn: 319626
2017-12-03 01:19:23 +00:00
Yaxun Liu 494770403a CodeGen: Fix pointer info in SplitVecOp_EXTRACT_VECTOR_ELT/SplitVecRes_INSERT_VECTOR_ELT
Two issues found when doing codegen for splitting vector with non-zero alloca addr space:

DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT/SplitVecOp_EXTRACT_VECTOR_ELT uses dummy pointer info for creating
SDStore. Since one pointer operand contains multiply and add, InferPointerInfo is unable to
infer the correct pointer info, which ends up with a dummy pointer info for the target to lower
store and results in isel failure. The fix is to introduce MachinePointerInfo::getUnknownStack to
represent MachinePointerInfo which is known in alloca address space but without other information.

TargetLowering::getVectorElementPointer uses value type of pointer in addr space 0 for
multiplication of index and then add it to the pointer. However the pointer may be in an addr
space which has different size than addr space 0. The fix is to use the pointer value type for
index multiplication.

Differential Revision: https://reviews.llvm.org/D39758

llvm-svn: 319622
2017-12-02 22:13:22 +00:00
Simon Atanasyan d4b693bfb8 [llvm-readobj] Print static MIPS GOT
If a linked binary file contains a dynamic section, the GOT layout
defined by the dynamic section entries. In a statically linked file
the GOT is just a series of entries. This change teaches `llvm-readobj`
to print the GOT in that case. That provides a feature parity with GNU
`readelf`.

llvm-svn: 319616
2017-12-02 13:06:35 +00:00
Craig Topper 64469a18f3 [X86] Fix copy paste mistake in test case for r319612.
llvm-svn: 319613
2017-12-02 08:39:02 +00:00
Craig Topper 7d9a3b82c6 [X86] Teach the assembler to support %db8-%db15 as aliases for %dr8-%dr15.
llvm-svn: 319612
2017-12-02 08:27:46 +00:00
Craig Topper 3e846ecb5b [X86] Support %dr8-%dr15 in the assembler.
Apparently I failed to make this work when I fixed it in the disassembler way back in r224862.

llvm-svn: 319611
2017-12-02 08:27:45 +00:00
Tatyana Krasnukha f665f6a279 [ARC] Add instruction subset for the ARC backend.
Reviewers: petecoup, kparzysz

Reviewed By: petecoup

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37983

llvm-svn: 319609
2017-12-02 05:25:17 +00:00
Nirav Dave 839ff79a8d [DAG][AArch64] Disable post-legalization store
Disable post-legalization store for AArch64 backend which is causing
errors out-of-tree.

llvm-svn: 319607
2017-12-02 04:01:26 +00:00
Heejin Ahn e74a864cec [WebAssembly] Revert r319488 "Add visibility flag to Wasm symbol flags"
This patch reportedly broke one of LLVM bots (ubuntu-gcc7.1-werror).

See http://lab.llvm.org:8011/builders/ubuntu-gcc7.1-werror/builds/3369 for
details.

llvm-svn: 319602
2017-12-02 02:05:06 +00:00
Matt Morehouse 9e658c974b Revert "[X86] Improvement in CodeGen instruction selection for LEAs."
This reverts r319543, due to ASan bot breakage.

llvm-svn: 319591
2017-12-01 22:20:26 +00:00
Nirav Dave 3e76e1e89e [DAG][ARM] Revert "Reenable post-legalize store merge"
due to failures in AArch and ARM code gen.

llvm-svn: 319587
2017-12-01 21:55:47 +00:00
Jake Ehrlich 3da7982cca [MC] Handle unknown literal register numbers in .cfi_* directives
r230670 introduced a step to map EH register numbers to standard
DWARF register numbers. This failed to consider the case when a
user .cfi_* directive uses an integer literal rather than a
register name, to specify a DWARF register number that has no
corresponding LLVM register number (e.g. a special register that
the compiler and assembler have no name for).

Fixes PR34028.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D36493

llvm-svn: 319586
2017-12-01 21:44:27 +00:00
Philip Reames 6260cf71d3 [IndVars] Fix a bug introduced in r317012
Turns out we can have comparisons which are indirect users of the induction variable that we can make invariant.  In this case, there is no loop invariant value contributing and we'd fail an assert.

The test case was found by a java fuzzer and reduced.  It's a real cornercase.  You have to have a static loop which we've already proven only executes once, but haven't broken the backedge on, and an inner phi whose result can be constant folded by SCEV using exit count reasoning but not proven by isKnownPredicate.  To my knowledge, only the fuzzer has hit this case.

llvm-svn: 319583
2017-12-01 20:57:19 +00:00
Adam Nemet 9303f62255 [opt-remarks] If hotness threshold is set, ignore remarks without hotness
These are blocks that haven't not been executed during training.  For large
projects this could make a significant difference.  For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.

Differential Revision: https://reviews.llvm.org/D40678

Re-commit after fixing the failing testcase in rL319576, rL319577 and
rL319578.

llvm-svn: 319581
2017-12-01 20:41:38 +00:00
Fedor Sergeev 3b459c3847 IR printing improvement for loop passes - handle -print-module-scope
Summary:
Adding support for -print-module-scope similar to how it is
being done for function passes. This option causes loop-pass printer
to emit a whole-module IR instead of just a loop itself.

Reviewers: sanjoy, silvas, weimingz

Reviewed By: sanjoy

Subscribers: apilipenko, skatkov, llvm-commits

Differential Revision: https://reviews.llvm.org/D40247

llvm-svn: 319566
2017-12-01 18:33:58 +00:00
Paul Robinson ab69b477a9 [DebugInfo] Bail out if making no progress dumping line tables.
llvm-svn: 319564
2017-12-01 18:25:30 +00:00
Adam Nemet 57783730fd Revert "[opt-remarks] If hotness threshold is set, ignore remarks without hotness"
This reverts commit r319556.

Something is not working with this when used with sample-based profiling.
Investigating...

llvm-svn: 319562
2017-12-01 18:12:29 +00:00
Fedor Sergeev 94dca7c7ea IR printing improvement for function passes - introducing -print-module-scope
Summary:
When debugging function passes it happens to be rather useful to dump
the whole module before the transformation and then use this dump
to analyze this single transformation by running it separately
on that particular module state.

Introducing
    -print-module-scope
debugging option that forces all the function-level IR dumps
to become whole-module dumps.

This option builds on top of normal dumping controls like
   -print-before/after
   -filter-print-funcs

The plan is to eventually extend this option to cover other local passes
(at least loop passes) but that should go as a separate change.

Reviewers: sanjoy, weimingz, silvas, fedor.sergeev

Reviewed By: weimingz

Subscribers: apilipenko, skatkov, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D40245

llvm-svn: 319561
2017-12-01 17:42:46 +00:00
Adam Nemet 8d1fc2b65b [opt-remarks] If hotness threshold is set, ignore remarks without hotness
These are blocks that haven't not been executed during training.  For large
projects this could make a significant difference.  For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.

Differential Revision: https://reviews.llvm.org/D40678

llvm-svn: 319556
2017-12-01 17:02:04 +00:00
Hans Wennborg e2470b95da Revert r319531 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops."
It causes builds to fail with "Instruction does not dominate all uses" (PR35497).

> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
>   *dst++ = *src++;
>   *dst++ = *src++ + 1;
>   *dst++ = *src++ + 2;
>   *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 319550
2017-12-01 16:17:24 +00:00
Sam Parker 45b5950f38 [ARM] and + load combine tests
Add a few more tests cases.

llvm-svn: 319548
2017-12-01 15:31:41 +00:00
Nirav Dave eb2b24fded [ARM][DAG] Reenable post-legalize store merge
Summary: Reenable post-legalize stores with constant merging computation and cofrresponding test case.

Reviewers: eastig, efriedma

Subscribers: aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40701

llvm-svn: 319547
2017-12-01 14:49:26 +00:00
Jatin Bhateja 328199ec26 [X86] Improvement in CodeGen instruction selection for LEAs.
Summary:
1/  Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to
     accommodate similar operand appearing in the DAG  e.g.
                 T1 = A + B
                 T2 = T1 + 10
                 T3 = T2 + A
    For above DAG rooted at T3, X86AddressMode will now look like
                Base = B , Index = A , Scale = 2 , Disp = 10

2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity
     then complex LEAs (having 3 operands) could be factored out  e.g.
                 leal 1(%rax,%rcx,1), %rdx
                 leal 1(%rax,%rcx,2), %rcx
     will be factored as following
                 leal 1(%rax,%rcx,1), %rdx
                 leal (%rdx,%rcx)   , %edx

3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.

4/ Simplify LEA converts (lea (BASE,1,INDEX,0)  --> add (BASE, INDEX) which offers better through put.

PR32755 will be taken care of by this pathc.

Previous patch revisions : r313343 , r314886

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja

Reviewed By: lsaba, RKSimon, jbhateja

Subscribers: jmolloy, spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 319543
2017-12-01 14:07:38 +00:00
Sam Parker 412a991b10 [ARM] and + load combine tests
Adding autogenerated tests for narrow load combines.

Differential Revision: https://reviews.llvm.org/D40709

llvm-svn: 319542
2017-12-01 13:42:39 +00:00
Simon Pilgrim 2dc4ff1cde [X86][AVX512] Tag vshift/vpermv/pshufd/pshufb instructions scheduler classes
llvm-svn: 319540
2017-12-01 13:25:54 +00:00
Mikael Holmen 9c13c8b6ec Revert r319537: Bail out of a SimplifyCFG switch table opt at undef values.
Broke build bots so reverting.

llvm-svn: 319539
2017-12-01 13:11:39 +00:00
Florian Hahn 30932a3c16 [InstSimplify] More fcmp cases when comparing against negative constants.
Summary:
For known positive non-zero value X:
    fcmp uge X, -C => true
    fcmp ugt X, -C => true
    fcmp une X, -C => true
    fcmp oeq X, -C => false
    fcmp ole X, -C => false
    fcmp olt X, -C => false


Patch by Paul Walker.

Reviewers: majnemer, t.p.northover, spatel, RKSimon

Reviewed By: spatel

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D40012

llvm-svn: 319538
2017-12-01 12:34:16 +00:00
Mikael Holmen 9f047795fb Bail out of a SimplifyCFG switch table opt at undef values.
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef.

The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.

Patch by: JesperAntonsson

Reviewers: spatel, eeckstein, hans

Reviewed By: hans

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40639

llvm-svn: 319537
2017-12-01 12:30:49 +00:00
Nemanja Ivanovic 4364513cb2 Follow-up to r319434 to turn the pass on by default
Now that the patch has gone through the buildbot cycle,
turn it on by default.

llvm-svn: 319535
2017-12-01 12:02:59 +00:00
Alexander Timofeev c1425c9d6b [AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI
Differential revision: https://reviews.llvm.org/D40556

llvm-svn: 319534
2017-12-01 11:56:34 +00:00
Dinar Temirbulatov 29e86584c6 [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Patch tries to improve vectorization of the following code:
    
            void add1(int * __restrict dst, const int * __restrict src) {
              *dst++ = *src++;
              *dst++ = *src++ + 1;
              *dst++ = *src++ + 2;
              *dst++ = *src++ + 3;
            }
            Allows to vectorize even if the very first operation is not a binary add, but just a load.
    
            Fixed issues related to previous commit.
    
            Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
    
            Reviewed By: ABataev, RKSimon
    
            Subscribers: llvm-commits, RKSimon
    
            Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 319531
2017-12-01 11:10:47 +00:00
Volkan Keles a32ff00b00 GlobalISel: Enable the legalization of G_MERGE_VALUES and G_UNMERGE_VALUES
Summary: LegalizerInfo assumes all G_MERGE_VALUES and G_UNMERGE_VALUES instructions are legal, so it is not possible to legalize vector operations on illegal vector types. This patch fixes the problem by removing the related check and adding default actions for G_MERGE_VALUES and G_UNMERGE_VALUES.

Reviewers: qcolombet, ab, dsanders, aditya_nandakumar, t.p.northover, kristof.beyls

Reviewed By: dsanders

Subscribers: rovka, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D39823

llvm-svn: 319524
2017-12-01 08:19:10 +00:00
Hiroshi Inoue 48e4c7aae6 Recommit rL319407: [SROA] enable splitting for non-whole-alloca loads and stores
Recommiting once reverted patch rL319407 after adding a check for bit vector size to avoid failures in some build bots.

llvm-svn: 319522
2017-12-01 06:05:05 +00:00
Craig Topper f8470a6399 [X86] Custom legalize v2i32 gathers via widening rather than promoting.
The default legalization for v2i32 is promotion to v2i64. This results in a gather that reads 64-bit elements rather than 32. If one of the elements is near a page boundary this can cause an illegal access that can fault.

We also miscalculate the scale for the gather which is an even worse problem, but we probably could have found a separate way to fix that.

llvm-svn: 319521
2017-12-01 06:02:02 +00:00
Craig Topper c261213abc [X86][SelectionDAG] Make sure we explicitly sign extend the index when type promoting the index of scatter and gather.
Type promotion makes no guarantee about the contents of the promoted bits. Since the gather/scatter instruction will use the bits to calculate addresses, we need to ensure they aren't garbage.

llvm-svn: 319520
2017-12-01 06:02:00 +00:00
Craig Topper 81201cee4a [X86] Add another v2i32 gather test case with v2i64 index that wasn't sign extended.
llvm-svn: 319519
2017-12-01 06:01:59 +00:00
Craig Topper 11f733df9b [X86] Add a DAG combine to simplify masks for AVX2 gather instructions.
AVX2 gathers only use the upper bit of the mask allowing us to simplify sign_extend_inreg to a shift left.

llvm-svn: 319514
2017-12-01 02:49:07 +00:00
Sam Clegg 223e2d4f04 [WebAssembly] Update MC tests now that hidden attr is supported
Summary:
Support was added in rL319488 but these tests were not
updated.

Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish

Differential Revision: https://reviews.llvm.org/D40693

llvm-svn: 319510
2017-12-01 01:18:47 +00:00
Jake Ehrlich 1a468481c0 Add flag to ArchiveWriter to test GNU64 format more efficiently
Even with the sparse file optimizations the SYM64 test can still be painfully
slow. This unnecessarily slows down devs. It's critical that we test that the
switch to the SYM64 format occurs at 4GB but there isn't any better of a way to
fake the size of the file than sparse files. This change introduces a flag that
allows the cutoff to be arbitrarily set to whatever power of two is desired.
The flag is hidden as it really isn't meant to be used outside this one test.
This is unfortunate but appears necessary, at least until the average hard
drive is much faster.

The changes to the test require some explanation. Prior to this change we knew
that the SYM64 format was being used because the file was simply too large to
have validly handled this case if the SYM64 format were not used. To ensure
that the SYM64 format is still being used I am grepping the file for "SYM64".
Without changing the filename however this would be pointless because "SYM64"
would occur in the file either way. So the filename of the test is also changed
in order to avoid this issue.

Differential Revision: https://reviews.llvm.org/D40632

llvm-svn: 319507
2017-12-01 00:54:28 +00:00
Matt Arsenault 686d5c728f AMDGPU: Use carry-less adds in FI elimination
llvm-svn: 319501
2017-11-30 23:42:30 +00:00
Peter Collingbourne 1f03422610 ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module.
If the thin module has no references to an internal global in the
merged module, we need to make sure to preserve that property if the
global is a member of a comdat group, as otherwise promotion can end
up adding global symbols to the comdat, which is not allowed.

This situation can arise if the external global in the thin module
has dead constant users, which would cause use_empty() to return
false and would cause us to try to promote it. To prevent this from
happening, discard the dead constant users before asking whether a
global is empty.

Differential Revision: https://reviews.llvm.org/D40593

llvm-svn: 319494
2017-11-30 23:05:52 +00:00
Matt Arsenault 84445dd13c AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Reid Kleckner ba4014e9dc XOR the frame pointer with the stack cookie when protecting the stack
Summary: This strengthens the guard and matches MSVC.

Reviewers: hans, etienneb

Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits

Differential Revision: https://reviews.llvm.org/D40622

llvm-svn: 319490
2017-11-30 22:41:21 +00:00
Sam Clegg 9138b7b005 Add visibility flag to Wasm symbol flags
The LLVM "hidden" flag needs to be passed through the Wasm
intermediate objects in order for the linker to apply
it to the final Wasm object.

The corresponding change in LLD is here: https://github.com/WebAssembly/lld/pull/14

Patch by Nicholas Wilson

Differential Revision: https://reviews.llvm.org/D40442

llvm-svn: 319488
2017-11-30 22:34:58 +00:00
Dan Gohman 59e4c0b938 [memcpyopt] Teach memcpyopt to optimize across basic blocks
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.

Fixes PR28958.

Differential Revision: https://reviews.llvm.org/D38374

llvm-svn: 319482
2017-11-30 22:10:53 +00:00
Krzysztof Parzyszek 2dddd6004f [Hexagon] Fix wrong check in test/CodeGen/Hexagon/newvaluejump-solo.mir
llvm-svn: 319476
2017-11-30 21:23:19 +00:00
Krzysztof Parzyszek 8c67461859 [Hexagon] Fix wrong pass in testcase
llvm-svn: 319471
2017-11-30 20:39:15 +00:00
Krzysztof Parzyszek 44555225a6 [Hexagon] Solo instructions cannot be used with new value jumps
llvm-svn: 319470
2017-11-30 20:32:54 +00:00
Yaxun Liu 1db0f718b5 [AMDGPU] Convert test/tools/llvm-objdump/AMDGPU/source-lines.ll to amdgiz
Differential Revision: https://reviews.llvm.org/D40653

llvm-svn: 319469
2017-11-30 20:27:56 +00:00
Craig Topper d4257565cf [X86] Promote i8 CTPOP to i32 instead of i16 when we have the POPCNT instruction.
The 32-bit version is shorter to encode and the zext we emit for the promotion is likely going to be a 32-bit zero extend anyway.

llvm-svn: 319468
2017-11-30 20:15:31 +00:00
Jake Ehrlich ef3b80c57b [llvm-objcopy] Add support for --only-keep/-j and --keep
This change adds support for the --only-keep option and the -j alias as well.
A common use case for these being used together is to dump a specific section's
data. Additionally the --keep option is added (GNU objcopy doesn't have this)
to avoid removing a bunch of things. This allows people to err on the side of
stripping aggressively and then to keep the specific bits that they need for
their application.

Differential Revision: https://reviews.llvm.org/D39021

llvm-svn: 319467
2017-11-30 20:14:53 +00:00
Daniel Sanders aef1dfc690 [aarch64][globalisel] Legalize G_ATOMIC_CMPXCHG_WITH_SUCCESS and G_ATOMICRMW_*
G_ATOMICRMW_* is generally legal on AArch64. The exception is G_ATOMICRMW_NAND.

G_ATOMIC_CMPXCHG_WITH_SUCCESS needs to be lowered to G_ATOMIC_CMPXCHG with an
external comparison.

Note that IRTranslator doesn't generate these instructions yet.

llvm-svn: 319466
2017-11-30 20:11:42 +00:00
Amara Emerson d78d65c2a4 [GlobalISel][IRTranslator] Fix crash during translation of zero sized loads/stores/args/returns.
This fixes PR35358.

rdar://35619533

Differential Revision: https://reviews.llvm.org/D40604

llvm-svn: 319465
2017-11-30 20:06:02 +00:00
Xinliang David Li c23d2c6883 [PGO] Skip counter promotion for infinite loops
Differential Revision: http://reviews.llvm.org/D40662

llvm-svn: 319462
2017-11-30 19:16:25 +00:00
Daniel Sanders f499b2bf1f [globalisel][tablegen] Add support for specific immediates in the match pattern
This enables a few rules such as ARM's uxtb instruction.

llvm-svn: 319457
2017-11-30 18:48:35 +00:00
Dan Gohman 78c19d60a9 [WebAssembly] Revert r319186 "Support bitcasted function addresses with varargs."
The patch broke Emscripten's EM_ASM macros, which utiltize unprototyped
functions.

See https://bugs.llvm.org/show_bug.cgi?id=35385 for details.

llvm-svn: 319452
2017-11-30 18:16:49 +00:00
Francis Visoiu Mistrih c7832045d5 [MIR] Fix DebugInfo tests after r319445
llvm-svn: 319447
2017-11-30 16:48:53 +00:00
Francis Visoiu Mistrih c71cced0aa [CodeGen] Always use `printReg` to print registers in both MIR and debug
output

As part of the unification of the debug format and the MIR format,
always use `printReg` to print all kinds of registers.

Updated the tests using '_' instead of '%noreg' until we decide which
one we want to be the default one.

Differential Revision: https://reviews.llvm.org/D40421

llvm-svn: 319445
2017-11-30 16:12:24 +00:00
Alexey Bataev d60250dd9b [InstCombine] Additional test for PR35354, NFC.
llvm-svn: 319436
2017-11-30 14:33:58 +00:00
Nemanja Ivanovic db7e77047c [PowerPC] Recommit r314244 with refactoring and off by default
This re-commits everything that was pulled in r314244. The transformation
is off by default (patch to enable it to follow). The code is refactored
to have a single entry-point and provide fine-grained control over patterns
that it selects. This patch also fixes the bugs in the original code.

Everything that failed with the original patch has been re-tested with this
patch (with the transformation turned on). So the patch to turn this on is
soon to follow.

Differential Revision: https://reviews.llvm.org/D38575

llvm-svn: 319434
2017-11-30 13:39:10 +00:00
Simon Pilgrim bb791b3dbd [X86][AVX512] Tag fcmp/ptest/ternlog instructions scheduler classes
llvm-svn: 319433
2017-11-30 13:18:06 +00:00
Simon Pilgrim 1c7556fb29 [X86][AVX512] Regenerate avx512 schedule tests
llvm-svn: 319432
2017-11-30 13:09:21 +00:00
Sean Eveson a6bcd53d52 [MC] Function stack size section.
Re applying after fixing issues in the diff, sorry for any painful conflicts/merges!

Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html

This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128).

The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary.

There is a follow up change to add an option to clang.

Thanks.

Reviewers: hfinkel, MatzeB

Reviewed By: MatzeB

Subscribers: thegameg, asb, llvm-commits

Differential Revision: https://reviews.llvm.org/D39788

llvm-svn: 319430
2017-11-30 13:05:14 +00:00
Sean Eveson 661e4fbf83 Revert r319423: [MC] Function stack size section.
I messed up the diff.

llvm-svn: 319429
2017-11-30 12:43:25 +00:00
Diana Picus f003d9ff95 [ARM GlobalISel] Bail out for byval
Fallback if we have a byval parameter or argument since we don't support
them yet.

llvm-svn: 319428
2017-11-30 12:23:44 +00:00
Francis Visoiu Mistrih 93ef145862 [CodeGen] Print "%vreg0" as "%0" in both MIR and debug output
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).

Basically:

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed

Differential Revision: https://reviews.llvm.org/D40420

llvm-svn: 319427
2017-11-30 12:12:19 +00:00
Sean Eveson f77b4d2f38 [MC] Function stack size section.
Summary:
Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html

I wasn't sure who to put as reviewers, so please add/remove people as appropriate.

This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128).

The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary.

There is a follow up change to add an option to clang.

Thanks.

Reviewers: hfinkel, MatzeB

Reviewed By: MatzeB

Subscribers: thegameg, asb, llvm-commits

Differential Revision: https://reviews.llvm.org/D39788

llvm-svn: 319423
2017-11-30 12:01:16 +00:00
Serge Guelton 24386867b8 Support generic lowering of vector bswap
llvm-svn: 319419
2017-11-30 11:06:22 +00:00
Simon Pilgrim 3e5987cf8d [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes
llvm-svn: 319418
2017-11-30 10:48:47 +00:00
Jonas Devlieghere c635376d7c [dsymutil] Upstream getBundleInfo implementation
This patch implements `getBundleInfo`, which uses CoreFoundation to
obtain information about the CFBundle. This information is needed to
populate the Plist in the dSYM bundle.

This change only applies to darwin and is an NFC as far as other
platforms are concerned.

Differential revision: https://reviews.llvm.org/D40244

llvm-svn: 319416
2017-11-30 10:25:28 +00:00
Hiroshi Inoue 21e8ded4d2 Revert rL319407: [SROA] enable splitting for non-whole-alloca loads and stores
This reverts commit rL319407 due to failures in some buildbot.

llvm-svn: 319410
2017-11-30 08:29:51 +00:00
Jonas Paulsson b9a2467501 [SystemZ] Bugfix in adjustSubwordCmp.
Csmith generated a program where a store after load to the same address did
not get chained after the new load created during DAG legalizing, and so
performed an illegal overwrite of the expected value.

When the new zero-extending load is created, the chain users of the original
load must be updated, which was not done previously.

A similar case was also found and handled in lowerBITCAST.

Review: Ulrich Weigand
https://reviews.llvm.org/D40542

llvm-svn: 319409
2017-11-30 08:18:50 +00:00
Hiroshi Inoue 422e80aee2 [SROA] enable splitting for non-whole-alloca loads and stores
Currently, SROA splits loads and stores only when they are accessing the whole alloca.
This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is.
The whole-alloca loads and stores meet this new condition and so they are still splittable.

Here is a simplified motivating example.

struct record {
    long long a;
    int b;
    int c;
};

int func(struct record r) {
    for (int i = 0; i < r.c; i++)
        r.b++;
    return r.b;
}

When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument.

With this patch, the above example is compiled into only few instructions without loop.
Without the patch, unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers.

Differential Revision: https://reviews.llvm.org/D32998

llvm-svn: 319407
2017-11-30 07:44:46 +00:00
Craig Topper a495744d2c [X86] Optimize avx2 vgatherqps for v2f32 with v2i64 index type.
Normal type legalization will widen everything. This requires forcing 0s into the mask register. We can instead choose the form that only reads 2 elements without zeroing the mask.

llvm-svn: 319406
2017-11-30 07:01:40 +00:00
Craig Topper 321a8b9b63 [X86] Make sure we don't remove sign extends of masks with AVX2 masked gathers.
We don't use k-registers and instead use the MSB so we need to make sure we sign extend the mask to the msb.

llvm-svn: 319405
2017-11-30 06:31:31 +00:00
Graham Yiu 70293fa27a - Removed unused lamba (IsReturnBlock) causing build bots to fail for r319398
- Added lit testcases that were supposed to be part of r319398

llvm-svn: 319399
2017-11-30 03:36:57 +00:00
Matt Arsenault caf0ed4d74 AMDGPU: Allow negative MUBUF vaddr for gfx9
GFX9 does not enable bounds checking for the resource descriptors
used for private access, so it should be OK to use vaddr with
a potentially negative value.

llvm-svn: 319393
2017-11-30 00:52:40 +00:00
Rafael Espindola a4e418f713 Check alignment in getSectionContentsAsArray.
While the ArrayRef can technically have unaligned data, it would be
extremely surprising if iterating over it caused undefined behavior
when a reference to the underlying type was bound.

llvm-svn: 319392
2017-11-30 00:44:22 +00:00
Vedant Kumar 80fbb85555 [Coverage] Use the most-recent completed region count (PR35437)
This is a fix for the coverage segment builder.

If multiple regions must be popped off the active stack at once, and
more than one of them end at the same location, emit a segment using the
count from the most-recent completed region.

Fixes PR35437, rdar://35760630

Testing: invoked llvm-cov on a stage2 build of clang, additional unit
tests, check-profile

llvm-svn: 319391
2017-11-30 00:28:23 +00:00
Joerg Sonnenberger 4b1acff9b3 First step towards more human-friendly PPC assembler output:
- add -ppc-reg-with-percent-prefix option to use %r3 etc as register
  names
- split off logic for Darwinish verbose conditional codes into a helper
  function
- be explicit about Darwin vs AIX vs GNUish assembler flavors

Based on the patch from Alexandre Yukio Yamashita

Differential Revision: https://reviews.llvm.org/D39016

llvm-svn: 319381
2017-11-29 23:05:56 +00:00
Craig Topper cf461a0a32 [SelectionDAG][X86] Teach promotion legalization for fp_to_sint/fp_to_uint to insert an assertsext/assertzext based on the original type
If we put in an assertsext/zext here, we're able to generate better truncate code using pack on pre-avx512 targets.

Similar is already done during type legalization. This is the equivalent for op legalization

Differential Revision: https://reviews.llvm.org/D40591

llvm-svn: 319368
2017-11-29 22:15:43 +00:00
Dan Gohman 580c102ab8 [WebAssembly] Fix fptoui lowering bounds
To fully avoid trapping on wasm, fptoui needs a second check to ensure that
the operand isn't below the supported range.

llvm-svn: 319354
2017-11-29 20:20:11 +00:00
Krzysztof Parzyszek f4dcc42e7b [Hexagon] Remove HexagonISD::PACKHL
llvm-svn: 319352
2017-11-29 19:59:29 +00:00
Simon Pilgrim 4d2c703492 [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes (REVERSION)
Accidental commit of incomplete patch

llvm-svn: 319346
2017-11-29 19:37:38 +00:00
Simon Pilgrim 87034cb498 [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes
llvm-svn: 319338
2017-11-29 19:19:59 +00:00
Simon Pilgrim 36be852cee [X86][AVX512] Tag 3OP (shuffles, double-shifts and GFNI) instructions scheduler classes
llvm-svn: 319337
2017-11-29 18:52:20 +00:00
Nirav Dave bafaa53c4d [ARM][DAG] Revert Disable post-legalization store merge for ARM
Partially reverting enabling of post-legalization store merge
(r319036) for just ARM backend as it is causing incorrect code
in some Thumb2 cases.

llvm-svn: 319331
2017-11-29 18:06:13 +00:00
Zaara Syeda 76fe100696 [Power9] add more tests for D38287; NFC
llvm-svn: 319328
2017-11-29 17:26:20 +00:00
Sanjay Patel e0f906c915 [InstCombine] add tests for select-of-constants; NFC
These are variants of a test that was originally added in:
https://reviews.llvm.org/rL75531
...but removed with:
https://reviews.llvm.org/rL159230

llvm-svn: 319327
2017-11-29 17:21:39 +00:00
Adam Nemet 95e0c5fc6c Add opt-viewer testing
Detects whether we have the Python modules (pygments, yaml) required by
opt-viewer and hooks this up to REQUIRES.

This fixes https://bugs.llvm.org/show_bug.cgi?id=34129 (the lack of opt-viewer
testing).

It's also related to https://github.com/apple/swift/pull/12938 and the idea is
to expose LLVM_HAVE_OPT_VIEWER_MODULES to the Swift cmake.

Differential Revision: https://reviews.llvm.org/D40202

Fixes since the first commit:
1. Disable syntax highlighting as different versions of pygments generate
different HTML
2. Use llvm-cxxfilt from the build

llvm-svn: 319324
2017-11-29 17:07:41 +00:00
Sander de Smalen 6a3bf1f84a Reverted r319315 because of unused functions (due to PPR not yet being
used by any instructions).

llvm-svn: 319321
2017-11-29 15:14:39 +00:00
Sander de Smalen 2b6338b2bc [AArch64][SVE] Asm: Add SVE predicate register definitions and parsing support
Summary: Patch [1/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions.

Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro, echristo, efriedma

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D40360

llvm-svn: 319315
2017-11-29 14:34:18 +00:00
Diana Picus 863b5b05f1 [ARM GlobalISel] Fix selecting G_BRCOND
When lowering a G_BRCOND, we generate a TSTri of the condition against
1, which sets the flags, and then a Bcc which branches based on the
value of the flags.

Unfortunately, we were using the wrong condition code to check whether
we need to branch (EQ instead of NE), which caused all our branches to
do the opposite of what they were intended to do. This patch fixes the
issue by using the correct condition code.

llvm-svn: 319313
2017-11-29 14:20:06 +00:00
Oliver Stannard 9ea2eaeb50 [ARM] Add support for armv7e-m to the .arch directive
This will allow compilation of assembly files targeting armv7e-m without having
to specify the Tag_CPU_arch attribute as a workaround.

Differential revision: https://reviews.llvm.org/D40370

Patch by Ian Tessier!

llvm-svn: 319303
2017-11-29 10:12:15 +00:00
Serguei Katkov 5036459ae3 [CGP] Fix common type handling in optimizeMemoryInst
If common type is different we should bail out due to we will not be
able to create a select or Phi of these values.

Basically it is done in ExtAddrMode::compare however it does not work
if we handle the null first and then two values of different types.
so add a check in initializeMap as well. The check in ExtAddrMode::compare
is used as earlier bail out.

Reviewers: reames, john.brawn
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40479

llvm-svn: 319292
2017-11-29 05:51:26 +00:00
Sean Fertile aab3ef76d9 [PowerPC] Relax the checking on AND/AND8 in isSignOrZeroExtended.
Separate the handling of AND/AND8 out from PHI/OR/ISEL checking. The reasoning
is the others need all their operands to be sign/zero extended for their output
to also be sign/zero extended. This is true for AND and sign-extension, but for
zero-extension we only need at least one of the input operands to be zero
extended for the result to also be zero extended.

Differential Revision: https://reviews.llvm.org/D39078

llvm-svn: 319289
2017-11-29 04:09:29 +00:00
Matt Arsenault 9a7e29ae91 AMDGPU: Use stricter regexes for add instructions
Match the entire _co as one optional piece rather than
a set of characters to match multiple times.

llvm-svn: 319275
2017-11-29 02:25:14 +00:00
Adrian Prantl 5da51f435a llvm-dwarfdump: honor the --show-children option when dumping a specific DIE.
llvm-svn: 319271
2017-11-29 01:12:22 +00:00
Matt Arsenault 3f71c0e3ee AMDGPU: Select DS insts without m0 initialization
GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to manually maintain m0
uses as needed.

llvm-svn: 319270
2017-11-29 00:55:57 +00:00
Craig Topper fbf7b3bf3e [X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization.
llvm-svn: 319266
2017-11-29 00:32:09 +00:00
Adam Nemet 90e8c122ee Revert "Add opt-viewer testing"
This reverts commit r319188.

Breaks when c++filt is not available.

llvm-svn: 319262
2017-11-29 00:10:48 +00:00
Craig Topper 8261c8c066 [X86] Add test cases for fptosi v16f32->v16i8/v16i16 to show scalarization.
llvm-svn: 319261
2017-11-29 00:02:22 +00:00
Craig Topper 88ffb5d4d5 [X86] Mark ISD::FP_TO_UINT v16i8/v16i16 as Promote under AVX512 instead of legal. Fix infinite loop in op legalization when promotion requires 2 steps.
Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel.

The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value.

llvm-svn: 319259
2017-11-28 23:56:02 +00:00
Craig Topper 3f749c2d4b [X86] Regenerate avx512-schedule test.
For some reason some sqrt instructions were missing the scheduling comments.

llvm-svn: 319258
2017-11-28 23:55:59 +00:00
Matt Arsenault 607a756651 AMDGPU: Enable IPRA
llvm-svn: 319256
2017-11-28 23:40:12 +00:00
Simon Pilgrim b9aa93cb93 [X86] Tag CLFLUSHOPT with same scheduling behaviour as CLFLUSH
llvm-svn: 319253
2017-11-28 23:25:42 +00:00
Daniel Sanders 40c5cbfb08 [globalisel][tablegen] Fix PR35375 by sign-extending the table value to match getConstantVRegVal()
Summary:
From the bug report:
> The problem is that it fails when trying to compare -65536 (or 4294901760) to 0xFFFF,0000. This is because the
> constant in the instruction is sign extended to 64 bits (0xFFFF,FFFF,FFFF,0000) and then compared to the non
> extended 64 bit version expected by TableGen.
> 
> In contrast, the DAGISelEmitter generates special code for AND immediates (OPC_CheckAndImm), which does not
> sign extend.

This patch doesn't introduce the special case for AND (and OR) immediates since the majority of it is related to handling known bits that have no effect on the result and GlobalISel doesn't detect known-bits at this time. Instead this patch just ensures that the immediate is extended consistently on both sides of the check.

Thanks to Diana Picus for the detailed bug report.

Reviewers: rovka

Reviewed By: rovka

Subscribers: kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D40532

llvm-svn: 319252
2017-11-28 23:18:54 +00:00
Simon Pilgrim a675071b7a [X86] Add CLFLUSHOPT schedule tests
llvm-svn: 319250
2017-11-28 23:12:12 +00:00
Simon Pilgrim c6c2103e1b [X86] Test clflushopt intrinsic on 32 and 64-bit targets
llvm-svn: 319247
2017-11-28 23:04:42 +00:00
Adam Nemet 80fb55625b Remove this test
After r319235, we no longer generate this remark.

llvm-svn: 319242
2017-11-28 22:39:38 +00:00
Adam Nemet 2e92289014 Demote this opt remark to DEBUG.
From a random opt-stat output:

Top 10 remarks:
  tailcallelim/tailcall          53%
  inline/AlwaysInline            13%
  gvn/LoadClobbered              13%
  inline/Inlined                  8%
  inline/TooCostly                2%
  inline/NoDefinition             2%
  licm/LoadWithLoopInvariantAddressInvalidated  2%
  licm/Hoisted                    1%
  asm-printer/InstructionCount    1%
  prologepilog/StackSize          1%

llvm-svn: 319235
2017-11-28 22:11:00 +00:00
Daniel Sanders 766646517f [globalisel][tablegen] Add support for importing G_ATOMIC_CMPXCHG, G_ATOMICRMW_* rules from SelectionDAG.
GIM_CheckNonAtomic has been replaced by GIM_CheckAtomicOrdering to allow it to support a wider
range of orderings. This has then been used to import patterns using nodes such
as atomic_cmp_swap, atomic_swap, and atomic_load_*.

llvm-svn: 319232
2017-11-28 22:07:05 +00:00
Adrian Prantl 77d90b0c39 SROA: Don't create variable fragments that are outside of the variable.
An alloca may be larger than a variable that is described to be stored
there. Don't create a dbg.value for fragments that are outside of the
variable.

This fixes PR35447.
https://bugs.llvm.org/show_bug.cgi?id=35447

llvm-svn: 319230
2017-11-28 21:30:38 +00:00
Alexey Bataev ab5f3f2b33 [SLP] Additional test for PR35354, NFC.
llvm-svn: 319224
2017-11-28 20:48:24 +00:00
Daniel Sanders 7b361b50d8 [aarch64][globalisel] Add missing tests from r319216
llvm-svn: 319220
2017-11-28 20:27:59 +00:00
Sean Fertile e200016ea9 [PowerPC] Allow tail calls of fastcc functions from C CallingConv functions.
Allow fastcc callees to be tail-called from ccc callers.

Differential Revision: https://reviews.llvm.org/D40355

llvm-svn: 319218
2017-11-28 20:25:58 +00:00
Craig Topper dd4295626b [X86] In lowerVectorShuffleAsElementInsertion, if were able to find a scalar i8 or i16 and need to zero extend it, make sure we use a vXi32 type of the full vector width.
Previously, this was hardcoded to v4i32, but if the input type is 256 bits we need to use v8i32.

Fixes PR35443

llvm-svn: 319208
2017-11-28 19:25:45 +00:00
Sanjay Patel 1a72f67006 [InstCombine] auto-generate complete test checks; NFC
llvm-svn: 319205
2017-11-28 19:13:23 +00:00
Krzysztof Parzyszek 081e458e90 [Hexagon] Make sure to zero-extend bytes before building a vector
llvm-svn: 319204
2017-11-28 19:13:17 +00:00
Sanjay Patel b1a97d3774 [InstCombine] auto-generate complete test checks; NFC
llvm-svn: 319203
2017-11-28 19:07:28 +00:00
Daniel Sanders 17d277b734 [mir] Print/Parse both MOLoad and MOStore when they occur together.
Summary:
They're not always mutually exclusive. read-modify-write atomics are both
at the same time. One example of this is the SWP instructions on AArch64.
Another example is GlobalISel's G_ATOMICRMW_* generic instructions which
will be added in a later patch.

Reviewers: arphaman, aemerson

Reviewed By: aemerson

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D40157

llvm-svn: 319202
2017-11-28 18:57:02 +00:00
Hans Wennborg ca46db957d EntryExitInstrumenter: set DebugLocs on the inserted call instructions (PR35412)
Apparently the verifier requires that inlineable calls in a function
with debug info have debug locations.

llvm-svn: 319199
2017-11-28 18:44:26 +00:00
Zachary Turner 6900de1dfb [CodeView] Refactor / Rewrite TypeSerializer and TypeTableBuilder.
The motivation behind this patch is that future directions require us to
be able to compute the hash value of records independently of actually
using them for de-duplication.

The current structure of TypeSerializer / TypeTableBuilder being a
single entry point that takes an unserialized type record, and then
hashes and de-duplicates it is not flexible enough to allow this.

At the same time, the existing TypeSerializer is already extremely
complex for this very reason -- it tries to be too many things. In
addition to serializing, hashing, and de-duplicating, ti also supports
splitting up field list records and adding continuations. All of this
functionality crammed into this one class makes it very complicated to
work with and hard to maintain.

To solve all of these problems, I've re-written everything from scratch
and split the functionality into separate pieces that can easily be
reused. The end result is that one class TypeSerializer is turned into 3
new classes SimpleTypeSerializer, ContinuationRecordBuilder, and
TypeTableBuilder, each of which in isolation is simple and
straightforward.

A quick summary of these new classes and their responsibilities are:

- SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of
  bytes. Does not do any hashing. Every time you call it, it will
  re-serialize and return bytes again. The same instance can be re-used
  over and over to avoid re-allocations, and in exchange for this
  optimization the bytes returned by the serializer only live until the
  caller attempts to serialize a new record.

- ContinuationRecordBuilder : Turns a FieldList-like record into a series
  of fragments. Does not do any hashing. Like SimpleTypeSerializer,
  returns references to privately owned bytes, so the storage is
  invalidated as soon as the caller tries to re-use the instance. Works
  equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a
  long-standing theoretical limitation of the previous implementation.

- TypeTableBuilder : Accepts sequences of bytes that the user has already
  serialized, and inserts them by de-duplicating with a hash table. For
  the sake of convenience and efficiency, this class internally stores a
  SimpleTypeSerializer so that it can accept unserialized records. The
  same is not true of ContinuationRecordBuilder. The user is required to
  create their own instance of ContinuationRecordBuilder.

Differential Revision: https://reviews.llvm.org/D40518

llvm-svn: 319198
2017-11-28 18:33:17 +00:00
Konstantin Zhuravlyov 06ae4ec78e AMDGPU: Add num spilled s/vgprs to metadata
This was requested by tools.

Differential Revision: https://reviews.llvm.org/D40321

llvm-svn: 319192
2017-11-28 17:51:08 +00:00
Adam Nemet 353f7cbc21 Add opt-viewer testing
Detects whether we have the Python modules (pygments, yaml) required by
opt-viewer and hooks this up to REQUIRES.

This fixes https://bugs.llvm.org/show_bug.cgi?id=34129 (the lack of opt-viewer
testing).

It's also related to https://github.com/apple/swift/pull/12938 and the idea is
to expose LLVM_HAVE_OPT_VIEWER_MODULES to the Swift cmake.

Differential Revision: https://reviews.llvm.org/D40202

llvm-svn: 319188
2017-11-28 17:26:28 +00:00
Francis Visoiu Mistrih 9d7bb0cb40 [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Dan Gohman 2803bfaf00 [WebAssembly] Support bitcasted function addresses with varargs.
Generalize FixFunctionBitcasts to handle varargs functions. This in
particular fixes the case where clang bitcasts away a varargs when
calling a K&R-style function.

This avoids interacting with tricky ABI details because it operates
at the LLVM IR level before varargs ABI details are exposed.

This fixes PR35385.

llvm-svn: 319186
2017-11-28 17:15:03 +00:00
Matt Arsenault e123aba94e DAG: Legalize truncstores to illegal int types
Truncate to a legal int type, and produce a new
truncstore from a narrower type.

llvm-svn: 319185
2017-11-28 17:11:30 +00:00
Simon Pilgrim ece5bc358a [X86][X87] Tag FTST x87 instruction scheduler class
Looking through Agner, FTST is very similar to generic float compare behaviour, so I've added them to the existing IIC_FCOMI (WriteFAdd) tags.

llvm-svn: 319184
2017-11-28 16:57:20 +00:00
Sanjay Patel 14230e02ff [InstCombine] add tests from D39421 to show current transforms; NFC
llvm-svn: 319182
2017-11-28 16:40:30 +00:00
Simon Pilgrim 0747a7e8c3 [X86][X87] Tag FABS/FCHS/FSQRT/FSIN/FCOS x87 instruction scheduler classes
Atom's FABS/FCHS/FSQRT latencies taken from Agner.

Note: I just added FSIN and FCOS to the existing IIC_FSINCOS itinerary, which is actually a more costly instruction.
llvm-svn: 319175
2017-11-28 15:03:42 +00:00
Simon Pilgrim b843dc26e4 [X86][X86] Add some x87 schedule tests
Still missing some instructions: mainly loads/stores/system ops, all flagged as TODO.

llvm-svn: 319172
2017-11-28 14:35:52 +00:00
Simon Pilgrim 8dc603b031 [X86][3DNow] Add instruction itinerary and scheduling classes for femms/prefetch/prefetchw
llvm-svn: 319167
2017-11-28 12:37:35 +00:00
Chandler Carruth c34f789e38 Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.
The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand

As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.

This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.

The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.

There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance on our internal
benchmarks, and should be generally performance neutral, but help with
more extensive benchmarking is always welcome.

One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.

I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).

I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.

Differential revision: https://reviews.llvm.org/D37467

llvm-svn: 319164
2017-11-28 11:32:31 +00:00
Florian Hahn 25ea91a838 [TailRecursionElimination] Skip debug intrinsics.
Summary:
I think we do not need to analyze debug intrinsics here, as they should
not impact codegen. This has 2 benefits: 1) slightly less work to do and
2) avoiding generating optimization remarks for converting calls to
debug intrinsics to tail calls, which are not really helpful for users.

Based on work by Sander de Smalen.

Reviewers: davide, trentxintong, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, JDevlieghere

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D40440

llvm-svn: 319158
2017-11-28 09:32:25 +00:00
Martin Storsjo 04b68446eb [COFF] Implement constructor priorities
The priorities in the section name suffixes are zero padded,
allowing the linker to just do a lexical sort.

Add zero padding for .ctors sections in ELF as well.

Differential Revision: https://reviews.llvm.org/D40407

llvm-svn: 319150
2017-11-28 08:07:18 +00:00
Max Kazantsev 115607226a [GVN] Prevent ScalarPRE from hoisting across instructions that don't pass control flow to successors
This is to address a problem similar to those in D37460 for Scalar PRE. We should not
PRE across an instruction that may not pass execution to its successor unless it is safe
to speculatively execute it.

Differential Revision: https://reviews.llvm.org/D38619

llvm-svn: 319147
2017-11-28 07:07:55 +00:00
Adam Nemet bf74f64e67 Revert "Add opt-viewer testing"
This reverts commit r319073.

Bot fails with a mismatch that looks like pygments-generated HTML.

llvm-svn: 319146
2017-11-28 06:22:29 +00:00
Dan Gohman 3ff73cfbcd [WebAssembly] Handle errors better in fast-isel.
Fast-isel routines need to bail out in the case that fast-isel
fails on the operands.

This fixes https://bugs.llvm.org/show_bug.cgi?id=35064

llvm-svn: 319144
2017-11-28 05:36:42 +00:00
Simon Dardis 3aeb1a5404 [DAGCombine] Disable finding better chains for stores at O0
Unoptimized IR can have linear sequences of stores to an array, where the
initial GEP for the first store is formed from the pointer to the array, and the
GEP for each store after the first is formed from the previous GEP with some
offset in an inductive fashion.

The (large) resulting DAG when analyzed by DAGCombine undergoes an excessive
number of combines as each store node is examined every time its' offset node
is combined with any child of the offset. One of the transformations is
findBetterNeighborChains which assists MergeConsecutiveStores. The former
relies on repeated chain walking to do its' work, however MergeConsecutiveStores
is disabled at O0 which makes the transformation redundant.

Any optimization level other than O0 would invoke InstCombine which would
resolve the chain of GEPs into flat base + offset GEP for each store which
does not exhibit the repeated examination of each store to the array.

Disabling this optimization fixes an excessive compile time issue (30~ minutes
for the test case provided) at O0.

Reviewers: niravd, craig.topper, t.p.northover

Differential Revision: https://reviews.llvm.org/D40193

llvm-svn: 319142
2017-11-28 04:07:59 +00:00
Craig Topper ddbc340c20 [X86] Make zero extend from v16i1/v8i1 to v16i8/v8i16/v16i16 not scalarize under AVX512.
llvm-svn: 319136
2017-11-28 01:36:33 +00:00
Craig Topper 5befc5bfce [X86] Add command line without AVX512BW/AVX512VL to bitcast-int-to-vector-bool-zext.ll.
llvm-svn: 319135
2017-11-28 01:36:31 +00:00
Rafael Espindola c06f55e1e8 This reverts commit r319096 and r319097.
Revert "[SROA] Propagate !range metadata when moving loads."
Revert "[Mem2Reg] Clang-format unformatted parts of this file. NFCI."

Davide says they broke a bot.

llvm-svn: 319131
2017-11-28 01:25:38 +00:00
Matthias Braun 5d01e708e1 ARM: Fix PR32578
https://llvm.org/PR32578

I simplified and converted the reproducer into a lit test.

Patch by Vedant Kumar!

llvm-svn: 319130
2017-11-28 01:17:52 +00:00
Dan Gohman cdd48b8a6b [WebAssembly] Fix trapping behavior in fptosi/fptoui.
This adds code to protect WebAssembly's `trunc_s` family of opcodes
from values outside their domain. Even though such conversions have
full undefined behavior in C/C++, LLVM IR's `fptosi` and `fptoui` do
not, and only return undef.

This also implements the proposed non-trapping float-to-int conversion
feature and uses that instead when available.

llvm-svn: 319128
2017-11-28 01:13:40 +00:00
Adrian Prantl d7f6f1636d SROA: Avoid creating a fragment expression that covers the entire variable.
Fixes PR35416.

https://bugs.llvm.org/show_bug.cgi?id=35416

llvm-svn: 319126
2017-11-28 00:57:53 +00:00
Craig Topper dbd4a7fecc [DAGCombiner] Don't combine aext(setcc) if the setcc is already using the target's preferred result type.
With AVX512 vXi1 types are legal so we shouldn't be extending them.

This change is similar to existing code in the zext(setcc) combine.

llvm-svn: 319120
2017-11-27 23:51:40 +00:00
Craig Topper 256cc48df6 [X86] Teach getSetCCResultType to handle more than just SimpleVTs when looking at larger than 512-bit vectors.
Which VTs are considered simple is determined by the superset of the legal types of all targets in LLVM. If we're looking at VTs that are going to be split down to 512-bits we should allow any VT not just simple ones since the simple list changes over time as new targets are added.

llvm-svn: 319110
2017-11-27 22:56:10 +00:00
Davide Italiano b5d59e73ee [SROA] Propagate !range metadata when moving loads.
This tries to propagate !range metadata to a pre-existing load
when a load is optimized out. This is done instead of adding an
assume because converting loads to and from assumes creates a
lot of IR.

Patch by Ariel Ben-Yehuda.

Differential Revision:  https://reviews.llvm.org/D37216

llvm-svn: 319096
2017-11-27 21:25:13 +00:00
Sanjay Patel 0de1a4bc2d [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result
This should fix PR31455:
https://bugs.llvm.org/show_bug.cgi?id=31455

Differential Revision: https://reviews.llvm.org/D28314

llvm-svn: 319094
2017-11-27 21:15:43 +00:00
Yaxun Liu dcb6067d9f [AMDGPU] Update test nullptr.ll to use amdgiz environment
This test needs to be manually updated since it is difficult to do it with script.

Addr space 6 to 23 are only used by r600, therefore only check them for r600.

Differential Revision: https://reviews.llvm.org/D40117

llvm-svn: 319092
2017-11-27 20:48:21 +00:00
Zaara Syeda f94d58d908 [PowerPC] Remove redundant TOC saves
This patch adds a peep hole optimization to remove any redundant toc save
instructions added as part of the call sequence for indirect calls. It removes
any toc saves within a function that are dominated by another toc save.

Differential Revision: https://reviews.llvm.org/D39736

llvm-svn: 319087
2017-11-27 20:26:36 +00:00
Arnold Schwaighofer d9e710984d Inliner: Don't mark notail calls with the 'tail' attribute
enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2,
                    TCK_NoTail = 3 };

TCK_NoTail is greater than TCK_Tail so taking the min does not do the
correct thing.

rdar://35639547

llvm-svn: 319075
2017-11-27 19:03:40 +00:00
Adam Nemet cbdd238d5e Add opt-viewer testing
Detects whether we have the Python modules (pygments, yaml) required by
opt-viewer and hooks this up to REQUIRES.

This fixes https://bugs.llvm.org/show_bug.cgi?id=34129 (the lack of opt-viewer
testing).

It's also related to https://github.com/apple/swift/pull/12938 and the idea is
to expose LLVM_HAVE_OPT_VIEWER_MODULES to the Swift cmake.

Differential Revision: https://reviews.llvm.org/D40202

llvm-svn: 319073
2017-11-27 19:00:29 +00:00
Jake Ehrlich 6ad72d05f5 [llvm-objcopy] Add --strip-all-gnu and change --strip-all
GNU's --strip-all doesn't strip as aggressively as it could in general.
Currently llvm-objcopy copies the exact behavoir of GNU's --strip-all.
eu-strip is used as a drop in replacement for GNU strip/objcopy in many many
places without issue. eu-strip removes non-allocated sections and keeps
.gnu.warning* sections. Because --strip-all will likely be the most widely
used stripping option we should make --strip-all as aggressive as it can safely
be. Since we have evidence from eu-strip that this is a safe option we should
allow it. For those that might still have an issue afterwards I've added
--strip-all-gnu as an exact drop in replacement for GNU's --strip-all as well.

llvm-svn: 319071
2017-11-27 18:56:01 +00:00
Craig Topper 51f2886f93 [X86] Add avx512bw command lines to vselect-packss.ll
This shows several places where we fail to use masked move or blendm.

llvm-svn: 319063
2017-11-27 18:00:49 +00:00
Craig Topper 62189f7ab3 [X86] Make getSetCCResultType return vXi1 for any vXi32/vXi64 vector over 512 bits long when AVX512 is enabled.
Similar for vXi16/vXi8 with BWI.

Any vector larger than 512 bits will be split to 512 bits during legalization. But without this we will fold sexts with them before that making it difficult to recover leading to scalarization.

llvm-svn: 319059
2017-11-27 17:51:55 +00:00
Dmitry Preobrazhensky 16608e67d3 [AMDGPU][MC][DISASSEMBLER][GFX9] Corrected decoding of GLOBAL/SCRATCH opcodes
See bug 35433: https://bugs.llvm.org/show_bug.cgi?id=35433

Differential Revision: https://reviews.llvm.org/D40493

Reviewers: artem.tamazov, SamWot, arsenm
llvm-svn: 319050
2017-11-27 17:14:35 +00:00
Zaara Syeda 48cb3c1557 [Power9] Improvements to vector extract with variable index exploitation
This patch extends on to rL307174 to not use the power9 vector extract with
variable index instructions when extracting word element 1. For such cases,
the existing selection of MFVSRWZ provides a better sequence.

Differential Revision: https://reviews.llvm.org/D38287

llvm-svn: 319049
2017-11-27 17:11:03 +00:00
Jonas Devlieghere 6a9c5929d4 [llvm-dwarfdump] Display DW_AT_high_pc as absolute value
DWARF4 relative DW_AT_high_pc values are now displayed as absolute
addresses. The relative value is only shown when explicitly dumping the
forms, i.e. in show-form or verbose mode.

```
DW_AT_low_pc	(0x0000000000000049)
DW_AT_high_pc	(0x00000019)
```

becomes

```
DW_AT_low_pc	(0x0000000000000049)
DW_AT_high_pc	(0x0000000000000062)
```

Differential revision: https://reviews.llvm.org/D40317

rdar://35416943

llvm-svn: 319044
2017-11-27 16:40:46 +00:00
Sanjay Patel 178b70a3de [InstSimplify] add fcmp with negative constant tests; NFC
This is a superset of the tests proposed with D40012 to show another potential improvement.

llvm-svn: 319041
2017-11-27 16:08:34 +00:00
Nirav Dave db77e57ea8 [DAG] Do MergeConsecutiveStores again before Instruction Selection
Summary:

Now that store-merge is only generates type-safe stores, do a second
pass just before instruction selection to allow lowered intrinsics to
be merged as well.

Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33675

llvm-svn: 319036
2017-11-27 15:28:15 +00:00
Simon Pilgrim 4164009b48 [X86] Add INVLPGA to the existing INVLPG scheduling
llvm-svn: 319031
2017-11-27 14:39:50 +00:00
Petar Jovanovic 7745d2f02f [mips] fix asmstring of Ext and Ins instructions and mips16 JALRC/JRC
Make the print format consistent with other assembler instructions.

Adding a tab character instead of space in asmstring of Ext and Ins
instructions.
Removing space around the tab character for JALRC and replacing space with
tab in JRC.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D38144

llvm-svn: 319030
2017-11-27 14:25:36 +00:00
Simon Pilgrim be0369ca0d [X86] Add scheduling tests for invlpg/invlpga
llvm-svn: 319029
2017-11-27 14:23:55 +00:00
Vedran Miletic ad21f2687d [AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics
AMDGPU backend errors with "unsupported call to function" upon
encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch
adds custom lowering to avoid that error on both R600 and SI.

Reviewers: arsenm, jvesely

Subscribers: tstellar

Differential Revision: https://reviews.llvm.org/D29942

llvm-svn: 319025
2017-11-27 13:26:38 +00:00
John Brawn 4b476488ba [CGP] Fix handling of null pointer values in optimizeMemoryInst
The current way that trivial addressing modes are detected incorrectly thinks
that null pointers are non-trivial, leading to an infinite loop where we keep
duplicating the same select. Fix this by aware of null when deciding if an
addressing mode is trivial.

Differential Revision: https://reviews.llvm.org/D40447

llvm-svn: 319019
2017-11-27 11:29:15 +00:00
Momchil Velikov bd2c7eb923 [ARM] Fix an off-by-one error when restoring LR for 16-bit Thumb
The commit https://reviews.llvm.org/rL318143 computes incorrectly to offset to
restore LR from.

The number of tPOP operands is 2 (condition) + 2 (implicit def and use of SP) +
count of the popped registers. We need to load LR from just past the last
register, hence the correct offset should be either getNumOperands() - 4 and
getNumExplicitOperands() - 2 (multiplied by 4).

Differential revision: https://reviews.llvm.org/D40305

llvm-svn: 319014
2017-11-27 10:13:14 +00:00
Andrew V. Tischenko 26dde7719b Update BTVER2 sched numbers for SSE42 string instructions.
Differential Revision: https://reviews.llvm.org/D39846

llvm-svn: 319013
2017-11-27 09:58:00 +00:00
Craig Topper 400c32ecb9 [SelectionDAG] Teach SplitVecRes_SETCC to call GetSplitVector if the operands have already been split.
llvm-svn: 319010
2017-11-27 05:52:54 +00:00
Simon Pilgrim fe6e92d517 [X86][3DNow] Add 3DNow! instruction itinerary and scheduling classes
llvm-svn: 319005
2017-11-26 20:50:29 +00:00
Simon Pilgrim 61145d5c25 [X86][SSE] Add SSE42 tests to the clear upper tests
llvm-svn: 319003
2017-11-26 20:03:53 +00:00
Simon Pilgrim f545bb6cae [X86][MMX] Add IIC_MMX_MOVMSK instruction itinerary class
llvm-svn: 318999
2017-11-26 17:56:07 +00:00
Oren Ben Simhon fa582b075c Control-Flow Enforcement Technology - Shadow Stack support (LLVM side)
Shadow stack solution introduces a new stack for return addresses only.
The HW has a Shadow Stack Pointer (SSP) that points to the next return address.
If we return to a different address, an exception is triggered.
The shadow stack is managed using a series of intrinsics that are introduced in this patch as well as the new register (SSP).
The intrinsics are mapped to new instruction set that implements CET mechanism.

The patch also includes initial infrastructure support for IBT.

For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Differential Revision: https://reviews.llvm.org/D40223

Change-Id: I4daa1f27e88176be79a4ac3b4cd26a459e88fed4
llvm-svn: 318996
2017-11-26 13:02:45 +00:00
Coby Tayree d8b17bedfa [x86][icelake]GFNI
galois field arithmetic (GF(2^8)) insns:
gf2p8affineinvqb
gf2p8affineqb
gf2p8mulb
Differential Revision: https://reviews.llvm.org/D40373

llvm-svn: 318993
2017-11-26 09:36:41 +00:00
Craig Topper e485631cd1 [X86] Add separate intrinsics for scalar FMA4 instructions.
Summary:
These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits.

I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512.

I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.

I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests split the RUN lines and changed out the scalar intrinsics.

fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39851

llvm-svn: 318984
2017-11-25 18:32:43 +00:00
Craig Topper ea37e201ec [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions.
Summary:
This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior.

Test command lines have been added for these two cases.

Reviewers: magabari, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40282

llvm-svn: 318983
2017-11-25 18:09:37 +00:00
Andrew V. Tischenko 198720d38e Add BTVER2 sched support for SHLD/SHRD.
Differential Revision: https://reviews.llvm.org/D40124

llvm-svn: 318977
2017-11-25 10:46:53 +00:00
Craig Topper c1b3269171 [X86] Support folding to andnps with SSE1 only.
With SSE1 only, we emit FAND and FXOR nodes for v4f32.

llvm-svn: 318968
2017-11-25 07:20:22 +00:00
Craig Topper 5b85df8605 [X86] Add some early DAG combines to turn v4i32 AND/OR/XOR into FAND/FOR/FXOR whe only SSE1 is available.
v4i32 isn't a legal type with sse1 only and would end up getting scalarized otherwise.

This isn't completely ideal as it doesn't handle cases like v8i32 that would get split to v4i32. But it at least helps with code written using the clang intrinsic header.

llvm-svn: 318967
2017-11-25 07:20:21 +00:00
Craig Topper 13ed01e635 [X86] Prevent using X * rsqrt(X) to approximate sqrt when only sse1 is enabled.
This optimization can occur after type legalization and emit a vselect with v4i32 type. But that type is not legal with sse1. This ultimately gets scalarized by the second type legalization that runs after vector op legalization, but that's really intended to handle the scalar types that might be introduced by legalizing vector ops.

For now just stop this from happening by disabling the optimization with sse1.

llvm-svn: 318965
2017-11-24 19:57:48 +00:00
Simon Dardis 230f453574 [CodeGenPrepare] Check that erased sunken address are not reused
CodeGenPrepare sinks address computations from one basic block to another
and attempts to reuse address computations that have already been sunk. If
the same address computation appears twice with the first instance as an
operand of a load whose result is an operand to a simplifable select,
CodeGenPrepare simplifies the select and recursively erases the now dead
instructions. CodeGenPrepare then attempts to use the erased address
computation for the second load.

Fix this by erasing the cached address value if it has zero uses before
looking for the address value in the sunken address map.

This partially resolves PR35209.

Thanks to Alexander Richardson for reporting the issue!

This fixed version relands r318032 which was reverted in r318049 due to
sanitizer buildbot failures.

Reviewers: john.brawn

Differential Revision: https://reviews.llvm.org/D39841

llvm-svn: 318956
2017-11-24 16:45:28 +00:00
Dmitry Preobrazhensky 0e8924a5c7 [AMDGPU][MC][GFX9] Added v_interp_p2_f16 and v_interp_p2_legacy_f16
See bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629

Reviewers: artem.tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D39488

llvm-svn: 318955
2017-11-24 15:37:14 +00:00
Dylan McKay d3972a8f11 [AVR] Use the short form of 'clr <reg>'
r318895 made it so that the simpler instruction aliases are printed
rather than their expanded form.

llvm-svn: 318954
2017-11-24 15:36:43 +00:00
John Brawn 70cdb5b391 [CGP] Make optimizeMemoryInst able to combine more kinds of ExtAddrMode fields
This patch extends the recent work in optimizeMemoryInst to make it able to
combine more ExtAddrMode fields than just the BaseReg.

This fixes some benchmark regressions introduced by r309397, where GVN PRE is
hoisting a getelementptr such that it can no longer be combined into the
addressing mode of the load or store that uses it.

Differential Revision: https://reviews.llvm.org/D38133

llvm-svn: 318949
2017-11-24 14:10:45 +00:00
Aleksandar Beserminji 590f0793e8 [mips] Set microMIPS ASE flag
This patch fixes an issue where microMIPS ASE flag is not set
when a function has micromips attribute or when .set micromips
directive is used.

Differential Revision: https://reviews.llvm.org/D40316

llvm-svn: 318948
2017-11-24 14:00:47 +00:00
Dmitry Preobrazhensky dd2f1c993e [AMDGPU][MC][GFX9] Added support of 'inst_offset' modifier for compatibility with SP3
See bug 35329: https://bugs.llvm.org//show_bug.cgi?id=35329

Reviewers: arsenm, vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D40350

llvm-svn: 318947
2017-11-24 13:22:38 +00:00
Craig Topper 40a1edc307 [X86] Don't invert NewCC variable while processing the jcc/setcc/cmovcc instructions in optimizeCompareInstr.
The NewCC variable is calculated outside of the loop that processes jcc/setcc/cmovcc instructions. If we invert it during the loop it can cause an incorrect value to be used by a later iteration. Instead only read it during the loop and use a new variable to store the possibly inverted value.

Fixes PR35399.

llvm-svn: 318934
2017-11-23 19:25:45 +00:00
Craig Topper f31b0b850b [X86] Teach isel that X86ISD::CMPM_RND zeros the upper bits of the mask register.
llvm-svn: 318933
2017-11-23 18:41:21 +00:00
Simon Pilgrim 90accbc5d9 [X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841)
(V)PHMINPOSUW determines the UMIN element in an v8i16 input, with suitable bit flipping it can also be used for SMAX/SMIN/UMAX cases as well.

This patch matches vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions and reduces the input down to a v8i16 vector before calling (V)PHMINPOSUW.

A later patch will use this for v16i8 reductions as well (PR32841).

Differential Revision: https://reviews.llvm.org/D39729

llvm-svn: 318917
2017-11-23 13:50:27 +00:00
Diana Picus c01f7f131b [ARM GlobalISel] Support G_FDIV for s32 and s64
TableGen already generates code for selecting a G_FDIV, so we only need
to add a test.

For the legalizer and reg bank select, we do the same thing as for the
other floating point binary operations: either mark as legal if we have
a FP unit or lower to a libcall, and map to the floating point
registers.

llvm-svn: 318915
2017-11-23 13:26:07 +00:00
Diana Picus 9faa09b21e [ARM GlobalISel] Support G_FMUL for s32 and s64
TableGen already generates code for selecting a G_FMUL, so we only need
to add a test for that part.

For the legalizer and reg bank select, we do the same thing as the other
floating point binary operators: either mark as legal if we have a FP
unit or lower to a libcall, and map to the floating point registers.

llvm-svn: 318910
2017-11-23 12:44:20 +00:00
Simon Dardis eb5bfd9889 [mips] Use the delay slot filler to convert branches for microMIPSR6.
The MIPS delay slot filler converts delay slot branches into compact
forms for the MIPS ISAs which support them. For branches that compare
(in)equality with with zero, it converts them into branches with implict
zero register operands. These branches have a slightly greater range
than normal two register operands branches.

Changing the branches at this point in the pipeline offers the long
branch pass the ability to mark better judgements if a long branch
sequence is required.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D40314

llvm-svn: 318908
2017-11-23 12:38:04 +00:00
Coby Tayree e8bdd383e9 [x86][icelake]BITALG
2/3
vpshufbitqmb encoding
3/3
vpshufbitqmb intrinsics
Differential Revision: https://reviews.llvm.org/D40222

llvm-svn: 318904
2017-11-23 11:15:50 +00:00
Alexander Potapenko 391804f54b [MSan] Move the access address check before the shadow access for that address
MSan used to insert the shadow check of the store pointer operand
_after_ the shadow of the value operand has been written.
This happens to work in the userspace, as the whole shadow range is
always mapped. However in the kernel the shadow page may not exist, so
the bug may cause a crash.

This patch moves the address check in front of the shadow access.

llvm-svn: 318901
2017-11-23 08:34:32 +00:00
Craig Topper 3fba1bfb77 [X86] Regenerate the vector-popcnt and vector-tzcnt tests to get BITALG CHECK linse on all functions not just the vXi16/vXi8.
llvm-svn: 318885
2017-11-22 23:35:12 +00:00
Fedor Sergeev 61975b49fe IR printing improvement for loop passes
Summary:
Loop-pass printing is somewhat deficient since it does not provide the
context around the loop (e.g. preheader). This context information becomes
pretty essential when analyzing transformations that move stuff out of the loop.

Extending printLoop to cover preheader and exit blocks (if any).

Reviewers: sanjoy, silvas, weimingz

Reviewed By: sanjoy

Subscribers: apilipenko, skatkov, llvm-commits

Differential Revision: https://reviews.llvm.org/D40246

llvm-svn: 318878
2017-11-22 20:59:53 +00:00
Krzysztof Parzyszek 942fa1631f [Hexagon] Implement buildVector32 and buildVector64 as utility functions
Change LowerBUILD_VECTOR to use those functions. This commit will tempora-
rily affect constant vector generation (it will generate constant-extended
values instead of non-extended combines), but the code for the general case
should be better. The constant selection part will be fixed later.

llvm-svn: 318877
2017-11-22 20:56:23 +00:00
Krzysztof Parzyszek b9f33b32ee [Hexagon] Add patterns to select A2_combine_ll and its variants
llvm-svn: 318876
2017-11-22 20:55:41 +00:00
Krzysztof Parzyszek 6acecc96ac [Hexagon] Remove trailing spaces, NFC
llvm-svn: 318875
2017-11-22 20:43:00 +00:00
Craig Topper 726968d6a2 [X86] Support v32i16/v64i8 CTLZ using lookup table.
Had to tweak the setcc's used by the code to use a vXi1 result type with a sign extend back to vector size.

llvm-svn: 318871
2017-11-22 20:05:57 +00:00
Paul Robinson 6ca1dd6fa3 [DwarfDump] -debug-line=offset applies to .dwo too.
llvm-svn: 318856
2017-11-22 18:23:55 +00:00
Yaxun Liu c596226604 [AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument
SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes
flat load instead of buffer load.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D40040

llvm-svn: 318844
2017-11-22 16:13:35 +00:00
Paul Robinson 511b54cadc [DebugInfo] Dump a .debug_line section, including line-number program,
without any compile units.

Differential Revision: https://reviews.llvm.org/D40114

llvm-svn: 318842
2017-11-22 15:48:30 +00:00
Dmitry Preobrazhensky c492500e7e [AMDGPU][mc][tests] Updated generated lit tests for GFX8/9
Summary:
Added tests to better cover features introduced by commit rL318675.
See http://llvm.org/viewvc/llvm-project?view=revision&revision=318675

llvm-svn: 318841
2017-11-22 15:47:27 +00:00
Paul Robinson 63811a472e [DWARFv5] Support DW_FORM_strp in the .debug_line.dwo header.
As a side effect, the .debug_line section will be dumped in physical
order, rather than in the order that compile units refer to their
associated portions of the .debug_line section.  These are probably
always the same order anyway, and no tests noticed the difference.

Differential Revision: https://reviews.llvm.org/D39854

llvm-svn: 318839
2017-11-22 15:33:17 +00:00
Paul Robinson e0833349b6 [DWARF] Fix handling of extended line-number opcodes
Differential Revision: https://reviews.llvm.org/D40200

llvm-svn: 318838
2017-11-22 15:14:49 +00:00
Nicolai Haehnle dd059c161d AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer
Summary:
This bug seems to have gone unnoticed because critical cases with LDS
instructions are eliminated by the peephole optimizer.

However, equivalent situations arise with buffer loads and stores
as well, so this fixes regressions since r317751 ("AMDGPU: Merge
S_BUFFER_LOAD_DWORD_IMM into x2, x4").

Fixes at least:
KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs
KHR-GL45.cull_distance.functional
piglit tes-input-gl_ClipDistance.shader_test
... and probably more

Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40303

llvm-svn: 318829
2017-11-22 12:25:21 +00:00
Jonas Paulsson 181e260e32 [DAGCombiner] Bugfix in isAlias().
Since i1 is a legal type, this:

  NumBytes = Op1->getMemoryVT().getSizeInBits() >> 3;

is wrong and should be instead

  NumBytes = Op0->getMemoryVT().getStoreSize();

There seems to be more places where this should be fixed outside DAGCombiner.

Review: Hal Finkel
https://bugs.llvm.org/show_bug.cgi?id=35366

llvm-svn: 318824
2017-11-22 08:58:30 +00:00
Max Kazantsev 23044fa639 [SCEV] Strengthen variance condition in calculateLoopDisposition
Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively.
When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if
`L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are
consecutive sibling loops within the parent loop. In this case, `AR2` is also varying
w.r.t. `L1`, but we don't correctly identify it.

It can lead, for exaple, to attempt of incorrect folding. Consider:
  AR1 = {a,+,b}<L1>
  AR2 = {c,+,d}<L2>
  EXAR2 = sext(AR1)
  MUL = mul AR1, EXAR2
If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to
construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect
because `AR2` is not available on entrance of `L1`.

Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled
with one check: "header of `L1` dominates header of `L2`". This patch replaces the old
insufficient check with this one.

Differential Revision: https://reviews.llvm.org/D39453

llvm-svn: 318819
2017-11-22 06:21:39 +00:00
Davide Italiano b480b5c2ee [SCCP] Pick the right lattice value for constants.
After the dataflow algorithm proves that an argument is constant,
it replaces it value with the integer constant and drops the lattice
value associated to the DEF.

e.g. in the example we have @f() that's called twice:
call @f(undef, ...)
call @f(2, ...)

`undef` MEET 2 = 2 so we replace the argument and all its uses with
the constant 2.

Shortly after, tryToReplaceWithConstantRange() tries to get the lattice
value for the argument we just replaced, causing an assertion.
This function is a little peculiar as it runs when we're doing replacement
and not as part of the solver but still queries the solver.

The fix is that of checking whether we replaced the value already and
get a temporary lattice value for the constant.

Thanks to Zhendong Su for the report!

Fixes PR35357.

llvm-svn: 318817
2017-11-22 03:04:55 +00:00
Peter Collingbourne 6c48462276 Object: Improve COFF irsymtab comdat representation.
Change the representation of COFF comdats so that a COFF linker
is able to accurately resolve comdats between IR and native object
files. Specifically, apply name mangling to comdat names consistently
with native object files, and do not export comdats with an internal
leader because they do not affect symbol resolution.

Differential Revision: https://reviews.llvm.org/D40278

llvm-svn: 318805
2017-11-21 22:06:20 +00:00
Krzysztof Parzyszek fc0a1812f5 [Hexagon] Make sure that RDF does not remove EH_LABELs
Since EH_LABELs (and other labels) no longer have "side-effects", they
should be checked for separately.

llvm-svn: 318801
2017-11-21 21:05:51 +00:00
Craig Topper ba150ef60a [X86] Allow vpclmulqdq instructions to be commuted during isel to allow load folding.
The commuting patterns for the AVX version actually still had priority over the new patterns.

llvm-svn: 318800
2017-11-21 21:05:21 +00:00
Nirav Dave 61ffc9c0eb Avoid unecessary opsize byte in segment move to memory
Segment moves to memory are always 16-bit. Remove invalid 32 and 64
bit variants.

Recommiting with missing clang inline assembly test change.

Fixes PR34478.

Reviewers: rnk, craig.topper

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39847

llvm-svn: 318797
2017-11-21 19:28:13 +00:00
Chad Rosier fe97d73674 [AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as *having* side effects.
This partially reverts r298851.  The the underlying issue is that we don't
currently model the dependency between mrs (read system register) and
msr (write system register) instructions.

Something like the below should never be reordered:

 msr TPIDR_EL0, x0  ;; set thread pointer
 mrs x8, TPIDR_EL0  ;; read thread pointer

but was being reordered after r298851.  The functional part of the patch
that wasn't reverted needed to remain in place in order to not break
r299462.

PR35317

llvm-svn: 318788
2017-11-21 18:08:34 +00:00
Hans Wennborg d97c0f7855 Rename test/Transforms/CountingFunctionInserter -> EntryExitInstrumenter
The pass was renamed in r318195.

llvm-svn: 318784
2017-11-21 17:22:19 +00:00
Hans Wennborg 37cbf28e79 EntryExitInstrumenter: support __cyg_profile_func_enter_bare
It works just like __cyg_profile_func_enter but takes no arguments.

llvm-svn: 318783
2017-11-21 17:22:19 +00:00
Oliver Stannard 9cb89f6611 [ARM] Remove pre-UAL FLDM/FSTM aliases
These are pre-UAL syntax, and we don't support any other pre-UAL instructions,
with the exception of FLDMX/FSTMX, which don't have a UAL equivalent. Therefore
there's no reason to keep them or their AsmParser hacks around.

With the AsmParser hacks removed, the FLDMX and FSTMX instructions get the same
operand diagnostics as the UAL instructions.

Differential revision: https://reviews.llvm.org/D39196

llvm-svn: 318777
2017-11-21 16:20:25 +00:00
Oliver Stannard 1e6d4b9e62 [ARM] Don't omit non-default predication code
This was causing the (invalid) predicated versions of the NEON VRINTX and
VRINTZ instructions to be accepted, with the condition code being ignored.

Also, there is no NEON VRINTR instruction, so that part of the check was not
necessary.

Differential revision: https://reviews.llvm.org/D39193

llvm-svn: 318771
2017-11-21 15:34:15 +00:00
Oliver Stannard 1e73e95f3c [Asm] Improve "too few operands" errors
- We can still emit this error if the actual instruction has two or more
  operands missing compared to the expected one.
- We should only emit this error once per instruction.

Differential revision: https://reviews.llvm.org/D36746

llvm-svn: 318770
2017-11-21 15:16:50 +00:00
Sander de Smalen 4acd57eb51 Revert r318759 due to make check-all failure on Windows
llvm-svn: 318768
2017-11-21 15:07:43 +00:00
Oliver Stannard d6ca9879ba [ARM] Add diagnostics for SPR/DPR lists
Differential revision: https://reviews.llvm.org/D39195

llvm-svn: 318766
2017-11-21 15:06:01 +00:00
Alexey Bataev a054ea9848 [InstCombine] Test for PR35354: unable to vectorize loop with std::max
on floats, NFC.

llvm-svn: 318764
2017-11-21 14:49:13 +00:00
Sam Kolton c27e3b6f03 [AMDGPU] SDWA: remove omod src operand for VOP2b instructions
Summary: VOP2b instructions (v_subbrev_u32, v_add_i32 ...) shouldn't support OMod operand in SDWA encoding

Reviewers: rampitec, dp

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D40172

llvm-svn: 318761
2017-11-21 14:11:59 +00:00
Sander de Smalen f475eed48d [TableGen] AsmMatcher: Fix bug with reported diagnostic for operand.
Summary:
The generated diagnostic by the AsmMatcher isn't always applicable to the AsmOperand.

This is because the code will only update the diagnostic if it is more specific than the previous diagnostic. However, when having validated operands and 'moved on' to a next operand (for some instruction/alias for which all previous operands are valid), if the diagnostic is InvalidOperand, than that should be set as the diagnostic, not the more specific message about a previous operand for some other instruction/alias candidate.

Reviewers: craig.topper, olista01, rengolin, stoklund

Reviewed By: olista01

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D40011

llvm-svn: 318759
2017-11-21 12:26:06 +00:00
Eugene Leviant 6bc35a93e6 [MI scheduler] Fix VADD and VSUB in cortex-a57 model
This patch fixes instregex for interger vector add/sub instructions

Differential revision: https://reviews.llvm.org/D40254

llvm-svn: 318749
2017-11-21 11:01:28 +00:00
Coby Tayree 5c7fe5df53 [x86][icelake]BITALG
vpopcnt{b,w}
Differential Revision: https://reviews.llvm.org/D40213

llvm-svn: 318748
2017-11-21 10:32:42 +00:00
Diana Picus c79dfb3b31 [ARM GlobalISel] Add comment for r318398. NFC.
Mention the purpose of the BICri tests added by r318398, as requested in
post-commit review.

llvm-svn: 318747
2017-11-21 10:17:02 +00:00
Coby Tayree 3880f2a363 [x86][icelake]VNNI
Introducing Vector Neural Network Instructions, consisting of:
vpdpbusd{s}
vpdpwssd{s}
Differential Revision: https://reviews.llvm.org/D40208

llvm-svn: 318746
2017-11-21 10:04:28 +00:00
Coby Tayree 71e37cc9ff [x86][icelake]vbmi2
introducing vbmi2, consisting of
vpcompress{b,w}
vpexpand{b,w}
vpsh{l,r}d{w,d,q}
vpsh{l,r}dv{w,d,q}
Differential Revision: https://reviews.llvm.org/D40206

llvm-svn: 318745
2017-11-21 09:48:44 +00:00
NAKAMURA Takumi 519ea284af SLPVectorizer.cpp: Avoid std::stable_sort(properlyDominates()).
properlyDominates() shouldn't be used as sort key. It causes different output between stdlibc++ and libc++.
Instead, I introduced RPOT. In most cases, it works for CSE.

llvm-svn: 318743
2017-11-21 09:41:01 +00:00
Coby Tayree 7ca5e58736 [x86][icelake]vpclmulqdq introduction
an icelake promotion of pclmulqdq
Differential Revision: https://reviews.llvm.org/D40101

llvm-svn: 318741
2017-11-21 09:30:33 +00:00
Coby Tayree 2a1c02fcbc [x86][icelake]VAES introduction
an icelake promotion of AES
Differential Revision: https://reviews.llvm.org/D40078

llvm-svn: 318740
2017-11-21 09:11:41 +00:00
Alex Bradbury 0c7b3643f7 [RISCV] Use register X0 (ZERO) for constant 0
The obvious approach of defining a pattern like the one below actually doesn't
work:
`def : Pat<(i32 0), (i32 X0)>;`

As was noted when Lanai made this change (https://reviews.llvm.org/rL288215),
attempting to handle the constant 0 in tablegen leads to assertions due to a
physical register being used where a virtual register is expected.

llvm-svn: 318738
2017-11-21 08:23:08 +00:00
Alex Bradbury ffc435e9c7 [RISCV] Support and tests for a variety of additional LLVM IR constructs
Previous patches primarily ensured that codegen was possible for the standard
RISC-V instructions. However, there are a number of IR inputs that wouldn't be
appropriately lowered. This patch both adds test cases and supports lowering
for a number of these cases:
* Improved sext/zext/trunc support
* Support for setcc variants that don't map directly to RISC-V instructions
* Lowering mul, and hence support for external symbols
* addc, adde, subc, sube
* mulhs, srem, mulhu, urem, udiv, sdiv
* {srl,sra,shl}_parts
* brind
* br_jt
* bswap, ctlz, cttz, ctpop
* rotl, rotr
* BlockAddress operands

Differential Revision: https://reviews.llvm.org/D29938

llvm-svn: 318737
2017-11-21 08:11:03 +00:00
Alex Bradbury 65385167fb [RISCV] Implement lowering of ISD::SELECT
Although ISD::SELECT_CC is a more natural match for RISCVISD::SELECT_CC (and
ultimately the integer RISC-V conditional branch instructions), we choose to
expand ISD::SELECT_CC and lower ISD::SELECT. The appropriate compare+branch
will be created in the case where an ISD::SELECT condition value is created by
an ISD::SETCC node, which operates on XLen types. Other datatypes such as
floating point don't have conditional branch instructions, and lowering
ISD::SELECT allows more flexibility for handling these cases.

Differential Revision: https://reviews.llvm.org/D29937

llvm-svn: 318735
2017-11-21 07:51:32 +00:00
Yaxun Liu 3cea36f03e [AMDGPU] Fix DAGTypeLegalizer::SplitInteger for shift amount type
DAGTypeLegalizer::SplitInteger uses default pointer size as shift amount constant type,
which causes less performant ISA in amdgcn---amdgiz target since the default pointer
type is i64 whereas the desired shift amount type is i32.

This patch fixes that by using TLI.getScalarShiftAmountTy in DAGTypeLegalizer::SplitInteger.

The X86 change is necessary since splitting i512 requires shifting amount of 256, which
cannot be held by i8.

Differential Revision: https://reviews.llvm.org/D40148

llvm-svn: 318727
2017-11-21 02:29:54 +00:00
Richard Trieu 26917db696 Revert r318678 to fix Clang test
r318678 caused the Clang test CodeGen/ms-inline-asm.c to start failing.

llvm-svn: 318710
2017-11-21 00:12:18 +00:00
Vitaly Buka 8000f228b3 [msan] Don't sanitize "nosanitize" instructions
Reviewers: eugenis

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40205

llvm-svn: 318708
2017-11-20 23:37:56 +00:00
Craig Topper 67eb30ab60 [SelectionDAG] When promoting the result of a VSELECT, make sure we promote the condition to the SetCC type for the final result type not the original type.
Normally this would be cleaned up by promoting the condition operand next. But in the attached case we promoted the result from v2i48 to v2i64 and the condition from v2i1 to v2i48. Then we tried to "promote" the v2i48 condition back to v2i1 because that's what the SetCC result type for v2i64 is on X86 with VLX. But promote is either a NOP or SIGN_EXTEND and this would need a truncation.

With the change here we now get the SetCC type of v2i1 when we're handling the result promotion and the operand no longer needs to be promoted itself.

Fixes PR35272.

llvm-svn: 318706
2017-11-20 23:08:50 +00:00
Fedor Sergeev a476117e3a [Sparc] efficient pattern for UINT_TO_FP conversion
Summary:
        while investigating performance degradation of imagick benchmark
        there were found inefficient pattern for UINT_TO_FP conversion.
        That pattern causes RAW hazard in assembly code. Specifically,
        uitofp IR operator results in poor assembler :

        st          %i0, [%fp - 952]
        ldd         [%fp - 952], %f0

        it stores 32-bit integer register into memory location and then
        loads 64-bit floating point data from that location.
        That is exactly RAW hazard case. To optimize that case it is
        possible to use SPISD::ITOF and SPISD::XTOF for conversion from
        integer to floating point data type and to use ISD::BITCAST to
        copy from integer register into floating point register.
        The fix is to write custom UINT_TO_FP pattern using SPISD::ITOF,
        SPISD::XTOF, ISD::BITCAST.

Patch by Alexey Lapshin

Reviewers: fedor.sergeev, jyknight, dcederman, lero_chris

Reviewed By: jyknight

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36875

llvm-svn: 318704
2017-11-20 22:33:58 +00:00
Yonghong Song a4de6d86db bpf: add a test case for trunc-op optimization
Commit b5cbc7760ab8 ("[bpf] allow direct and indirect calls")
allowed more than one function in the bpf program, and
commit 114353884415 ("bpf: fix a bug in trunc-op optimization")
fixed a bug in trunc-op optimization which only showed up
with more than one function in the bpf program.

This patch added a test case for trunc-op optimization
for bpf programs with two functions. Reverting commit
"bpf: fix a bug in trunc-op optimization" will cause
failure for this test case.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 318695
2017-11-20 21:37:58 +00:00
Hiroshi Yamauchi c94d4d70d8 Add heuristics for irreducible loop metadata under PGO
Summary:
Add the following heuristics for irreducible loop metadata:

- When an irreducible loop header is missing the loop header weight metadata,
  give it the minimum weight seen among other headers.
- Annotate indirectbr targets with the loop header weight metadata (as they are
  likely to become irreducible loop headers after indirectbr tail duplication.)

These greatly improve the accuracy of the block frequency info of the Python
interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python
performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to
better register allocation under PGO.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39980

llvm-svn: 318693
2017-11-20 21:03:38 +00:00
Paul Robinson 746edea0ae Revert "Fix out-of-order stepping behavior in programs with sunk instructions."
This reverts commit 30419e150cd940893a13b345e85f96053850208f.
aka r318679.  It caused "sanitizer-windows" bot to fail.

llvm-svn: 318684
2017-11-20 19:07:52 +00:00
Paul Robinson f0b02965e4 Fix out-of-order stepping behavior in programs with sunk instructions.
MachineSink attempts to place instructions near the basic blocks where
they are needed.  Once an instruction has been sunk, its location
relative to other instructions is no longer consistent with the
original source code. In order to ensure correct single-stepping and
profiling, the debug location for sunk instructions is either merged
with the insertion point or erased if the target successor block is
empty.

Patch by Matthew Voss!

Differential Revision: https://reviews.llvm.org/D39933

llvm-svn: 318679
2017-11-20 18:42:17 +00:00
Nirav Dave 3669061298 [X86] Avoid unecessary opsize byte in segment move to memory
Summary:

Segment moves to memory are always 16-bit. Remove invalid 32 and 64
bit variants.

Fixes PR34478.

Reviewers: rnk, craig.topper

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39847

llvm-svn: 318678
2017-11-20 18:38:55 +00:00
Teresa Johnson 3309002a86 [SROA] Correctly invalidate analyses when dead instructions deleted
Summary:
SROA can fail in rewriting alloca but still rewrite a phi resulting
in dead instruction elimination. The Changed flag was not being set
correctly, resulting in downstream passes using stale analyses.
The included test case will assert during the second BDCE pass as a
result.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39921

llvm-svn: 318677
2017-11-20 18:33:38 +00:00
Dmitry Preobrazhensky a0342dc9eb [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

Reviewers: tamazov, SamWot, arsenm, vpykhtin

Differential Revision: https://reviews.llvm.org/D40088

llvm-svn: 318675
2017-11-20 18:24:21 +00:00
Evgeniy Stepanov 8e7018d92f [asan] Use dynamic shadow on 32-bit Android, try 2.
Summary:
This change reverts r318575 and changes FindDynamicShadowStart() to
keep the memory range it found mapped PROT_NONE to make sure it is
not reused. We also skip MemoryRangeIsAvailable() check, because it
is (a) unnecessary, and (b) would fail anyway.

Reviewers: pcc, vitalybuka, kcc

Subscribers: srhines, kubamracek, mgorny, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D40203

llvm-svn: 318666
2017-11-20 17:41:57 +00:00
Tony Jiang f75f4d6573 [MachineCSE] Add new callback for is caller preserved or constant physregs
The instructions addis,addi, bl are used to calculate the address of TLS thread
local variables. These TLS access code sequences are generated repeatedly every
time the thread local variable is accessed. By communicating to Machine CSE that
X2 is guaranteed to have the same value within the same function call (so called
Caller Preserved Physical Register), the redundant TLS access code sequences are
cleaned up.

Differential Revision: https://reviews.llvm.org/D39173

llvm-svn: 318661
2017-11-20 16:55:07 +00:00
Yaxun Liu 45d25e12ab [AMDGPU] Update test r600.amdgpu-alias-analysis.ll
Manually update test r600.amdgpu-alias-analysis.ll for amdgiz environment
since it cannot be done by script.

The two pointers are swapped in the output because PrintResults in
AliasAnalysisEvaluator.cpp sorts the strings obtained from printAsOperand
before printing them.

Differential Revision: https://reviews.llvm.org/D40131

llvm-svn: 318660
2017-11-20 16:53:13 +00:00
Simon Dardis 1631d6ce13 [mips] Reorder target specific passes
Move the hazard scheduling pass to after the long branch pass, as the
long branch pass can create forbiddden slot hazards. Rather than complicating
the implementation of the long branch pass to handle forbidden slot hazards,
just reorder the passes.

llvm-svn: 318657
2017-11-20 15:59:18 +00:00
Jonas Paulsson 12e3a58842 [SystemZ] Bugfix for handling of subregisters in getRegAllocationHints().
The 32 bit subreg indices of GR128 registers must also be checked for in
getRC32().

Review: Ulrich Weigand.
llvm-svn: 318652
2017-11-20 14:54:03 +00:00
Tony Jiang 438bf4a66b [PPC] Heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st.
The VSX versions have the advantage of a full 64-register target whereas the FP
ones have the advantage of lower latency and higher throughput. So what we’re
after is using the faster instructions in low register pressure situations and
using the larger register file in high register pressure situations.

The heuristic chooses between the following 7 pairs of instructions.
PPC::LXSSPX vs PPC::LFSX
PPC::LXSDX vs PPC::LFDX
PPC::STXSSPX vs PPC::STFSX
PPC::STXSDX vs PPC::STFDX
PPC::LXSIWAX vs PPC::LFIWAX
PPC::LXSIWZX vs PPC::LFIWZX
PPC::STXSIWX vs PPC::STFIWX

Differential Revision: https://reviews.llvm.org/D38486

llvm-svn: 318651
2017-11-20 14:38:30 +00:00
Sander de Smalen 0c5a29b6be [AArch64][TableGen] Skip tied result operands for InstAlias
Summary:
This patch fixes an issue so that the right alias is printed when the instruction has tied operands. It checks the number of operands in the resulting instruction as opposed to the alias, and then skips over tied operands that should not be printed in the alias.

This allows to generate the preferred assembly syntax for the AArch64 'ins' instruction, which should always be displayed as 'mov' according to the ARM Architecture Reference Manual. Several unit tests have changed as a result, but only to reflect the preferred disassembly. Some other InstAlias patterns (movk/bic/orr) needed a slight adjustment to stop them becoming the default and breaking other unit tests.

Please note that the patch is mostly the same as https://reviews.llvm.org/D29219 which was reverted because of an issue found when running TableGen with the Address Sanitizer. That issue has been addressed in this iteration of the patch.


Reviewers: rengolin, stoklund, huntergr, SjoerdMeijer, rovka

Reviewed By: rengolin, SjoerdMeijer

Subscribers: fhahn, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D40030

llvm-svn: 318650
2017-11-20 14:36:40 +00:00
Valery Pykhtin f2fe9725ea AMDGPU: Partial ILP scheduler port from SelectionDAG to SchedulingDAG (experimental)
Differential revision: https://reviews.llvm.org/D39897

llvm-svn: 318649
2017-11-20 14:35:53 +00:00
Gil Rapaport 8b9d1f3c5b [LV] Model masking in VPlan, introducing VPInstructions
This patch adds a new abstraction layer to VPlan and leverages it to model the planned
instructions that manipulate masks (AND, OR, NOT), introduced during predication.

The new VPValue and VPUser classes model how data flows into, through and out
of a VPlan, forming the vertices of a planned Def-Use graph. The new
VPInstruction class is a generic single-instruction Recipe that models a
planned instruction along with its opcode, operands and users. See
VectorizationPlan.rst for more details.

Differential Revision: https://reviews.llvm.org/D38676

llvm-svn: 318645
2017-11-20 12:01:47 +00:00
Diana Picus 3ac504035a [ARM GlobalISel] Add test for RSBri. NFC
Add instruction selector test for RSBri, which is derived from
AsI1_rbin_irs, and make sure it doesn't get mistaken for SUBri, which is
derived from the very similar AsI1_bin_irs pattern.

llvm-svn: 318643
2017-11-20 11:05:31 +00:00
Diana Picus 6db48f7d6b [ARM GlobalISel] Clean up binary operator tests. NFC
Remove some of the instruction selector tests for binary operators (and,
or, xor). These are all derived from the same kind of TableGen pattern,
AsI1_bin_irs, so there's no point in testing all of them.

llvm-svn: 318642
2017-11-20 10:35:35 +00:00
Mohammed Agabaria 115f68ea3e [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen)
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb

Reviewed By: delena, RKSimon

Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
2017-11-20 08:18:12 +00:00
Craig Topper 198f7d78d3 [X86] Regenerate a test with broadcast comments. NFC
llvm-svn: 318640
2017-11-20 08:15:04 +00:00
Max Kazantsev 268467869b [IRCE] Smart range intersection
In rL316552, we ban intersection of unsigned latch range with signed range check and vice
versa, unless the entire range check iteration space is known positive. It was a correct
functional fix that saved us from dealing with ambiguous values, but it also appeared
to be a very restrictive limitation. In particular, in the following case:

  loop:
    %iv = phi i32 [ 0, %preheader ], [ %iv.next, %latch]
    %iv.offset = add i32 %iv, 10
    %rc = icmp slt i32 %iv.offset, %len
    br i1 %rc, label %latch, label %deopt

  latch:
    %iv.next = add i32 %iv, 11
    %cond = icmp i32 ult %iv.next, 100
    br it %cond, label %loop, label %exit

Here, the unsigned iteration range is `[0, 100)`, and the safe range for range
check is `[-10, %len - 10)`. For unsigned iteration spaces, we use unsigned
min/max functions for range intersection. Given this, we wanted to avoid dealing
with `-10` because it is interpreted as a very big unsigned value. Semantically, range
check's safe range goes through unsigned border, so in fact it is two disjoint
ranges in IV's iteration space. Intersection of such ranges is not trivial, so we prohibited
this case saying that we are not allowed to intersect such ranges.

What semantics of this safe range actually means is that we can start from `-10` and go
up increasing the `%iv` by one until we reach `%len - 10` (for simplicity let's assume that
`%len - 10`  is a reasonably big positive value).

In particular, this safe iteration space includes `0, 1, 2, ..., %len - 11`. So if we were able to return
safe iteration space `[0, %len - 10)`, we could safely intersect it with IV's iteration space. All
values in this range are non-negative, so using signed/unsigned min/max for them is unambiguous.

In this patch, we alter the algorithm of safe range calculation so that it returnes a subset of the
original safe space which is represented by one continuous range that does not go through wrap.
In order to reach this, we use modified SCEV substraction function. It can be imagined as a function
that substracts by `1` (or `-1`) as long as the further substraction does not cause a wrap in IV iteration
space. This allows us to perform IRCE in many situations when we deal with IV space and range check
of different types (in terms of signed/unsigned).

We apply this approach for both matching and not matching types of IV iteration space and the
range check. One implication of this is that now IRCE became smarter in detection of empty safe
ranges. For example, in this case:
  loop:
    %iv = phi i32 [ %begin, %preheader ], [ %iv.next, %latch]
    %iv.offset = sub i32 %iv, 10
    %rc = icmp ult i32 %iv.offset, %len
    br i1 %rc, label %latch, label %deopt

  latch:
    %iv.next = add i32 %iv, 11
    %cond = icmp i32 ult %iv.next, 100
    br it %cond, label %loop, label %exit

If `%len` was less than 10 but SCEV failed to trivially prove that `%begin - 10 >u %len- 10`,
we could end up executing entire loop in safe preloop while the main loop was still generated,
but never executed. Now, cutting the ranges so that if both `begin - 10` and `%len - 10` overflow,
we have a trivially empty range of `[0, 0)`. This in some cases prevents us from meaningless optimization.

Differential Revision: https://reviews.llvm.org/D39954

llvm-svn: 318639
2017-11-20 06:07:57 +00:00
Serguei Katkov 505359f705 [CGP] Fix the crash caused by enable of complex addr mode
We must collect all AddModes even if they are the same.
This is due to Original value is different but we need all original
values collected as they are used as anchors in common phi finding.

Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40166

llvm-svn: 318638
2017-11-20 05:42:36 +00:00
Sanjay Patel c0218923e1 [x86] add sqrt tests for partially-inline-libcalls (PR31455)
llvm-svn: 318630
2017-11-19 17:31:37 +00:00
Sanjay Patel 9771a96f6e [LibCallSimplifier] allow splat vectors for pow(x, 0.5) -> sqrt() transforms
llvm-svn: 318629
2017-11-19 16:42:27 +00:00
Sanjay Patel fbd3e66b9a [LibCallSimplifier] partly fix pow(x, 0.5) -> sqrt() transforms
As the first test shows, we could transform an llvm intrinsic which never sets errno 
into a libcall which could set errno (even though it's marked readnone?), so that's 
not ideal.

It's possible that we can also transform a libcall which could set errno to an
intrinsic given the fast-math-flags constraint, but that's deferred to determine
exactly which set of FMF are needed.

Differential Revision: https://reviews.llvm.org/D40150

llvm-svn: 318628
2017-11-19 16:13:14 +00:00
Sanjay Patel eb731b09f3 [InstSimplify] fold and/or of fcmp ord/uno when operand is known nnan
The 'ord' and 'uno' predicates have a logic operation for NAN built into their definitions:

FCMP_ORD   =  7,  ///< 0 1 1 1    True if ordered (no nans)
FCMP_UNO   =  8,  ///< 1 0 0 0    True if unordered: isnan(X) | isnan(Y)

So we can simplify patterns like this:

(fcmp ord (known NNAN), X) && (fcmp ord X, Y) --> fcmp ord X, Y
(fcmp uno (known NNAN), X) || (fcmp uno X, Y) --> fcmp uno X, Y

It might be better to split this into (X uno 0) | (Y uno 0) as a canonicalization, but that
would be another patch.

Differential Revision: https://reviews.llvm.org/D40130 

llvm-svn: 318627
2017-11-19 15:34:27 +00:00
Craig Topper bece74c694 [X86] Add test cases for rndscaless/sd intrinsics.
Also fix the memop in the ins for these instructions. Not sure what effect this has.

llvm-svn: 318624
2017-11-19 06:24:26 +00:00
Craig Topper 512e9e7f3f [X86] Improve load folding of scalar rcp28 and rsqrt28 instructions using sse_load_f32/f64.
llvm-svn: 318623
2017-11-19 05:42:54 +00:00
Alexei Starovoitov 9a67245d88 [bpf] allow direct and indirect calls
kernel verifier is becoming smarter and soon will support
direct and indirect function calls.
Remove obsolete error from BPF backend.
Make call to use PCRel_4 fixup.
'bpf to bpf' calls are distinguished from 'bpf to kernel' calls
by insn->src_reg == BPF_PSEUDO_CALL == 1 which is used as relocation
indicator similar to ld_imm64->src_reg == BPF_PSEUDO_MAP_FD == 1
The actual 'call' instruction remains the same for both
'bpf to kernel' and 'bpf to bpf' calls.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 318614
2017-11-19 01:35:00 +00:00
Craig Topper 9a94dfc457 [X86] Switch cannonlake to use the SkylakeServer scheduling model instead of Haswell.
Cannonlake comes after skylake and supports avx512 so this is probably a closer model for now.

llvm-svn: 318613
2017-11-19 01:25:30 +00:00
Craig Topper 81037f385e [X86] Add skeleton support for icelake CPU.
There are several patches out for review right now to implement Icelake features. This adds a CPU to collect them under.

llvm-svn: 318612
2017-11-19 01:12:00 +00:00
Simon Pilgrim 72f40041f6 [MC][X86] Add test case from PR19251
llvm-svn: 318605
2017-11-18 23:23:25 +00:00
Simon Pilgrim 6960fe2e4e [MC][X86] Add teet case from PR32807
llvm-svn: 318603
2017-11-18 23:06:42 +00:00
Simon Pilgrim 41fc45c4e6 [X86][AVX512VL] Add AVX512VL tests to the vselect packss tests.
PR34553 has gone, adding tests to ensure it doesn't come back.

vselect_packss_v16i64 still has some awful codegen on AVX512 targets....

llvm-svn: 318599
2017-11-18 19:47:59 +00:00
Craig Topper 65b8a2c9e1 [X86] Add another gather test with v8i8 sign extended indices.
This requires the indices to be legalized and sign extended.

llvm-svn: 318597
2017-11-18 19:25:35 +00:00
Florian Hahn 2a266a343f [CallSiteSplitting] Remove some indirection (NFC).
Summary:
With this patch I tried to reduce the complexity of the code sightly, by
removing some indirection. Please let me know what you think.

Reviewers: junbuml, mcrosier, davidxl

Reviewed By: junbuml

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40037

llvm-svn: 318593
2017-11-18 18:14:13 +00:00
Sanjay Patel c0d1b2ee2e [x86] add tests for unnecessary shuffling; NFC
llvm-svn: 318592
2017-11-18 16:25:38 +00:00
Martin Storsjo 94b59240e2 [X86] Output cfi directives for saved XMM registers even if no GPRs are saved
This makes sure that functions that only clobber xmm registers
(on win64) also get the right cfi directives, if dwarf exceptions
are enabled.

Differential Revision: https://reviews.llvm.org/D40191

llvm-svn: 318591
2017-11-18 06:23:48 +00:00