Commit Graph

140751 Commits

Author SHA1 Message Date
Matt Arsenault d4bb5e4831 AMDGPU: Enable store clustering
Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.

llvm-svn: 287021
2016-11-15 20:22:55 +00:00
Haicheng Wu faee2b71a7 [AArch64] Lower multiplication by a constant int to shl+add+shl
Lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m

Differential Revision: https://reviews.llvm.org/D229245

llvm-svn: 287019
2016-11-15 20:16:48 +00:00
Matt Arsenault 3666629837 AMDGPU: Analyze mubuf with immediate soffset
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

llvm-svn: 287018
2016-11-15 20:14:27 +00:00
Matt Arsenault 12c53897f3 AMDGPU: Fix return after else
llvm-svn: 287015
2016-11-15 19:58:54 +00:00
Wei Mi 37c4aaaf52 Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
llvm-svn: 287014
2016-11-15 19:42:05 +00:00
Matt Arsenault 92b355b1a9 AMDGPU: Replace assert(false) with unreachable
llvm-svn: 287013
2016-11-15 19:34:37 +00:00
Davide Italiano 926d516179 [ELF] Rewrite isMips64EL() using isMipsELF64(). NFCI.
llvm-svn: 287011
2016-11-15 19:15:18 +00:00
Stanislav Mekhanoshin ea91cca593 [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

llvm-svn: 287007
2016-11-15 19:00:15 +00:00
Sanjay Patel 22465125b3 [x86] auto-generate checks; NFC
Also, fix the test params to use an attribute rather than a CPU model
and remove the AVX run because that does nothing but check for a 'v'
prefix in all of these tests.

llvm-svn: 287003
2016-11-15 18:44:53 +00:00
Wei Mi 7ccf7651c0 [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 286999
2016-11-15 18:35:53 +00:00
Pawel Bylica c3f6c97f71 Integer legalization: fix MUL expansion
Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

llvm-svn: 286998
2016-11-15 18:29:24 +00:00
Zaara Syeda a19c9e60e9 vector load store with length (left justified) llvm portion
llvm-svn: 286993
2016-11-15 17:54:19 +00:00
Sanjay Patel 73d1d35d21 fix formatting; NFC
llvm-svn: 286989
2016-11-15 17:47:13 +00:00
Wei Mi d2948cef70 [IndVars] Change the order to compute WidenAddRec in widenIVUse.
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.

Differential Revision: https://reviews.llvm.org/D26059

llvm-svn: 286987
2016-11-15 17:34:52 +00:00
Diana Picus 895c6aa6fd [ARM] GlobalISel: Remove unused members. NFCI
This silences some warnings that I didn't see with my host compiler.

llvm-svn: 286981
2016-11-15 16:42:10 +00:00
Craig Topper c7486af9c9 [AVX-512] Add AVX-512 vector shift intrinsics to memory santitizer.
Just needed to add the intrinsics to the exist switch. The code is generic enough to support the wider vectors with no changes.

llvm-svn: 286980
2016-11-15 16:27:33 +00:00
Simon Pilgrim ceffb43b1b [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248

Differential Revision: https://reviews.llvm.org/D26583

llvm-svn: 286979
2016-11-15 16:24:40 +00:00
Sanjay Patel bb238bb4e5 [InstCombine] add tests for bitcasted selects; NFC
llvm-svn: 286978
2016-11-15 16:01:16 +00:00
Pablo Barrio 4f80c93a2e Revert "[JumpThreading] Unfold selects that depend on the same condition"
This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc.

llvm-svn: 286976
2016-11-15 15:42:23 +00:00
Pablo Barrio 5f782bb048 Revert "[JumpThreading] Prevent non-deterministic use lists"
This reverts commit f2c2f5354070469dac253373c66527ca971ddc66.

llvm-svn: 286975
2016-11-15 15:42:17 +00:00
Diana Picus 90f0a84943 [ARM] Make sure GlobalISel is only initialized once. NFCI
Move some code inside the proper 'if' block to make sure it is only run once,
when the subtarget is first created. Things can still break if we use different
ARM target machines or if we have functions with different 'target-cpu' or
'target-features', we should fix that too in the future.

llvm-svn: 286974
2016-11-15 15:38:15 +00:00
Robert Lougher b0905209dd [LoopVectorizer] When estimating reg usage, unused insts may "end" another use
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).

The algorithm first calculates the usages within the loop.  It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).

The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals.  Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.

The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index.  In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.

This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.

Differential Revision: https://reviews.llvm.org/D26554

llvm-svn: 286969
2016-11-15 14:27:33 +00:00
Tony Jiang 5f850cd1b1 [PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

llvm-svn: 286967
2016-11-15 14:25:56 +00:00
Diana Picus 3776e76201 Get GlobalISel to build on Linux after r286407
r286407 has introduced calls to llvm::AddLandingPadInfo, which lives in the
SelectionDAG component. Add it to LLVMBuild to avoid linker failures on Linux.

llvm-svn: 286962
2016-11-15 14:11:11 +00:00
Zvi Rackover 6f76f46d2c [X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFC
llvm-svn: 286961
2016-11-15 13:50:35 +00:00
Sam Kolton c01faa383f [AMDGPU] TableGen: change individual instruction flags to bit type from bits<1>
Summary: This is needed to be able to use this flags in InstrMappings.

Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D26666

llvm-svn: 286960
2016-11-15 13:39:07 +00:00
Zvi Rackover f0b9b57bd3 [X86][FastISel] Fix lowering of overflow result on AVX512 targets
Summary:
    Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
    We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.

    Fixes pr30981.

    Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet

    Subscribers: qcolombet, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26620

llvm-svn: 286958
2016-11-15 13:29:23 +00:00
Florian Hahn 4b4dc172e7 Test commit, remove trailing space.
This commit is used to test commit access.

llvm-svn: 286957
2016-11-15 13:28:42 +00:00
Rafael Espindola bd0462678a clang format include/llvm/Support/ELF.h. NFC.
llvm-svn: 286956
2016-11-15 13:21:32 +00:00
NAKAMURA Takumi 220bf0fb97 DWARFAbbreviationDeclaration.h: Fix a typo in r286924. [-Wdocumentation]
llvm-svn: 286954
2016-11-15 13:16:50 +00:00
Joerg Sonnenberger 1a7eec68a9 Introduce TLI predicative for base-relative Jump Tables.
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases.  As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.

Differential Revision: https://reviews.llvm.org/D26578

llvm-svn: 286951
2016-11-15 12:39:46 +00:00
Javed Absar f043dac25d [ARM] Add machine scheduler for Cortex-R52
This patch adds the Sched Machine Model for Cortex-R52.

Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.

Reviewers: rengolin, jmolloy

Differential Revision: https://reviews.llvm.org/D26500

llvm-svn: 286949
2016-11-15 11:34:54 +00:00
Daniel Sanders a3e1125a0a Fix -Wunused introduced in r286945 for release builds.
llvm-svn: 286946
2016-11-15 10:13:09 +00:00
Daniel Sanders ea6ef3d3fa [tablegen] Extract portions of AsmMatcherEmitter for re-use by another generator. NFC.
Summary:
This change is preparation for a change that will allow targets to verify that the instructions
they emit meet the predicates they specify. This is useful to ensure that C++
legalization/lowering/instruction-selection doesn't incorrectly select code for a different
subtarget than intended. Such cases are not caught by the integrated assembler when emitting
instructions directly to an object file.

Reviewers: qcolombet

Subscribers: qcolombet, beanz, mgorny, llvm-commits, modocache

Differential Revision: https://reviews.llvm.org/D25614

llvm-svn: 286945
2016-11-15 09:51:02 +00:00
Adam Nemet 8741656efd [opt-viewer] Add support for libYAML for faster parsing
This results in a speed-up of over 6x on sqlite3.

Before:

$ time -p /org/llvm/utils/opt-viewer/opt-viewer.py ./MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.opt.yaml html
  real 415.07
  user 410.00
  sys 4.66

After with libYAML:

$ time -p /org/llvm/utils/opt-viewer/opt-viewer.py ./MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.opt.yaml html
  real 63.96
  user 60.03
  sys 3.67

I followed these steps to get libYAML working with PyYAML: http://rmcgibbo.github.io/blog/2013/05/23/faster-yaml-parsing-with-libyaml/

llvm-svn: 286942
2016-11-15 08:40:51 +00:00
Asaf Badouh b573553424 DAGCombiner: fix combine of trunc and select
bugzilla:
https://llvm.org/bugs/show_bug.cgi?id=29002
pr29002

Differential Revision: https://reviews.llvm.org/D26449


 

llvm-svn: 286938
2016-11-15 07:55:22 +00:00
Matt Arsenault 1c8d933881 TableGen: Add operator !or
llvm-svn: 286936
2016-11-15 06:49:28 +00:00
Zvi Rackover 76dbf26599 [X86][GlobalISel] Add minimal call lowering support to the IRTranslator
Summary:
    Add basic functionality to support call lowering for X86.
    Currently only supports functions which return void and take zero arguments.
    Inspired by commit 286573.

Reviewers: ab, qcolombet, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26593

llvm-svn: 286935
2016-11-15 06:34:33 +00:00
Craig Topper 0637099f24 [AVX-512] Add an example test case for PR31018.
llvm-svn: 286934
2016-11-15 05:21:55 +00:00
Craig Topper e6915b85ed [X86] Add LLVM version number for each intrinsic handled by auto upgrade for age tracking.
One day we'd like to remove some of this autoupgrade support and it will be easier if we know how long some of it has been around.

Differential Revision: https://reviews.llvm.org/D26321

llvm-svn: 286933
2016-11-15 05:04:51 +00:00
Matt Arsenault c79dc70d50 AMDGPU: Fix f16 fabs/fneg
llvm-svn: 286931
2016-11-15 02:25:28 +00:00
Lang Hames 79d7f70dfd [ORC] Work around an apparent modules/linkage issue.
<rdar://problem/29247092>

llvm-svn: 286930
2016-11-15 02:14:57 +00:00
Rui Ueyama 6b77ad3546 Simplify identify_magic.
This patch defines a memcmp-ish helper function to simplify identify_magic.

llvm-svn: 286928
2016-11-15 01:57:05 +00:00
Greg Clayton 6f6e4dbd5d Improve DWARF parsing speed by improving DWARFAbbreviationDeclaration
This patch gets a DWARF parsing speed improvement by having DWARFAbbreviationDeclaration instances know if they have a fixed byte size. If an abbreviation has a fixed byte size that can be calculated given a DWARFUnit, then parsing a DIE becomes two steps: parse ULEB128 abbrev code, and then add constant size to the offset.

This patch also adds a fixed byte size to each DWARFAbbreviationDeclaration::AttributeSpec so that attributes can quickly skip their values if needed without the need to lookup the fixed for size.

Notable improvements:

- DWARFAbbreviationDeclaration::findAttributeIndex() now returns an Optional<uint32_t> instead of a uint32_t and we no longer have to look for the magic -1U return value
- Optional<uint32_t> DWARFAbbreviationDeclaration::findAttributeIndex(dwarf::Attribute attr) const;
- DWARFAbbreviationDeclaration now has a getAttributeValue() function that extracts an attribute value given a DIE offset that takes advantage of the DWARFAbbreviationDeclaration::AttributeSpec::ByteSize
- bool DWARFAbbreviationDeclaration::getAttributeValue(const uint32_t DIEOffset, const dwarf::Attribute Attr, const DWARFUnit &U, DWARFFormValue &FormValue) const;
- A DWARFAbbreviationDeclaration instance can return a fixed byte size for itself so DWARF parsing is faster:
- Optional<size_t> DWARFAbbreviationDeclaration::getFixedAttributesByteSize(const DWARFUnit &U) const;
- Any functions that used to take a "const DWARFUnit *U" that would crash if U was NULL now take a "const DWARFUnit &U" and are only called with a valid DWARFUnit

Differential Revision: https://reviews.llvm.org/D26567

llvm-svn: 286924
2016-11-15 01:23:06 +00:00
Rui Ueyama e97c34cb60 Fix -Wswitch.
llvm-svn: 286920
2016-11-15 00:58:50 +00:00
Rui Ueyama 2d02166b43 Add a file magic for CL.exe's object file created with /GL.
This patch makes it possible to identify object files created by CL.exe
with /GL option. Such file contains Microsoft proprietary intermediate
code instead of target machine code to do LTO.

I need this to print out user-friendly error message from LLD.

Differential Revision: https://reviews.llvm.org/D26645

llvm-svn: 286919
2016-11-15 00:54:54 +00:00
Lang Hames 07a6ae96fb [ORC] Temporarily disable RPCUtils unit test.
This broke s390x due to a bug in the QueueChannel implementation that led to it
infinite-looping. Disabling it while I look into a fix.

llvm-svn: 286917
2016-11-15 00:49:12 +00:00
Saleem Abdulrasool f7009b42f8 llvm-strings: support the `-n` option
Permit specifying the match length (the `-n` or `--bytes` option).  The
deprecated `-[length]` form is not supported as an option.  This allows the
strings tool to display only the specified length strings rather than the
hardcoded default length of >= 4.

llvm-svn: 286914
2016-11-15 00:43:52 +00:00
Matt Arsenault 81da114e65 AMDGPU: Set hasExtraSrcRegAllocReq on v_div_scale_*
This doesn't solve any problems I know about, but this should have
more conservative assumptions about the operands'

llvm-svn: 286913
2016-11-15 00:05:42 +00:00
Matt Arsenault 972034bda9 AMDGPU: Fix formatting of 1/2pi immediate
llvm-svn: 286912
2016-11-15 00:04:33 +00:00