Commit Graph

33388 Commits

Author SHA1 Message Date
Sanjay Patel e79b43a01f [x86] generalize reassociation optimization in machine combiner to 2 instructions
Currently ( D10321, http://reviews.llvm.org/rL239486 ), we can use the machine combiner pass
to reassociate the following sequence to reduce the critical path:

A = ? op ?
B = A op X
C = B op Y
-->
A = ? op ?
B = X op Y
C = A op B

'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could
be any associative math/logic op (see TODO in code comment).

This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of
a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth
reassociating them.

This generalization has a compile-time cost because we can now match more instruction sequences
and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't
improve the critical path.

For example, in the new test case:

A = M div N
B = A add X
C = B add Y

We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:

A = M div N
B = A add Y
C = B add X

We need the combiner to reject that pattern but select this:

A = M div N
B = X add Y
C = B add A

Differential Revision: http://reviews.llvm.org/D10460

llvm-svn: 240361
2015-06-23 00:39:40 +00:00
Simon Pilgrim 616fe5066a [X86][FMA4] FMA4 ops can perform unaligned folded loads.
llvm-svn: 240342
2015-06-22 21:49:41 +00:00
Tom Stellard f0296cee9b R600/SI: Use ELF64 format instead of ELF32
Reviewers: arsenm, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10392

llvm-svn: 240331
2015-06-22 21:03:54 +00:00
Tom Stellard 3aed34e947 R600: Use EM_AMDGPU for the ELF Machine type
Reviewers: arsenm, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10390

llvm-svn: 240330
2015-06-22 21:03:52 +00:00
Ahmed Bougacha ed3c4d1a3d [X86] Teach load folding to accept scalar _Int users of MOVSS/MOVSD.
The _Int instructions are special, in that they operate on the full
VR128 instead of FR32.  The load folding then looks at MOVSS, at the
user, and bails out when it sees a size mismatch.

What we really know is that the rm_Int instructions don't load the
higher lanes, so folding is fine.

This happens for the straightforward intrinsic code, e.g.:

    _mm_add_ss(a, _mm_load_ss(p));

Fixes PR23349.

Differential Revision: http://reviews.llvm.org/D10554

llvm-svn: 240326
2015-06-22 20:51:51 +00:00
Pete Cooper 80d21cb40d Change .thumb_set to have the same error checks as .set.
According to the documentation, .thumb_set is 'the equivalent of a .set directive'.

We didn't have equivalent behaviour in terms of all the errors we could throw, for
example, when a symbol is redefined.

This change refactors parseAssignment so that it can be used by .set and .thumb_set
and implements tests for .thumb_set for all the errors thrown by that method.

Reviewed by Rafael Espíndola.

llvm-svn: 240318
2015-06-22 19:35:57 +00:00
Sanjay Patel 09b2c890af [x86] set default reciprocal (division and square root) codegen to match GCC
D8982 ( checked in at http://reviews.llvm.org/rL239001 ) added command-line 
options to allow reciprocal estimate instructions to be used in place of
divisions and square roots.

This patch changes the default settings for x86 targets to allow that recip
codegen (except for scalar division because that breaks too much code) when
using -ffast-math or its equivalent. 

This matches GCC behavior for this kind of codegen.

Differential Revision: http://reviews.llvm.org/D10396

llvm-svn: 240310
2015-06-22 18:29:44 +00:00
Rafael Espindola 36b718fc74 Avoid a Symbol -> Name -> Symbol conversion.
Before this we were producing a TargetExternalSymbol from a MCSymbol.
That meant extracting the symbol name and fetching the symbol again
down the pipeline.

This patch adds a DAG.getMCSymbol that lets the MCSymbol pass unchanged on the
DAG.

Doing so removes the need for MO_NOPREFIX and fixes the root cause of pr23900,
allowing r240130 to be committed again.

llvm-svn: 240300
2015-06-22 17:46:53 +00:00
Toma Tabacu 8e0316d439 [mips] [IAS] Add support for LAReg with identical source and destination register operands.
Summary: In this case, we're supposed to load the immediate in AT and then ADDu it with the source register and put it in the destination register.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9367

llvm-svn: 240278
2015-06-22 13:10:23 +00:00
Elena Demikhovsky 55a997437c AVX-512: added VPSHUFB instruction - all SKX forms
Added intrinsics and encoding tests.

llvm-svn: 240277
2015-06-22 13:00:42 +00:00
Toma Tabacu fb9d125592 [mips] [IAS] Add support for LASym with identical source and destination register operands.
Summary:
In this case, we're supposed to load the address of the symbol in AT and then ADDu it with the source register and
put it in the destination register.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9366

llvm-svn: 240273
2015-06-22 12:08:39 +00:00
Elena Demikhovsky ba5ab328e5 AVX-512: All forms of VCOPMRESS VEXPAND instructions,
encoding tests.

llvm-svn: 240272
2015-06-22 11:16:30 +00:00
Elena Demikhovsky d78609a7ac Reverted AVX-512 vector shuffle
llvm-svn: 240258
2015-06-22 09:01:15 +00:00
Michael Kuperstein fc21951cd7 [X86] Allow more call sequences to use push instructions for argument passing
This allows more call sequences to use pushes instead of movs when optimizing for size.
In particular, calling conventions that pass some parameters in registers (e.g. thiscall) are now supported.

Differential Revision: http://reviews.llvm.org/D10500

llvm-svn: 240257
2015-06-22 08:31:22 +00:00
Elena Demikhovsky e77566112c AVX-512: Added intrinsics for VPERMT2W/D/Q/PS/PD and
VPERMI2W/D/Q/PS/PD instructions.
Added tests.

llvm-svn: 240256
2015-06-22 06:45:48 +00:00
Simon Pilgrim 5055f6dfe0 [X86] Code tidyup - Use SDValue bool operator. NFC.
llvm-svn: 240249
2015-06-21 21:34:32 +00:00
Simon Pilgrim 056cbfe58d [X86][SSE] Fix PerformSExtCombine bug that accessed the wrong return value of an aggregate type.
Fix to rL237885 to ensure that it accesses the correct return value of an aggregate type.

llvm-svn: 240223
2015-06-20 16:19:24 +00:00
Benjamin Kramer e7561b8fe3 [PPC] Factor vector removal into a function and remove O(n^2) behavior.
No functionality change intended.

llvm-svn: 240222
2015-06-20 15:59:41 +00:00
Sanjay Patel cfe0393b82 name change: hasPattern() -> getMachineCombinerPatterns() ; NFC
This was suggested as part of D10460, but it's independent of
any functional change.

llvm-svn: 240192
2015-06-19 23:21:42 +00:00
Rafael Espindola 3dc0d05bf4 Improve error handling of getRelocationAddend.
This patch changes getRelocationAddend to use ErrorOr and considers it an error
to try to get the addend of a REL section.

If, for example, a x86_64 file has a REL section, that file is corrupted and
we should reject it.

Using ErrorOr is not ideal since we check the section type once per relocation
instead of once per section.

Checking once per section would involve getRelocationAddend just asserting and
callers checking the section before iterating over the relocations.

In any case, this is an improvement and includes a test.

llvm-svn: 240176
2015-06-19 20:58:43 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Ahmed Bougacha 9a9094260d [ARM] Look through concat when lowering in-place shuffles (VZIP, ..)
Currently, we canonicalize shuffles that produce a result larger than
their operands with:
  shuffle(concat(v1, undef), concat(v2, undef))
->
  shuffle(concat(v1, v2), undef)

because we can access quad vectors (see PerformVECTOR_SHUFFLECombine).

This is useful in the general case, but there are special cases where
native shuffles produce larger results: the two-result ops.

We can look through the concat when lowering them:
  shuffle(concat(v1, v2), undef)
->
  concat(VZIP(v1, v2):0, :1)

This lets us generate the native shuffles instead of scalarizing to
dozens of VMOVs.

Differential Revision: http://reviews.llvm.org/D10424

llvm-svn: 240118
2015-06-19 02:32:35 +00:00
Ahmed Bougacha 2ffa91f908 [ARM] Factor out two-result shuffle matching. NFCI.
In preparation for a future patch: makes it easier to do the same
matching to generate different nodes, without duplication.

llvm-svn: 240116
2015-06-19 02:25:01 +00:00
Eric Christopher 572e03a396 Fix "the the" in comments.
llvm-svn: 240112
2015-06-19 01:53:21 +00:00
Sanjay Patel cf5c220da5 use SDValue bool operator; NFCI
llvm-svn: 240064
2015-06-18 21:44:31 +00:00
Colin LeMahieu fa38972063 [Hexagon] Fixing unused field copypasta.
llvm-svn: 240055
2015-06-18 21:03:13 +00:00
Colin LeMahieu d2158755eb [Hexagon] Printing packet brackets when asm printing and adding a number of tests that test packet brackets.
llvm-svn: 240051
2015-06-18 20:43:50 +00:00
Reid Kleckner 034ea96aa3 [X86] Rename RegInfo to TRI as suggested by Eric
llvm-svn: 240047
2015-06-18 20:32:02 +00:00
Reid Kleckner 98d7803291 [X86] Refactor stack adjustments into X86FrameLowering::BuildStackAdjustment
Deduplicates some code and lets us use LEA on atom when adjusting the
stack around callee-cleanup calls. This is the only intended
functionality change.

llvm-svn: 240044
2015-06-18 20:22:12 +00:00
Reid Kleckner 3854f7bc27 [X86] Remove unneeded parameters and deduplicate stack alignment code
NFC

llvm-svn: 240033
2015-06-18 18:03:25 +00:00
James Y Knight f90346f8f6 [SPARC] Repair GOT references to internal symbols.
They had been getting emitted as a section + offset reference, which
is bogus since the value needs to be the offset within the GOT, not
the actual address of the symbol's object.

Differential Revision: http://reviews.llvm.org/D10441

llvm-svn: 240020
2015-06-18 15:05:15 +00:00
Asaf Badouh 23c6283ae3 quick fix for failure from r.240012
failure:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/11847/steps/build_Lld/logs/stdio

llvm-svn: 240015
2015-06-18 12:57:24 +00:00
Asaf Badouh 81f03c30a5 [AVX512]
add instructions: VPAVGB and VPAVGW


review
http://reviews.llvm.org/D10504

llvm-svn: 240012
2015-06-18 12:30:53 +00:00
Elena Demikhovsky d3057e5e37 AVX-512: (fixed) Added encoding of all forms of VPERMT2W/D/Q/PS/PD and VPERMI2W/D/Q/PS/PD.
Intrinsics and tests for them are comming in the next patch.

llvm-svn: 240003
2015-06-18 08:56:19 +00:00
Elena Demikhovsky 4f13f3f9b8 reverted 239999 due to test failures
llvm-svn: 240001
2015-06-18 08:06:49 +00:00
Elena Demikhovsky 975a637cd9 AVX-512: Added encoding of all forms of VPERMT2W/D/Q/PS/PD
and VPERMI2W/D/Q/PS/PD.
Intrinsics and tests for them are comming in the next patch.

llvm-svn: 239999
2015-06-18 07:29:40 +00:00
Simon Pilgrim 3aa039a4a8 [X86][SSE] Improved support for vector i16 to float conversions.
Added explicit sign extension for v4i16/v8i16 to v4i32/v8i32 before conversion to floats. Matches existing support for v4i8/v8i8.

Follow up to D10433

llvm-svn: 239966
2015-06-17 22:43:34 +00:00
Jingyue Wu cd3afea451 Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address
Summary:
This is done by first adding two additional instructions to convert the
alloca returned address to local and convert it back to generic. Then
replace all uses of alloca instruction with the converted generic
address. Then we can rely NVPTXFavorNonGenericAddrSpace pass to combine
the generic addresscast and the corresponding Load, Store, Bitcast, GEP
Instruction together.

Patched by Xuetian Weng (xweng@google.com). 

Test Plan: test/CodeGen/NVPTX/lower-alloca.ll

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: meheff, broune, eliben, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10483

llvm-svn: 239964
2015-06-17 22:31:02 +00:00
Reid Kleckner f9977bfb23 Re-land "[X86] Cache variables that only depend on the subtarget"
Re-instates r239949 without accidentally flipping the sense of UseLEA.

llvm-svn: 239950
2015-06-17 21:50:02 +00:00
Reid Kleckner 09543c2998 Revert "[X86] Cache variables that only depend on the subtarget"
This reverts commit r239948, tests seem to be failing.

llvm-svn: 239949
2015-06-17 21:35:02 +00:00
Reid Kleckner 05b39483c1 [X86] Cache variables that only depend on the subtarget
There is a one-to-one relationship between X86Subtarget and
X86FrameLowering, but every frame lowering method would previously pull
the subtarget off the MachineFunction and query some subtarget
properties.

Over time, these locals began to grow in complexity and it became
important to keep their names and meaning in sync across all of the
frame lowering methods, leading to duplication. We can eliminate that
duplication by computing them once in the constructor.

llvm-svn: 239948
2015-06-17 21:31:17 +00:00
Matt Arsenault 417c93e3c1 AMDGPU: Change unreachable into reported error
llvm-svn: 239943
2015-06-17 20:55:25 +00:00
David Majnemer 7fddeccb8b Move the personality function from LandingPadInst to Function
The personality routine currently lives in the LandingPadInst.

This isn't desirable because:
- All LandingPadInsts in the same function must have the same
  personality routine.  This means that each LandingPadInst beyond the
  first has an operand which produces no additional information.

- There is ongoing work to introduce EH IR constructs other than
  LandingPadInst.  Moving the personality routine off of any one
  particular Instruction and onto the parent function seems a lot better
  than have N different places a personality function can sneak onto an
  exceptional function.

Differential Revision: http://reviews.llvm.org/D10429

llvm-svn: 239940
2015-06-17 20:52:32 +00:00
Rafael Espindola dfe2d359c5 Move IsUsedInReloc from MCSymbolELF to MCSymbol.
There is a free bit is MCSymbol and MachO needs the same information.

llvm-svn: 239933
2015-06-17 20:08:20 +00:00
Toma Tabacu f712ede932 [mips] [IAS] Add support for expanding LASym with a source register operand.
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9348

llvm-svn: 239910
2015-06-17 14:31:51 +00:00
Toma Tabacu 1a1083285c [mips] [IAS] Add support for the B{L,G}{T,E}(U) branch pseudo-instructions.
Summary:
This does not include support for the immediate variants of these pseudo-instructions.
Fixes llvm.org/PR20968.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D8537

llvm-svn: 239905
2015-06-17 13:20:24 +00:00
Toma Tabacu 9e7b90c244 [mips] [IAS] Fix LA with relative label operands.
Summary:
Call MCSymbolRefExpr::create() with a MCSymbol* argument, not with a StringRef
of the Symbol's name, in order to avoid creating invalid temporary symbols for
relative labels (e.g. {$,.L}tmp00, {$,.L}tmp10 etc.).

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10498

llvm-svn: 239901
2015-06-17 12:30:37 +00:00
Toma Tabacu 07c97b3b7e [mips] [IAS] Fix LW with relative label operands.
Summary:
Previously, MCSymbolRefExpr::create() was called with a StringRef of the symbol
name, which it would then search for in the Symbols StringMap (from MCContext).

However, relative labels (which are temporary symbols) are apparently not stored
in the Symbols StringMap, so we end up creating a new {$,.L}tmp symbol
({$,.L}tmp00, {$,.L}tmp10 etc.) each time we create an MCSymbolRefExpr by
passing in the symbol name as a StringRef.

Fortunately, there is a version of MCSymbolRefExpr::create() which takes an
MCSymbol* and we already have an MCSymbol* at that point, so we can just pass
that in instead of the StringRef.

I also removed the local StringRef calls to MCSymbolRefExpr::create() from
expandMemInst(), as those cases can be handled by evaluateRelocExpr() anyway.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9938

llvm-svn: 239897
2015-06-17 10:43:45 +00:00
Igor Breger dfcc3d31a7 AVX-512: cvtusi2ss/d intrinsics.
Change builtin function name and signature ( add third parameter - rounding mode ).
Added tests for intrinsics.

Differential Revision: http://reviews.llvm.org/D10473

llvm-svn: 239888
2015-06-17 07:23:57 +00:00
Chandler Carruth ac80dc7532 [PM/AA] Remove the Location typedef from the AliasAnalysis class now
that it is its own entity in the form of MemoryLocation, and update all
the callers.

This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.

llvm-svn: 239885
2015-06-17 07:18:54 +00:00