Commit Graph

42488 Commits

Author SHA1 Message Date
Reid Kleckner 0ad69fc89f Revert "[X86] Replace slow LEA instructions in X86"
This reverts commit r303183, it broke various buildbots and introduced
sanitizer errors.

llvm-svn: 303199
2017-05-16 19:55:03 +00:00
Renato Golin d69570e017 Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove"
Revert "[ARM] Mark LEApcrel as not having side effects"

This reverts commit r303054 and r303053, as they broke the ARM
self-hosting buildbots:

http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845

Offline investigation on course.

llvm-svn: 303193
2017-05-16 17:59:07 +00:00
Stanislav Mekhanoshin acca0f5c02 [AMDGPU] Use GCNRPTracker dumper methods in scheduler
Differential Revision: https://reviews.llvm.org/D33244

llvm-svn: 303186
2017-05-16 16:31:45 +00:00
Stanislav Mekhanoshin b10860788f [AMDGPU] Cache live-ins and register pressure in scheduler
Using LIS can be quite expensive, so caching of calculated region
live-ins and pressure is implemented. It does two things:

1. Caches the info for the second stage when we schedule with
   decreased target occupancy.
2. Tracks the basic block from top to bottom thus eliminating the
   need to scan whole register file liveness at every region split
   in the middle of the block.

The scheduling is now done in 3 stages instead of two, with the first
one being really a no-op and only used to collect scheduling regions
as sent by the scheduler driver.

There is no functional change to the current behavior, only compilation
speed is affected. In general computeBlockPressure() could be simplified
if we switch to backward RP tracker, because scheduler sends regions
within a block starting from the last upward. We could use a natural
order of upward tracker to seamlessly change between regions of the same
block, since live reg set of a previous tracked region would become a
live-out of the next region. That however requires fixing upward tracker
to properly account defs and uses of the same instruction as both are
contributing to the current pressure. When we converge on the produced
pressure we should be able to switch between them back and forth. In
addition, backward tracker is less expensive as it uses LIS in recede
less often than forward uses it in advance.

At the moment the worst known case compilation time has improved from 26
minutes to 8.5.

Differential Revision: https://reviews.llvm.org/D33117

llvm-svn: 303184
2017-05-16 16:11:26 +00:00
Lama Saba 52e892577d [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP,or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303183
2017-05-16 16:01:36 +00:00
Stanislav Mekhanoshin 464cecf81e [AMDGPU] Turn register pressure estimation into forward tracker
This factors register pressure estimation mechanism from the
GCNSchedStrategy into the forward tracker to unify interface
with other strategies and expose it to other interested phases.

Differential Revision: https://reviews.llvm.org/D33105

llvm-svn: 303179
2017-05-16 15:43:52 +00:00
Chad Rosier 8b12a03215 Fix an improperly placed curly bracket. NFC.
llvm-svn: 303165
2017-05-16 12:43:23 +00:00
NAKAMURA Takumi 994a43d27a AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable]
llvm-svn: 303137
2017-05-16 04:01:23 +00:00
Peter Collingbourne 6f0ecca3b5 IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape().
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.

Differential Revision: https://reviews.llvm.org/D33162

llvm-svn: 303134
2017-05-16 00:39:01 +00:00
Davide Italiano 60d36c7506 [AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI.
llvm-svn: 303122
2017-05-15 22:10:15 +00:00
Tim Northover 203c6f055d AArch64: use linker-private symbols for globals in MachO.
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).

llvm-svn: 303118
2017-05-15 21:51:38 +00:00
Adam Nemet e29686e5c1 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
Hans Wennborg bd6e9e77a7 Revert r302678 "[AArch64] Enable use of reduction intrinsics."
This caused PR33053.

Original commit message:

> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 303115
2017-05-15 20:59:32 +00:00
Jan Sjodin a06bfe054e Re-submit AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303111
2017-05-15 20:18:37 +00:00
Tim Northover 8b96c7e9b5 AArch64: diagnose unrecognized features in .cpu directive.
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".

llvm-svn: 303108
2017-05-15 19:42:15 +00:00
Geoff Berry e369653bf3 [AArch64][Falkor] Fix sched details for FMOV
llvm-svn: 303099
2017-05-15 18:50:22 +00:00
Jan Sjodin 0e289822fa Revert 303091.
llvm-svn: 303098
2017-05-15 18:39:47 +00:00
Jan Sjodin e9d2ddc9dd Add AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303091
2017-05-15 18:13:56 +00:00
Simon Pilgrim 55ff57861a [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33189

llvm-svn: 303082
2017-05-15 17:17:44 +00:00
Florian Hahn af91e7e6d2 [AArch64] Enable FeatureFuseAES on Cortex-A72.
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.

llvm-svn: 303073
2017-05-15 15:15:22 +00:00
Dmitry Preobrazhensky 167f8b69e3 [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64
See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33123

llvm-svn: 303070
2017-05-15 14:28:23 +00:00
Dmitry Preobrazhensky 03852a9dca [AMDGPU][MC] Removed V_MQSAD_U16_U8
This instruction does not really exist

See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018

Reviewers: vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D33126

llvm-svn: 303055
2017-05-15 12:37:03 +00:00
John Brawn 9486becf09 [ARM] Mark LEApcrel instructions as isAsCheapAsAMove
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.

Differential Revision: https://reviews.llvm.org/D32858

llvm-svn: 303054
2017-05-15 11:57:54 +00:00
John Brawn 43132c46a6 [ARM] Mark LEApcrel as not having side effects
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.

It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.

Differential Revision: https://reviews.llvm.org/D32857

llvm-svn: 303053
2017-05-15 11:50:21 +00:00
Simon Pilgrim f8389656e3 [NVPTX] Don't rely on default arguments to SelectionDAG::getMemIntrinsicNode. NFC.
NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues.

llvm-svn: 303047
2017-05-15 10:47:48 +00:00
Zvi Rackover e6b278bc65 [X86] Utilize SelectionDAG::getSelect(). NFC.
Replace SelectionDAG::getNode(ISD::SELECT, ...)
and SelectionDAG::getNode(ISD::VSELECT, ...)
with SelectionDAG::getSelect(...)
Saves a few lines of code and in some cases saves the need to explicitly
check the type of the desired node.

llvm-svn: 303024
2017-05-14 21:30:38 +00:00
Simon Pilgrim d0ef9d8e93 [X86][AVX1] Account for cost of extract/insert of 256-bit shifts
llvm-svn: 303023
2017-05-14 20:52:11 +00:00
Simon Pilgrim f96b4ab92d [X86][AVX2] Fix costs for v4i64 ashr by splat
llvm-svn: 303022
2017-05-14 20:25:42 +00:00
Simon Pilgrim de4467b182 [X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat
llvm-svn: 303021
2017-05-14 20:02:34 +00:00
Craig Topper ceea1a76a1 [X86] Remove unused value from IntrinsicType enum. NFC
llvm-svn: 303018
2017-05-14 19:38:06 +00:00
Simon Pilgrim d3f0d03cc5 [X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences
llvm-svn: 303017
2017-05-14 18:52:15 +00:00
Simon Pilgrim 5bef9c627e [X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask.
Tweak cost model to match what lowering actually does.

llvm-svn: 303013
2017-05-14 17:59:46 +00:00
Simon Pilgrim aa8dffb69b [X86][SSE] Account for cost of extract/insert of v32i8 vector shifts
llvm-svn: 303012
2017-05-14 17:36:07 +00:00
Simon Pilgrim 4599eaa09a [X86][XOP] Account for cost of extract/insert of 256-bit vector shifts
llvm-svn: 303010
2017-05-14 13:38:53 +00:00
Simon Pilgrim f3ee9c6997 [X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts.
llvm-svn: 303009
2017-05-14 11:46:26 +00:00
Simon Pilgrim ef46c2762a [x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization)
Further perf tests on Jaguar indicate that:

vxorps  %ymm0, %ymm0, %ymm0
vcmpps  $15, %ymm0, %ymm0, %ymm0

is consistently faster (by about 9%) than:

vpcmpeqd  %xmm0, %xmm0, %xmm0
vinsertf128  $1, %xmm0, %ymm0, %ymm0

Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well.

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D32416

llvm-svn: 302989
2017-05-13 13:42:35 +00:00
Dylan McKay 0c4debc123 [AVR] When lowering Select8/Select16, put newly generated MBBs in the same spot
Contributed by Dr. Gergő Érdi.

Fixes a bug.

Raised from (https://github.com/avr-rust/rust/issues/49).

llvm-svn: 302973
2017-05-13 00:22:34 +00:00
Dylan McKay 0c707da6ac [AVR] Remove an unused variable
llvm-svn: 302970
2017-05-13 00:00:26 +00:00
Changpeng Fang 161e8c39af AMDGPU/SI: Don't promote to vector if the load/store is volatile.
Summary:
  We should not change volatile loads/stores in promoting alloca to vector.

Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33107

llvm-svn: 302943
2017-05-12 20:31:12 +00:00
Simon Pilgrim a1978aaefd [NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)
This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33147

llvm-svn: 302942
2017-05-12 19:56:43 +00:00
Tim Shen 10c64e6aea [PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC.
Summary:
Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc.,
because those are not defined for b > sizeof(a) * 8, even after some of
the combiners run.

However, PPCISD::SHL defines that behavior (as the instructions themselves).
Move the combination to the backend.

The tests in shift_mask.ll still pass.

Reviewers: echristo, hfinkel, efriedma, iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D33076

llvm-svn: 302937
2017-05-12 19:25:37 +00:00
Geoff Berry ddbbf6416c [AArch64][Falkor] Refine modeling of multiply accumulate forwarding.
llvm-svn: 302933
2017-05-12 18:57:10 +00:00
Simon Pilgrim b146e61828 Strip trailing whitespace. NFCI.
llvm-svn: 302927
2017-05-12 17:42:36 +00:00
Craig Topper 8df66c602a [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

llvm-svn: 302925
2017-05-12 17:20:30 +00:00
Tom Stellard a0d67c748a AMDGPU/GlobalISel: Mark 32-bit integer constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33115

llvm-svn: 302919
2017-05-12 16:46:46 +00:00
James Y Knight d4e1b00e7c [SPARC] Support 'f' and 'e' inline asm constraints.
Based on patch by Patrick Boettcher and Chris Dewhurst.

Differential Revision: https://reviews.llvm.org/D29116

llvm-svn: 302911
2017-05-12 15:59:10 +00:00
Simon Pilgrim 7f03231cc6 Use SDValue::getOperand() helper. NFCI.
llvm-svn: 302894
2017-05-12 13:08:45 +00:00
Leslie Zhai a1149e01d2 [AVR] Migrate to new StructType::get owing to Supress all uses of LLVM_END_WITH_NULL
Reviewers: dylanmckay, jroelofs, RKSimon, serge-sans-paille

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D33119

llvm-svn: 302885
2017-05-12 09:08:03 +00:00
Reid Kleckner 43bbeb4c9f Issue diagnostics when returning FP values on x86_64 without SSE1/2
Avoid using report_fatal_error, because it will ask the user to file a
bug. If the user attempts to disable SSE on x86_64 and them use floating
point, that's a bug in their code, not a bug in the compiler.

This is just a start. There are other ways to crash the backend in this
configuration, but they should be updated to follow this pattern.

Differential Revision: https://reviews.llvm.org/D27522

llvm-svn: 302835
2017-05-11 22:43:02 +00:00
Guozhi Wei 22e7da9597 [PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0
According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0.

This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified.

Differential Revision: https://reviews.llvm.org/D32880

llvm-svn: 302834
2017-05-11 22:17:35 +00:00