Commit Graph

172409 Commits

Author SHA1 Message Date
Tim Northover 848bb3ced5 ARM64: implement cunning optimisation from AArch64
A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.

llvm-svn: 206573
2014-04-18 09:31:20 +00:00
Tim Northover 5ec51a8981 ARM64: spot a vector_shuffle that maps to INS and expand.
Tests will be coming very shortly when all the optimisations needed to
support AArch64's neon-copy.ll file are committed.

llvm-svn: 206572
2014-04-18 09:31:15 +00:00
Tim Northover 46d98ea8de ARM64: nick some AArch64 patterns for extract/insert -> INS.
Tests will be committed shortly when all optimisations needed to
support AArch64's neon-copy.ll file are supported.

llvm-svn: 206571
2014-04-18 09:31:11 +00:00
Tim Northover 8b2fa3dfef AArch64/ARM64: emit all vector FP comparisons as such.
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.

More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.

llvm-svn: 206570
2014-04-18 09:31:07 +00:00
Tim Northover 0a44e66bb8 AArch64/ARM64: port BSL logic from AArch64 & enable test.
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.

Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.

llvm-svn: 206569
2014-04-18 09:31:01 +00:00
Tim Northover 547a4ae6fa AArch64/ARM64: copy byval implementation from AArch64.
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.

llvm-svn: 206568
2014-04-18 09:30:52 +00:00
Jiangning Liu 300a6b84f2 Add missing config file for newly added test case introduced by r206563.
llvm-svn: 206567
2014-04-18 09:05:50 +00:00
Yaron Keren d0751ce197 Updated test with register names following r206565.
llvm-svn: 206566
2014-04-18 08:50:09 +00:00
Yaron Keren 8ca45e0c05 Patch by Ray Donnelly.
Emit WIN64 SEH registers by name instead of just number.

llvm-svn: 206565
2014-04-18 08:03:38 +00:00
Kostya Serebryany 22e8810838 [asan] one more workaround for PR17409: don't do BB-level coverage instrumentation if there are more than N (=1500) basic blocks. This makes ASanCoverage work on libjpeg_turbo/jchuff.c used by Chrome, which has 1824 BBs
llvm-svn: 206564
2014-04-18 08:02:42 +00:00
Jiangning Liu ad874fca28 This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64.
A new test case is also added for ARM64.

Patched by Z.Zheng

llvm-svn: 206563
2014-04-18 07:57:54 +00:00
Matt Arsenault 209a7b92b5 R600: Minor cleanups.
Fix indentation, better line wrapping, unused includes.

llvm-svn: 206562
2014-04-18 07:40:20 +00:00
Lang Hames bc876017c2 [ExecutionEngine] Allow JIT clients to enable/disable module verification.
Previously module verification was always enabled, with no way to turn it off.
As of this commit, module verification is on by default in Debug builds, and off
by default in release builds. The default behaviour can be overridden by calling
setVerifyModules(bool) on the JIT instance (this works for both the old JIT, and
MCJIT).

<rdar://problem/16150008>

llvm-svn: 206561
2014-04-18 06:48:23 +00:00
Rui Ueyama d48665b1fd [ELF] Fix GNU_RELRO section name.
llvm-svn: 206560
2014-04-18 06:01:43 +00:00
Jiangning Liu 40d81e10c5 This is one of the optimizations ported from ARM64 to AArch64 to address the performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64.
Patched by Z.Zheng

llvm-svn: 206559
2014-04-18 05:58:09 +00:00
Matt Arsenault 78b8670aac R600/SI: Try to use scalar BFE.
Use scalar BFE with constant shift and offset when possible.
This is complicated by the fact that the scalar version packs
the two operands of the vector version into one.

llvm-svn: 206558
2014-04-18 05:19:26 +00:00
Jiangning Liu e56c30614f This commit enables unaligned memory accesses of vector types on AArch64 back end. This should boost vectorized code performance.
Patched by Z. Zheng

llvm-svn: 206557
2014-04-18 03:58:38 +00:00
Duncan P. N. Exon Smith e576167df8 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commits r206548, r206549 and r206549.

There are some unit tests failing that aren't failing locally [1], so
reverting until I have time to investigate.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

llvm-svn: 206556
2014-04-18 02:17:43 +00:00
Justin Bogner e877eda50d OnDiskHashTable: Provide iterator_range for keys and data
llvm-svn: 206555
2014-04-18 02:10:26 +00:00
Duncan P. N. Exon Smith 878cf2b804 blockfreq: Really fix r206548 (and r206549)
Turns out this code is dead.

llvm-svn: 206554
2014-04-18 02:10:09 +00:00
Jim Grosbach 5198f3e988 c++11: Tidy up tblgen w/ range loops.
IntrInfoEmitter cleanup.

llvm-svn: 206553
2014-04-18 02:09:07 +00:00
Jim Grosbach af81445411 iterator access to scheduling classes
llvm-svn: 206552
2014-04-18 02:09:04 +00:00
Jim Grosbach 56fd888e10 iterator_range accessor for CodeGenTarget instruction list.
llvm-svn: 206551
2014-04-18 02:09:02 +00:00
Jim Grosbach 0e28a3554b iterator based accessors for CodeGenInstruction operand list.
llvm-svn: 206550
2014-04-18 02:08:58 +00:00
Duncan P. N. Exon Smith c7abca54cf blockfreq: Fixing MSVC after r206548?
llvm-svn: 206549
2014-04-18 02:06:24 +00:00
Duncan P. N. Exon Smith 12e68e1733 blockfreq: Rewrite BlockFrequencyInfoImpl
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.

The old implementation had a fundamental flaw:  precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating.  This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue).  The old analysis gives non-sensical results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    ---- Block Freqs ----
     entry = 1.0
     for.cond1.preheader = 1.00103
     for.cond4.preheader = 5.5222
     for.body6 = 18095.19995
     for.inc8 = 4.52264
     for.inc11 = 0.00109
     for.end13 = 0.0

The new analysis gives correct results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    block-frequency-info: nested_loops
     - entry: float = 1.0, int = 8
     - for.cond1.preheader: float = 4001.0, int = 32007
     - for.cond4.preheader: float = 16008001.0, int = 128064007
     - for.body6: float = 64048012001.0, int = 512384096007
     - for.inc8: float = 16008001.0, int = 128064007
     - for.inc11: float = 4001.0, int = 32007
     - for.end13: float = 1.0, int = 8

Most importantly, the frequency leaving each loop matches the frequency
entering it.

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.  I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.

The new algorithm should generally have a complexity advantage over the
old.  The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

  - Irreducible control flow isn't modelled correctly.  LoopInfo and
    MachineLoopInfo ignore irreducible edges, so this algorithm will
    fail to scale accordingly.  There's a note in the class
    documentation about how to get closer.  See also the comments in
    test/Analysis/BlockFrequencyInfo/irreducible.ll.

  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
    the 64-bit integer precision used downstream.

  - The "bias" calculation proposed on llvmdev is *not* incorporated
    here.  This will be added in a follow-up commit, once comments from
    this review have been handled.

llvm-svn: 206548
2014-04-18 01:57:45 +00:00
Matt Arsenault 27cc958dff R600/SI: Match sign_extend_inreg to s_sext_i32_i8 and s_sext_i32_i16
llvm-svn: 206547
2014-04-18 01:53:18 +00:00
Reid Kleckner 7c1db7676e Fix a -Wmicrosoft warning on an unrepresentable enum
0x80000000 isn't representable as an int, which is the default enum
type.

llvm-svn: 206545
2014-04-18 01:21:55 +00:00
Paul Robinson f4e6a5436d Fix example for VS2012.
llvm-svn: 206544
2014-04-18 01:20:08 +00:00
Duncan P. N. Exon Smith 85e349fd2c BackendUtil: Pass through -mdisable-tail-calls
The frontend option -fno-optimize-sibling-calls resolves to -cc1's
-mdisable-tail-calls, which is passed to the TargetMachine in the
backend.  PassManagerBuilder was adding the -tailcallelim pass anyway.

Use a new DisableTailCalls option in PassManagerBuilder to disable tail
calls harder.

Requires the matching commit in LLVM that adds DisableTailCalls.

<rdar://problem/16050591>

llvm-svn: 206543
2014-04-18 01:05:25 +00:00
Duncan P. N. Exon Smith 49f3ec80c2 PMBuilder: Expose an option to disable tail calls
Adds API to allow frontends to disable tail calls in PassManagerBuilder.

<rdar://problem/16050591>

llvm-svn: 206542
2014-04-18 01:05:15 +00:00
Tom Stellard 1aa6cb4d88 R600/SI: Use SReg_64 instead of VSrc_64 when selecting BUILD_PAIR
llvm-svn: 206541
2014-04-18 00:36:21 +00:00
Jim Grosbach 6bfe18a365 [ARM64,C++11] Range'ify another loop.
llvm-svn: 206539
2014-04-17 23:41:57 +00:00
Rui Ueyama 3511b02a85 [ELF] Fix typo that caused a test to fail on FreeBSD.
llvm-svn: 206538
2014-04-17 23:38:01 +00:00
Jason Molenda 3e46e20746 Add FreeBSDSignals.cpp to project file.
llvm-svn: 206524
2014-04-17 23:29:19 +00:00
Tobias Grosser 50fd7010d8 Ensure a scalar pointer when issuing a vector load
Even tough we may want to generate a vector load, the address from which to load
still is a scalar. Make sure even if previous address computations may have been
vectorized, that the addresses are also available as scalars.

This fixes http://llvm.org/PR19469

Reported-by:  Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206510
2014-04-17 23:13:49 +00:00
Justin Bogner 534f14abe7 test: Use llvm-profdata merge in Profile tests
In preparation for using a binary format for instrumentation based
profiling, explicitly treat the test inputs as text and transform them
before running. This will allow us to leave the checked in files in
human readable format once the instrumentation format is binary.

No functional change.

llvm-svn: 206509
2014-04-17 22:49:06 +00:00
Reid Kleckner fd385407fa MS ABI: Don't append to vbtables that we shouldn't extend
This was probably a benign bug, since nobody would look at the vbtable
slots that we were filling in.

llvm-svn: 206508
2014-04-17 22:47:52 +00:00
Diego Novillo 0915c047c2 Fix bug 19437 - Only add discriminators for DWARF 4 and above.
Summary:
This prevents the discriminator generation pass from triggering if
the DWARF version being used in the module is prior to 4.

Reviewers: echristo, dblaikie

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3413

llvm-svn: 206507
2014-04-17 22:33:50 +00:00
Nuno Lopes 9ced19abe8 remove some dead code
lib/Analysis/IPA/InlineCost.cpp         |   18 ------------------
 lib/Analysis/RegionPass.cpp             |    1 -
 lib/Analysis/TypeBasedAliasAnalysis.cpp |    1 -
 lib/Transforms/Scalar/LoopUnswitch.cpp  |   21 ---------------------
 lib/Transforms/Utils/LCSSA.cpp          |    2 --
 lib/Transforms/Utils/LoopSimplify.cpp   |    6 ------
 utils/TableGen/AsmWriterEmitter.cpp     |   13 -------------
 utils/TableGen/DFAPacketizerEmitter.cpp |    7 -------
 utils/TableGen/IntrinsicEmitter.cpp     |    2 --
 9 files changed, 71 deletions(-)

llvm-svn: 206506
2014-04-17 22:26:44 +00:00
Reed Kotler 720c5ca4ea Start pushing changes for Mips Fast-Isel
llvm-svn: 206505
2014-04-17 22:15:34 +00:00
Timur Iskhodzhanov ed11ae3d21 Follow-up to r206457 -- fix static adjustments for some subtle virtual inheritance cases
Reviewed at http://reviews.llvm.org/D3410

llvm-svn: 206504
2014-04-17 22:01:48 +00:00
Aaron Ballman e80bfcd048 Making some public members into private members. This also introduces a bit more const-correctness, and now uses some range-based for loops. No functional changes intended.
llvm-svn: 206503
2014-04-17 21:44:08 +00:00
Louis Gerbarg e43a24f444 Make test/CodeGen/ARM64/vector-insertion.ll explicitly select neon syntax
Change the command line vector-insertion.ll to explicitly set the neon syntax
to apple so that buildbots that default to other syntaxes won't fail.

llvm-svn: 206502
2014-04-17 21:32:41 +00:00
Tom Stellard aeeea8a864 R600: Add comment clariying use of sext for result of MUL_U24
llvm-svn: 206501
2014-04-17 21:00:13 +00:00
Tom Stellard 868fd92e54 R600/SI: Stop using i128 as the resource descriptor type
Having i128 as a legal type complicates the legalization phase.  v4i32
is already a legal type, so we will use that instead.

This fixes several piglit tests.

llvm-svn: 206500
2014-04-17 21:00:11 +00:00
Tom Stellard 334b29c7f6 R600/SI: Change default register class for i32 to SReg_32
SIFixSGPRCopies is smart enough to handle this now.

llvm-svn: 206499
2014-04-17 21:00:09 +00:00
Tom Stellard 4f3b04de21 R600/SI: Teach SIInstrInfo::moveToVALU() how to handle PHI instructions
llvm-svn: 206498
2014-04-17 21:00:07 +00:00
Tom Stellard e1a244502c R600/SI: Legalize operands after changing dst reg in FixSGPRCopies
Otherwise we may not legalize some illegal REG_SEQUENCE instructions.

llvm-svn: 206497
2014-04-17 21:00:01 +00:00
Louis Gerbarg 153e695ee2 Improve ARM64 vector creation
This patch improves the performance of vector creation in caseiswhere where
several of the lanes in the vector are a constant floating point value. It
also includes new patterns to fold together some of the instructions when the
value is 0.0f. Test cases included.

rdar://16349427

llvm-svn: 206496
2014-04-17 20:51:50 +00:00