Commit Graph

23957 Commits

Author SHA1 Message Date
Duncan P. N. Exon Smith 10be9a8868 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206707, reapplying r206704.  The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.

<rdar://problem/14292693>

llvm-svn: 206766
2014-04-21 17:57:07 +00:00
Eli Bendersky 7cd70df708 Fix the test: DCE optimized away everything.
Use volatile store to protect the generated PTX from DCE.

Patch by Jingyue Wu.

llvm-svn: 206763
2014-04-21 17:23:12 +00:00
Michael Zolotukhin f2ba994bf6 Reapply r206732. This time without optimization of branches.
llvm-svn: 206749
2014-04-21 12:01:33 +00:00
Kostya Serebryany 49b88f54da [asan] add llvm-ish test for memset/etc instrumentation
llvm-svn: 206747
2014-04-21 11:57:43 +00:00
Chandler Carruth 572e3407c3 [PM] Add a new-PM-style CGSCC pass manager using the newly added
LazyCallGraph analysis framework. Wire it up all the way through the opt
driver and add some very basic testing that we can build pass pipelines
including these components. Still a lot more to do in terms of testing
that all of this works, but the basic pieces are here.

There is a *lot* of boiler plate here. It's something I'm going to
actively look at reducing, but I don't have any immediate ideas that
don't end up making the code terribly complex in order to fold away the
boilerplate. Until I figure out something to minimize the boilerplate,
almost all of this is based on the code for the existing pass managers,
copied and heavily adjusted to suit the needs of the CGSCC pass
management layer.

The actual CG management still has a bunch of FIXMEs in it. Notably, we
don't do *any* updating of the CG as it is potentially invalidated.
I wanted to get this in place to motivate the new analysis, and add
update APIs to the analysis and the pass management layers in concert to
make sure that the *right* APIs are present.

llvm-svn: 206745
2014-04-21 11:12:00 +00:00
NAKAMURA Takumi 54d9f88bed llvm/test/CodeGen/X86/bmi.ll: Relax expressions for targeting win32.
llvm-svn: 206743
2014-04-21 11:01:46 +00:00
Lang Hames 5aa6ee80b6 [X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available.
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.
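
As a rough illustration of the equivalence this relies on (not the actual X86 ISel code), the sketch below models BZHI in Python and recognises constant masks of the form (1 << N) - 1. The helper names `bzhi` and `low_bit_mask_length` and the 32-bit default width are assumptions made for the example.

```python
def bzhi(x: int, n: int, width: int = 32) -> int:
    """Model of BZHI: keep the low n bits of x and zero everything above."""
    x &= (1 << width) - 1
    if n >= width:
        # An index at or beyond the operand width leaves the source unchanged.
        return x
    return x & ((1 << n) - 1)


def low_bit_mask_length(mask: int):
    """Return N if mask == (1 << N) - 1 with N >= 1, otherwise None."""
    if mask > 0 and (mask & (mask + 1)) == 0:
        return mask.bit_length()
    return None


# (and X, 0xFF) is a constant-mask AND that could be expressed as BZHI(X, 8).
x = 0x12345678
n = low_bit_mask_length(0xFF)
result = bzhi(x, n)
```

A mask such as 0xF0 is not a contiguous run of low bits, so `low_bit_mask_length` returns None for it and a plain AND would be kept.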

<rdar://problem/15480077>

llvm-svn: 206738
2014-04-21 08:18:53 +00:00
Chandler Carruth a2533a7bef Revert r206732 which is causing llc to crash on most of the build bots.
Original commit message:
  Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN,
  safe.srem.iN, safe.urem.iN (iN = i8, i16, i32, or i64).

llvm-svn: 206735
2014-04-21 07:11:15 +00:00
Michael Zolotukhin 137a84616c Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN,
safe.urem.iN (iN = i8, i16, i32, or i64).

llvm-svn: 206732
2014-04-21 05:33:09 +00:00
Duncan P. N. Exon Smith e63327e967 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206704, as expected.

llvm-svn: 206707
2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith 6611a377eb Revert "blockfreq: Temporarily turn on -debug-only=block-freq"
This reverts commit r206705, as planned.

llvm-svn: 206706
2014-04-19 22:45:44 +00:00
Duncan P. N. Exon Smith bffee5bb90 blockfreq: Temporarily turn on -debug-only=block-freq
These tests fail after my BlockFrequencyInfo rewrite on two buildbots
[1][2].  I can't reproduce it locally, so I'm temporarily turning on
-debug-only=block-freq so I can find the problem.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1860
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18477

llvm-svn: 206705
2014-04-19 22:40:56 +00:00
Duncan P. N. Exon Smith 875ddfac75 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.

I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths).  There's a small
chance that this will appease the failing bots [1][2].  (If so, great!)

If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing.  Once I've triggered those builds, I'll
revert both commits so the bots go green again.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

<rdar://problem/14292693>

llvm-svn: 206704
2014-04-19 22:34:26 +00:00
Yaron Keren d7ba46b287 Patch by Vadim Chugunov
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_64-pc-windows.

A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.

llvm-svn: 206684
2014-04-19 13:47:43 +00:00
Yaron Keren 421304d18c Patch by Ray Donnelly to print register names instead of numbers.
http://reviews.llvm.org/D3422

llvm-svn: 206683
2014-04-19 05:40:09 +00:00
Duncan P. N. Exon Smith 76b813619a Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206666, as planned.

Still stumped on why the bots are failing.  Sanitizer bots haven't
turned anything up.  If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer.  (In the meantime, I'll be
auditing my patch for undefined behaviour.)

llvm-svn: 206677
2014-04-19 00:42:46 +00:00
Justin Bogner fabf18329d llvm-profdata: Avoid writing to /dev/null in tests
We fseek on our output file in llvm-profdata, which errors on some
systems. Avoid getting into the situation by not writing to /dev/null.

llvm-svn: 206670
2014-04-18 23:25:35 +00:00
Kevin Enderby b7e51f6af5 Change the ARM assembler to require a :lower16: or :upper16: on non-constant
expressions for mov instructions instead of silently truncating by default.

For the ARM assembler, we want to avoid misleadingly allowing something
like "mov r0, <symbol>" especially when we turn it into a movw and the
expression <symbol> does not have a :lower16: or :upper16: as part of the
expression.  We don't want the behavior of silently truncating, which can be
unexpected and lead to bugs that are difficult to find since this is an easy
mistake to make.

This does change the previous behavior of llvm but actually matches an
older gnu assembler that would not allow this but print less useful errors
like “invalid constant (0x927c0) after fixup” and “unsupported relocation on
symbol foo”.  The error for llvm is "immediate expression for mov requires
:lower16: or :upper16" with correct location information on the operand
as shown in the added test cases.

rdar://12342160

llvm-svn: 206669
2014-04-18 23:06:39 +00:00
Justin Bogner 6bdea86cab test: Add extra run lines to investigate an error on the bots
llvm-svn: 206668
2014-04-18 23:05:31 +00:00
Duncan P. N. Exon Smith b3caf3646f Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206628, reapplying r206622 (and r206626).

Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce
on Darwin, and Chandler can't reproduce on Linux.  Asan and valgrind
don't tell us anything, but we're hoping the msan bot will catch it.

So, I'm applying this again to get more feedback from the bots.  I'll
leave it in long enough to trigger builds in at least the sanitizer
buildbots (it was failing for reasons unrelated to my commit last time
it was in), and hopefully a few others.... and then I expect to revert a
third time.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

llvm-svn: 206666
2014-04-18 22:30:03 +00:00
Alexey Samsonov 5c39fdfb7b [llvm-symbolizer] Print file/line for a PC even if there is no DIE describing it.
This is important for symbolizing executables with debug info in
unavailable .dwo files. Even if all DIE entries are missing, we can
still symbolize an address: function name can be fetched from symbol table,
and file/line info can be fetched from line table.

llvm-svn: 206665
2014-04-18 22:22:44 +00:00
David Blaikie 76d3a3cd35 Compress debug sections only when beneficial.
Both ZLIB and the debug info compressed section header ("ZLIB" + the
size of the uncompressed data) take some constant overhead so in some
cases the compressed data is actually larger than the uncompressed data.
In these cases, just don't compress or rename the section at all.
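
The decision can be sketched as follows. This is a minimal illustration rather than the MC layer's code: the function name `maybe_compress_section` is hypothetical, and the header layout ("ZLIB" followed by an 8-byte big-endian uncompressed size) is assumed from the description above.

```python
import struct
import zlib


def maybe_compress_section(data: bytes):
    """Return (payload, was_compressed); compress only if it saves space."""
    # Assumed header: magic "ZLIB" plus the uncompressed size as 8 bytes.
    header = b"ZLIB" + struct.pack(">Q", len(data))
    compressed = header + zlib.compress(data)
    if len(compressed) < len(data):
        return compressed, True
    # Header and zlib overhead outweigh the savings: keep the section as-is.
    return data, False


small, small_compressed = maybe_compress_section(b"abc")
large, large_compressed = maybe_compress_section(b"a" * 10000)
```

The returned flag is what a caller would consult before renaming the section, so a tiny section keeps both its original bytes and its original name.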

llvm-svn: 206659
2014-04-18 21:52:26 +00:00
Justin Bogner b7aa26303b ProfileData: Add support for the indexed instrprof format
This adds support for an indexed instrumentation based profiling
format, which is just a small header and an on disk hash table.  This
format will be used by clang's -fprofile-instr-use= for PGO.

llvm-svn: 206656
2014-04-18 21:48:40 +00:00
David Blaikie c029ab430c Update the fragments of symbols in compressed sections.
While unnamed relocations are already cached in side tables in
ELFObjectWriter::RecordRelocation, symbols still need their fragments
updated to refer to the newly compressed fragment (even if that fragment
isn't big enough to fit the offset). Even though we only create
temporary symbols in debug info sections this comes up in 32 bit builds
where even temporary symbols in mergeable sections (such as debug_str)
have to be emitted as named symbols.

I tried a few other ways to do this but they all didn't work for various
reasons:

1) Canonicalize the MCSymbolData in RecordRelocation, nulling out the
Fragment (so it didn't have to be updated by CompressDebugSection). This
doesn't work because some code relies on symbols having fragments to
indicate that they're defined, I think.

2) Canonicalize the MCSymbolData in RecordRelocation to be "first
fragment + absolute offset" so it would be cheaper to just test and
update the fragment in CompressDebugSections. This doesn't work because
the offset computed in RecordRelocation isn't that of the symbol's
fragment, it's the passed in fragment (I haven't figured out what that
fragment is - perhaps it's the location where the relocation is to be
written). And if the fragment offset has to be computed only for this
use we might as well just do it when we need to, in
CompressDebugSection.

I also added an assert to help catch this a bit more clearly, even
though it is UB. The test case improvements would either assert fail
and/or valgrind fail without the fix, even if they wouldn't necessarily
fail the FileCheck output.

llvm-svn: 206653
2014-04-18 21:24:12 +00:00
Chad Rosier 9149acb053 [ARM64] Ports the Cortex-A53 Machine Model description from AArch64.
Summary:
This port includes the rudimentary latencies that were provided for
the Cortex-A53 Machine Model in the AArch64 backend. It also changes
the SchedAlias for COPY in the Cyclone model to an explicit
WriteRes mapping to avoid conflicts in other subtargets.

Differential Revision: http://reviews.llvm.org/D3427
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 206652
2014-04-18 21:22:04 +00:00
Yaron Keren d0d38bf91e Expanded test for x86-pc-windows-gnu and x86_64-pc-windows-gnu environments.
llvm-svn: 206649
2014-04-18 21:10:11 +00:00
Adam Nemet ee7a3e38c9 [X86] Improve buildFromShuffleMostly for AVX
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps.  Consider:

(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0), Constant<1>),
                     (extract_vector_elt %vreg0, Constant<2>),
                     (extract_vector_elt %vreg0, Constant<3>),
                     (extract_vector_elt %vreg0, Constant<4>),
                     (extract_vector_elt %vreg0, Constant<5>),
                     (extract_vector_elt %vreg0, Constant<6>),
                     (extract_vector_elt %vreg0, Constant<7>),
                     %vreg1))

a. We can't build a 256-bit vector efficiently, so we need to split it into
two 128-bit vecs and combine them with VINSERTX128.

b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) need to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.

c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.

Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:

(v4f32 (BUILD_VECTOR (extract_vector_elt
                      (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     %vreg1))

In order to figure out the underlying vectors and their identity we need to see
through the shuffles.

[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.

Fixes <rdar://problem/16296956>

llvm-svn: 206634
2014-04-18 19:44:16 +00:00
Duncan P. N. Exon Smith 0842ff36a6 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206622 and the MSVC fixup in r206626.

Apparently the remotely failing tests are still failing, despite my
attempt to fix the nondeterminism in r206621.

llvm-svn: 206628
2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith f8361d127a Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206556, effectively reapplying commit r206548 and
its fixups in r206549 and r206550.

In an intervening commit I've added target triples to the tests that
were failing remotely [1] (but passing locally).  I'm hoping the mystery
is solved?  I'll revert this again if the tests are still failing
remotely.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

llvm-svn: 206622
2014-04-18 17:22:25 +00:00
Duncan P. N. Exon Smith c812b5b33a Add some target triples for better determinism
These tests were failing on some buildbots after r206548 (reverted in
r206556), but passing locally.

They were missing target triples, so maybe that's the problem?

llvm-svn: 206621
2014-04-18 17:22:19 +00:00
Tim Northover ff046b64d9 AArch64/ARM64: add more NEON tests.
Mostly no testing this time, since they were just wrangling
target-specific intrinsics.

llvm-svn: 206613
2014-04-18 14:54:53 +00:00
Tim Northover 37d9a9cebf ARM64: disable generation of .loh directives outside MachO.
Part of PR19455.

llvm-svn: 206611
2014-04-18 14:54:46 +00:00
Tim Northover be1d1b6681 ARM64: don't emit .subsections_via_symbols on ELF.
Part of PR19455.

llvm-svn: 206610
2014-04-18 14:54:41 +00:00
Tim Northover be3941cc79 ARM64: add extra NEG pattern.
llvm-svn: 206609
2014-04-18 14:54:35 +00:00
Tim Northover 0dbdfb8522 AArch64/ARM64: port more AArch64 tests to ARM64.
llvm-svn: 206592
2014-04-18 13:16:55 +00:00
Tim Northover e3028832d1 AArch64/ARM64: add non-scalar lowering for more FCVT operations.
llvm-svn: 206591
2014-04-18 13:16:42 +00:00
Tim Northover 01f315a556 AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE.
We couldn't cope if the first mask element was UNDEF before, which
isn't ideal.

llvm-svn: 206588
2014-04-18 12:50:58 +00:00
Evgeniy Stepanov 65120ec8c6 [msan] Add -msan-instrumentation-with-call-threshold.
This flag replaces inline instrumentation for checks and origin stores with
calls into MSan runtime library. This is a workaround for PR17409.

Disabled by default.

llvm-svn: 206585
2014-04-18 12:17:20 +00:00
Chandler Carruth 18eadd9260 [LCG] Add support for building persistent and connected SCCs to the
LazyCallGraph. This is the start of the whole point of this different
abstraction, but it is just the initial bits. Here is a run-down of
what's going on here. I'm planning to incorporate some (or all) of this
into comments going forward, hopefully with better editing and wording.
=]

The crux of the problem with the traditional way of building SCCs is
that they are ephemeral. The new pass manager however really needs the
ability to associate analysis passes and results of analysis passes with
SCCs in order to expose these analysis passes to the SCC passes. Making
this work is kind-of the whole point of the new pass manager. =]

So, when we're building SCCs for the call graph, we actually want to
build persistent nodes that stick around and can be reasoned about
later. We'd also like the ability to walk the SCC graph in more complex
ways than just the traditional postorder traversal of the current CGSCC
walk. That means that in addition to being persistent, the SCCs need to
be connected into a useful graph structure.

However, we still want the SCCs to be formed lazily where possible.

These constraints are quite hard to satisfy with the SCC iterator. Also,
using that would bypass our ability to actually add data to the nodes of
the call graph to facilitate implementing the Tarjan walk. So I've
re-implemented things in a more direct and embedded way. This
immediately makes it easy to get the persistence and connectivity
correct, and it also allows leveraging the existing nodes to simplify
the algorithm. I've worked somewhat to make this implementation more
closely follow the traditional paper's nomenclature and strategy,
although it is still a bit obtuse because it isn't recursive, using
an explicit stack and a tail call instead, and it is interruptible,
resuming each time we need another SCC.

The other tricky bit here, and what actually took almost all the time
and trials and errors I spent building this, is exactly *what* graph
structure to build for the SCCs. The naive thing to build is the call
graph in its newly acyclic form. I wrote about 4 versions of this which
did precisely this. Inevitably, when I experimented with them across
various use cases, they became incredibly awkward. It was all
implementable, but it felt like a complete wrong fit. Square peg, round
hole. There were two overriding aspects that pushed me in a different
direction:

1) We want to discover the SCC graph in a postorder fashion. That means
   the root node will be the *last* node we find. Using the call-SCC DAG
   as the graph structure of the SCCs results in an orphaned graph until
   we discover a root.

2) We will eventually want to walk the SCC graph in parallel, exploring
   distinct sub-graphs independently, and synchronizing at merge points.
   This again is not helped by the call-SCC DAG structure.

The structure which, quite surprisingly, ended up being completely
natural to use is the *inverse* of the call-SCC DAG. We add the leaf
SCCs to the graph as "roots", and have edges to the caller SCCs. Once
I switched to building this structure, everything just fell into place
elegantly.
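
To make the shape of that inverse graph concrete, here is a small sketch. It is a plain recursive Tarjan walk over a dictionary-based call graph, not the lazy, non-recursive, interruptible implementation described above. The names `tarjan_sccs` and `inverse_scc_dag` and the example call graph are invented for the illustration.

```python
def tarjan_sccs(graph):
    """Return SCCs as frozensets in postorder (callee SCCs before callers)."""
    index, lowlink = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for callee in graph.get(node, ()):
            if callee not in index:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index[callee])
        if lowlink[node] == index[node]:
            # node is the root of an SCC: pop its members off the stack.
            members = set()
            while True:
                top = stack.pop()
                on_stack.discard(top)
                members.add(top)
                if top == node:
                    break
            sccs.append(frozenset(members))

    for node in graph:
        if node not in index:
            visit(node)
    return sccs


def inverse_scc_dag(graph):
    """Build edges from each callee SCC to the SCCs that call into it."""
    sccs = tarjan_sccs(graph)
    scc_of = {fn: scc for scc in sccs for fn in scc}
    callers = {scc: set() for scc in sccs}
    for caller, callees in graph.items():
        for callee in callees:
            if scc_of[caller] != scc_of[callee]:
                callers[scc_of[callee]].add(scc_of[caller])
    # Leaf SCCs call nothing outside themselves; they act as the "roots".
    leaves = [scc for scc in sccs
              if all(scc_of[c] == scc for fn in scc for c in graph.get(fn, ()))]
    return sccs, callers, leaves


# Hypothetical call graph: main calls a and b; a and b call each other.
call_graph = {"main": ["a", "b"], "a": ["b"], "b": ["a", "leaf"], "leaf": []}
sccs, callers, leaves = inverse_scc_dag(call_graph)
```

Because the SCCs come out in postorder, every leaf SCC is available as an entry point before the SCC containing `main` has been discovered, which is the property point 1) above is after.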

Aside from general cleanups (there are FIXMEs and too few comments
overall) that are still needed, the other missing piece of this is
support for iterating across levels of the SCC graph. These will become
useful for implementing #2, but they aren't an immediate priority.

Once SCCs are in good shape, I'll be working on adding mutation support
for incremental updates and adding the pass manager that this analysis
enables.

llvm-svn: 206581
2014-04-18 10:50:32 +00:00
Benjamin Kramer e6c821ef4c X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps.
vcvtph2ps only reads the lower 64 bits of the address passed to the
intrinsic.

llvm-svn: 206579
2014-04-18 10:45:33 +00:00
Chandler Carruth 1911882569 Revert r206565 (and r206566 which updated tests).
This commit was attributed to a different person from the person who
posted the patch to the list, and the person who posted it to the list
claimed when they did that they were not the author, but that the author
was yet a third person. I don't know what is going on here, but
reverting until the attribution is clear and the author has explicitly
contributed the patch.

Also, the review hasn't really involved any of the MC maintainers and
that seems questionable too.

llvm-svn: 206576
2014-04-18 09:35:51 +00:00
Tim Northover 66c36b814f AArch64/ARM64: port atomics test to ARM64.
Covers quite a few extra instructions (like any of the max/min ones
which were broken until recently on ARM64).

llvm-svn: 206575
2014-04-18 09:31:31 +00:00
Tim Northover a2c4c71c12 AArch64/ARM64: spot a greater variety of concat_vector operations.
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.

This also enables the AArch64 tests which inspired the previous
untested commits.

llvm-svn: 206574
2014-04-18 09:31:27 +00:00
Tim Northover 848bb3ced5 ARM64: implement cunning optimisation from AArch64
A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.

llvm-svn: 206573
2014-04-18 09:31:20 +00:00
Tim Northover 8b2fa3dfef AArch64/ARM64: emit all vector FP comparisons as such.
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.

More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.

llvm-svn: 206570
2014-04-18 09:31:07 +00:00
Tim Northover 0a44e66bb8 AArch64/ARM64: port BSL logic from AArch64 & enable test.
I enhanced it a little in the process. The decision shouldn't really be based
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.

Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.

llvm-svn: 206569
2014-04-18 09:31:01 +00:00
Tim Northover 547a4ae6fa AArch64/ARM64: copy byval implementation from AArch64.
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.

llvm-svn: 206568
2014-04-18 09:30:52 +00:00
Jiangning Liu 300a6b84f2 Add missing config file for newly added test case introduced by r206563.
llvm-svn: 206567
2014-04-18 09:05:50 +00:00
Yaron Keren d0751ce197 Updated test with register names following r206565.
llvm-svn: 206566
2014-04-18 08:50:09 +00:00
Kostya Serebryany 22e8810838 [asan] one more workaround for PR17409: don't do BB-level coverage instrumentation if there are more than N (=1500) basic blocks. This makes ASanCoverage work on libjpeg_turbo/jchuff.c used by Chrome, which has 1824 BBs
llvm-svn: 206564
2014-04-18 08:02:42 +00:00
Jiangning Liu ad874fca28 This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64.
A new test case is also added for ARM64.

Patched by Z.Zheng

llvm-svn: 206563
2014-04-18 07:57:54 +00:00
Jiangning Liu 40d81e10c5 This is one of the optimizations ported from ARM64 to AArch64 to address the performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64.
Patched by Z.Zheng

llvm-svn: 206559
2014-04-18 05:58:09 +00:00
Matt Arsenault 78b8670aac R600/SI: Try to use scalar BFE.
Use scalar BFE with constant shift and offset when possible.
This is complicated by the fact that the scalar version packs
the two operands of the vector version into one.

llvm-svn: 206558
2014-04-18 05:19:26 +00:00
Jiangning Liu e56c30614f This commit enables unaligned memory accesses of vector types on AArch64 back end. This should boost vectorized code performance.
Patched by Z. Zheng

llvm-svn: 206557
2014-04-18 03:58:38 +00:00
Duncan P. N. Exon Smith e576167df8 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commits r206548, r206549 and r206550.

There are some unit tests failing that aren't failing locally [1], so
reverting until I have time to investigate.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

llvm-svn: 206556
2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith 12e68e1733 blockfreq: Rewrite BlockFrequencyInfoImpl
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.

The old implementation had a fundamental flaw:  precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyInfo/basic.ll is motivating.  This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue).  The old analysis gives nonsensical results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    ---- Block Freqs ----
     entry = 1.0
     for.cond1.preheader = 1.00103
     for.cond4.preheader = 5.5222
     for.body6 = 18095.19995
     for.inc8 = 4.52264
     for.inc11 = 0.00109
     for.end13 = 0.0

The new analysis gives correct results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    block-frequency-info: nested_loops
     - entry: float = 1.0, int = 8
     - for.cond1.preheader: float = 4001.0, int = 32007
     - for.cond4.preheader: float = 16008001.0, int = 128064007
     - for.body6: float = 64048012001.0, int = 512384096007
     - for.inc8: float = 16008001.0, int = 128064007
     - for.inc11: float = 4001.0, int = 32007
     - for.end13: float = 1.0, int = 8

Most importantly, the frequency leaving each loop matches the frequency
entering it.
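
The float values above can be checked with a little arithmetic. The sketch below assumes that a loop's scale is the reciprocal of its exit probability, which is consistent with the 1:4000 weights and the printed frequencies. The helper name `loop_scale` is made up for the example and is not part of BlockFrequencyInfoImpl.

```python
from fractions import Fraction


def loop_scale(exit_weight: int, continue_weight: int) -> Fraction:
    """Expected header executions per loop entry: 1 / P(exit)."""
    exit_prob = Fraction(exit_weight, exit_weight + continue_weight)
    return 1 / exit_prob


scale = loop_scale(1, 4000)

# Frequency at nesting depth d is scale**d relative to the entry block.
header_freqs = [scale ** depth for depth in range(4)]
# Depths 0..3 correspond to entry, for.cond1.preheader,
# for.cond4.preheader and for.body6 in the listing above.
```

Each level of nesting multiplies the frequency by 4001, and each loop exit divides it by the same factor, which is why for.end13 returns to 1.0.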

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.  I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.

The new algorithm should generally have a complexity advantage over the
old.  The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

  - Irreducible control flow isn't modelled correctly.  LoopInfo and
    MachineLoopInfo ignore irreducible edges, so this algorithm will
    fail to scale accordingly.  There's a note in the class
    documentation about how to get closer.  See also the comments in
    test/Analysis/BlockFrequencyInfo/irreducible.ll.

  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
    the 64-bit integer precision used downstream.

  - The "bias" calculation proposed on llvmdev is *not* incorporated
    here.  This will be added in a follow-up commit, once comments from
    this review have been handled.

llvm-svn: 206548
2014-04-18 01:57:45 +00:00
Matt Arsenault 27cc958dff R600/SI: Match sign_extend_inreg to s_sext_i32_i8 and s_sext_i32_i16
llvm-svn: 206547
2014-04-18 01:53:18 +00:00
Tom Stellard 1aa6cb4d88 R600/SI: Use SReg_64 instead of VSrc_64 when selecting BUILD_PAIR
llvm-svn: 206541
2014-04-18 00:36:21 +00:00
Diego Novillo 0915c047c2 Fix bug 19437 - Only add discriminators for DWARF 4 and above.
Summary:
This prevents the discriminator generation pass from triggering if
the DWARF version being used in the module is prior to 4.

Reviewers: echristo, dblaikie

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3413

llvm-svn: 206507
2014-04-17 22:33:50 +00:00
Louis Gerbarg e43a24f444 Make test/CodeGen/ARM64/vector-insertion.ll explicitly select neon syntax
Change the command line vector-insertion.ll to explicitly set the neon syntax
to apple so that buildbots that default to other syntaxes won't fail.

llvm-svn: 206502
2014-04-17 21:32:41 +00:00
Tom Stellard 868fd92e54 R600/SI: Stop using i128 as the resource descriptor type
Having i128 as a legal type complicates the legalization phase.  v4i32
is already a legal type, so we will use that instead.

This fixes several piglit tests.

llvm-svn: 206500
2014-04-17 21:00:11 +00:00
Louis Gerbarg 153e695ee2 Improve ARM64 vector creation
This patch improves the performance of vector creation in cases where
several of the lanes in the vector are a constant floating point value. It
also includes new patterns to fold together some of the instructions when the
value is 0.0f. Test cases included.

rdar://16349427

llvm-svn: 206496
2014-04-17 20:51:50 +00:00
Jim Grosbach 0fba6d98fc ARM64: [su]xtw use W regs as inputs, not X regs.
Update the SXT[BHW]/UXTW instruction aliases and the shifted reg addressing
mode handling.

PR19455 and rdar://16650642

llvm-svn: 206495
2014-04-17 20:47:31 +00:00
Tim Northover 11a6082e33 ARM64: switch to IR-based atomic operations.
Goodbye code!

(Game: spot the bug fixed by the change).

llvm-svn: 206490
2014-04-17 20:00:33 +00:00
Tim Northover 0129f298c4 ARM64: add acquire/release versions of the existing atomic intrinsics.
These will be needed to support IR-level lowering of atomic
operations.

llvm-svn: 206489
2014-04-17 20:00:24 +00:00
Gerolf Hoflehner ecebc3730e Reverse 206429.
After some discussions the preferred semantics of
the always_inline attribute is
inline always when the compiler can determine
that it is safe to do so.

llvm-svn: 206487
2014-04-17 19:14:06 +00:00
Josh Magee adfde5fef6 [stack protector] Make the StackProtector pass respect ssp-buffer-size.
Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size"
attribute after all uses of SSPBufferSize.  The effect was that the default
SSPBufferSize was always used during analysis.  I moved the check for the
attribute before the analysis; now --param ssp-buffer-size= works correctly again.

Differential Revision: http://reviews.llvm.org/D3349

llvm-svn: 206486
2014-04-17 19:08:36 +00:00
Tim Northover 037f26f212 Atomics: promote ARM's IR-based atomics pass to CodeGen.
Still only 32-bit ARM using it at this stage, but the promotion allows
direct testing via opt and is a reasonably self-contained patch on the
way to switching ARM64.

At this point, other targets should be able to make use of it without
too much difficulty if they want. (See ARM64 commit coming soon for an
example).

llvm-svn: 206485
2014-04-17 18:22:47 +00:00
Matt Arsenault a90d22fad5 R600/SI: f64 frint is legal on CI
llvm-svn: 206475
2014-04-17 17:06:37 +00:00
Craig Topper 0a9bf4c0c5 [X86] Add disassembler support for the 0x0f 0x7f form of movq %mm, %mm.
llvm-svn: 206447
2014-04-17 06:33:45 +00:00
Matt Arsenault 51df0c1965 R600/SI: Fix zext from i1 to i64
llvm-svn: 206437
2014-04-17 02:03:08 +00:00
Adam Nemet 287f989dde [ARM64] Fix "Cannot select" for vector ctpop
The commit of r205855:

Author: Arnold Schwaighofer <aschwaighofer@apple.com>
Date:   Wed Apr 9 14:20:47 2014 +0000

    SLPVectorizer: Only vectorize intrinsics whose operands are widened equally

    The vectorizer only knows how to vectorize intrinsics by widening all operands by
    the same factor.

    Patch by Tyler Nowicki!

exposed a backend bug causing a regression (Cannot select ctpop).

The commit msg is a bit confusing because the patch actually changes the
behavior for the loop-vectorizer as well.  As things got refactored into a
helper ctpop got snuck in to the trivially-vectorizable helper which is now
used by both vectorizers.  In other words, we started seeing vector-ctpops in
the backend.

This change makes ctpop LegalizeAction::Expand for the types not supported by
the byte-only CNT instruction.  We may be able to custom-lower these later to
a single CNT but this is to fix the compiler crash first.

Fixes <rdar://problem/16578951>

llvm-svn: 206433
2014-04-17 01:01:37 +00:00
Gerolf Hoflehner 5f6268a40e Inline a function when the always_inline attribute
is set even when it contains an indirect branch.
The attribute overrules correctness concerns
like the escape of a local block address.

This is for rdar://16501761

llvm-svn: 206429
2014-04-17 00:21:52 +00:00
Konrad Anheim 4e40d7b074 Test commit - Added a new line
llvm-svn: 206399
2014-04-16 16:45:18 +00:00
Matheus Almeida 483d7e9349 [mips] Use TwoOperandAliasConstraint for shift instructions.
This enables TableGen to generate an additional two operand
matcher for our shift_rotate_imm and shift_rotate_reg class of instructions.

The tests were also updated so that they now include encoding information
for all affected instructions.

llvm-svn: 206398
2014-04-16 16:28:59 +00:00
Matheus Almeida 0051f2dc78 [mips] Add initial support for NaN2008 in the back-end.
This is so that EF_MIPS_NAN2008 is set if we are using IEEE 754-2008
NaN encoding (-mnan=2008). This patch also adds support for parsing
'.nan legacy' and '.nan 2008' assembly directives. The handling of
these directives should match GAS' behaviour i.e., the last directive
in use sets the ELF header bit (EF_MIPS_NAN2008).
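
The "last directive wins" rule can be sketched as follows. The parsing is deliberately naive, the function name is hypothetical, and the numeric value used for EF_MIPS_NAN2008 is an assumption that should be checked against the ELF headers.

```python
EF_MIPS_NAN2008 = 0x00000400  # assumed flag value, for illustration


def nan_flag_bits(asm_lines, e_flags=0):
    """Apply '.nan legacy' / '.nan 2008' directives in order; the last one wins."""
    for line in asm_lines:
        tokens = line.split()
        if len(tokens) != 2 or tokens[0] != ".nan":
            continue  # not a .nan directive
        if tokens[1] == "2008":
            e_flags |= EF_MIPS_NAN2008
        elif tokens[1] == "legacy":
            e_flags &= ~EF_MIPS_NAN2008
    return e_flags


print(hex(nan_flag_bits([".nan legacy", ".nan 2008"])))
```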

Differential Revision: http://reviews.llvm.org/D3346

llvm-svn: 206396
2014-04-16 15:48:55 +00:00
Tim Northover cb37ab2d9c AArch64/ARM64: port some NEON tests to ARM64
These ones used completely different sets of intrinsics, so the only way to do
it is create a separate ARM64 copy and change them all.

Other than that, CodeGen was straightforward, no deficiencies detected here.

llvm-svn: 206392
2014-04-16 15:28:02 +00:00
Tim Northover 3e69958b6b AArch64/ARM64: produce correct relocation for conditional branches.
llvm-svn: 206391
2014-04-16 15:27:52 +00:00
Daniel Sanders 16fa1db637 [mips] Fix emission of '.option pic0' for MIPS-IV.
Summary: This was a case of incorrect usage of hasMips64() vs isABI_N64()

Reviewers: matheusalmeida, dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D3398

llvm-svn: 206388
2014-04-16 13:58:57 +00:00
Daniel Sanders a024fb0e04 [mips] Correct r206370 to account for non-Linux targets using the small data section.
This should fix the ninja-x64-msvc-RA-centos6 builder.

I suspect the check in MipsSubtarget.cpp is incorrect and is really trying to
check for a bare-metal target rather than anything other than Linux. I'll
investigate this.

llvm-svn: 206385
2014-04-16 12:29:08 +00:00
Kostya Serebryany 0c02d26d6b [asan] add two new hidden compile-time flags for asan: asan-instrumentation-with-call-threshold and asan-memory-access-callback-prefix. This is part of the workaround for PR17409 (instrument huge functions with callbacks instead of inlined code). These flags will also help us experiment with kasan (kernel-asan) and clang
llvm-svn: 206383
2014-04-16 12:12:19 +00:00
Tim Northover 05a4039fc9 ARM64: specify triple so that Linux tests pass
Now that Linux is trying to reparse all inline asm it chokes on the different
comment character in this test.

llvm-svn: 206382
2014-04-16 12:03:56 +00:00
Tim Northover 46ecdf5a0f AArch64/ARM64: add another set of tests from AArch64
Another batch with no code changes.

llvm-svn: 206381
2014-04-16 11:53:07 +00:00
Tim Northover 3ec1de7767 AArch64/ARM64: port across stub handling for ELF C++ exceptions.
The most important part here is that we should actually emit the stubs we refer
to in the exception table, but as a side issue this uses more sensible & GCC
compatible representations for some of the bits of information.

llvm-svn: 206380
2014-04-16 11:52:55 +00:00
Tim Northover 18f68f6d1a ARM64: use 32-bit moves for constants where possible.
If we know that a particular 64-bit constant has all high bits zero, then we
can rely on the fact that 32-bit ARM64 instructions automatically zero out the
high bits of an x-register. This gives the expansion logic fewer constraints to
satisfy and so sometimes allows it to pick better sequences.
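
As a rough model of the idea (not the actual expansion logic), the check and the zeroing behaviour can be written like this. The helper names are hypothetical; `movn_w` models a 32-bit MOVN, whose result is the bitwise NOT of the shifted immediate with the upper 32 bits of the x-register cleared.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def fits_32bit_move(imm):
    """True if the 64-bit constant has all of its high 32 bits zero."""
    return (imm & MASK64) >> 32 == 0


def movn_w(imm16, shift=0):
    """Model a 32-bit MOVN: NOT of the shifted immediate, upper 32 bits zero."""
    return ~(imm16 << shift) & MASK32


def movn_x(imm16, shift=0):
    """Model a 64-bit MOVN for comparison."""
    return ~(imm16 << shift) & MASK64


target = 0x0000_0000_FFFF_FFFE
print(fits_32bit_move(target), movn_w(1) == target, movn_x(1) == target)
```

In this model the constant is reachable with one 32-bit MOVN, while the 64-bit form of the same instruction would set the high bits.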

Came up while porting test/CodeGen/AArch64/movw-consts.ll: this will allow a
32-bit MOVN to be used in @test8 soon.

llvm-svn: 206379
2014-04-16 11:52:51 +00:00
Tim Northover 9cfb57dafa ARM64: use the integrated assembler on ELF.
llvm-svn: 206378
2014-04-16 11:52:40 +00:00
Matheus Almeida dc7e48e084 [mips] Emit '.set nomicromips' before a function's entry label
if not in micromips mode.

The test (elf_st_other.ll) was renamed as the name and description didn't
make sense as the test wasn't checking any symbol table entry.

Differential Revision: http://reviews.llvm.org/D3346

llvm-svn: 206377
2014-04-16 11:46:59 +00:00
Daniel Sanders 11c0c067c2 [mips] Correct callee saved list for the N32 ABI and enable test
Summary: Depends on D3339

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3340

llvm-svn: 206371
2014-04-16 10:23:37 +00:00
Daniel Sanders 9fe0ad0c07 [mips] Add calling convention tests covering O32, N32, and N64.
Summary:
I had difficulty finding tests for the N32 and N64 ABI so I've added a
collection of calling convention tests based on the document MIPS ABIs
Described (MD00305), the MIPSpro N32 Handbook, and the SYSV ABI. Where the
documents/implementations disagree, I've used GCC to resolve the conflict.

A few interesting details:
* For N32, LLVM uses 64-bit pointers when saving $ra despite pointers being
  32-bit. I've yet to find a supporting statement in the ABI documentation but
  the current behaviour matches GCC.

* For O32, the non-variable portion of a varargs argument list is also subject
  to the rule that floating-point is passed via GPR's (on N32/N64 only the
  variable portion is subject to this rule). This agrees with GCC's behaviour
  and the SYSV ABI but contradicts part of the MIPSpro N32 Handbook which talks about O32's behaviour.

* The N32 implementation has the wrong callee-saved register list.
  (I already have a fix for this but will commit it as a follow-up).

I've left RUN-TODO lines in for O32 on MIPS64. I don't plan to support this case
for now but we should revisit it.

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3339

llvm-svn: 206370
2014-04-16 09:59:46 +00:00
Tim Northover f8d183e8b9 ARM64: explicitly ask for Apple NEON syntax so test passes on Linux
llvm-svn: 206368
2014-04-16 09:13:44 +00:00
Tim Northover 97c5b6fe4f ARM64: mark x7 as used when an i128 gets shunted onto the stack.
The second half of a split i128 was ending up in x7, which is not a good thing.

This is another part of PR19432.

llvm-svn: 206366
2014-04-16 09:03:25 +00:00
Tim Northover 863a789a99 DAGCombiner: don't optimise non-existent litpool load
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.

Picked up while merging some AArch64 tests.

llvm-svn: 206365
2014-04-16 09:03:09 +00:00
Saleem Abdulrasool 057094c6f6 COFF: fix an off by one error
Adjust the tests to validate the number of auxiliary entries used to store the
filename.

Thanks to majnemer's sharp eye for catching the missing - 1 in the round up
calculation.

llvm-svn: 206359
2014-04-16 06:22:53 +00:00
Saleem Abdulrasool a2bf05aa2f COFF: add support for .file symbols
Add support for emitting .file records.  This is mostly a quality of
implementation change (more complete support for COFF file emission) that was
noticed while working on COFF file emission for Windows on ARM.

A .file record is emitted as a symbol with storage class FILE (103) and the name
".file".  A series of auxiliary format 4 records follow which contain the file
name.  The filename is stored as an ANSI string and is padded with NULL if the
length is not a multiple of COFF::SymbolSize (18).
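
A minimal sketch of the layout described above, assuming only what this message states (18-byte records, NUL padding). The function name is hypothetical and this is not the emitter's code.

```python
SYMBOL_SIZE = 18  # COFF::SymbolSize, as stated in the commit message


def file_aux_records(filename):
    """Split a filename into 18-byte auxiliary records, NUL-padding the last."""
    count = (len(filename) + SYMBOL_SIZE - 1) // SYMBOL_SIZE  # round up
    padded = filename.ljust(count * SYMBOL_SIZE, b"\0")
    return [padded[i:i + SYMBOL_SIZE] for i in range(0, len(padded), SYMBOL_SIZE)]


records = file_aux_records(b"a-source-file-name.c")
print(len(records), [len(r) for r in records])
```

A name whose length is an exact multiple of 18 fills its records completely, so no padding byte is added in that case.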

llvm-svn: 206355
2014-04-16 04:15:32 +00:00
Saleem Abdulrasool 3b5e00130e tools: fix invalid printing, buffer overrun in llvm-readobj
All auxiliary records are consumed when accessing a File record.

llvm-svn: 206354
2014-04-16 04:15:29 +00:00
Matt Arsenault 4ef2588b65 R600: Extend r600 sign_extend_inreg tests for EG
Patch by: Jan Vesely <jan.vesely@rutgers.edu>

llvm-svn: 206349
2014-04-16 01:41:34 +00:00
Matt Arsenault 4d7d38333b R600/SI: Print more immediates in hex format
Print inline immediates in decimal and everything else in hex. Always
use hex for offsets in addressing.

This approximately matches what the shader compiler does.

llvm-svn: 206335
2014-04-15 22:32:49 +00:00
Nick Lewycky 43855af9a7 Make this test not match its own filename, when being run from a path that includes the string 'add'.
llvm-svn: 206331
2014-04-15 22:29:32 +00:00
Matt Arsenault 470acd81a8 R600/SI: Fix loads of i1
llvm-svn: 206330
2014-04-15 22:28:39 +00:00
Akira Hatanaka 3d90f99d1a Make FastISel::SelectInstruction return before target specific fast-isel code
handles Intrinsic::trap if TargetOptions::TrapFuncName is set.

This fixes a bug in which the trap function was not taken into consideration
when a program was compiled without optimization (at -O0).

<rdar://problem/16291933>

llvm-svn: 206323
2014-04-15 21:30:06 +00:00
Andrea Di Biagio aac2eac4c2 [X86] Improve the lowering of packed shifts by constant build_vector.
This patch teaches the backend how to efficiently lower logical and
arithmetic packed shifts on both SSE and AVX/AVX2 machines.

When possible, instead of scalarizing a vector shift, the backend should try
to expand the shift into a sequence of two packed shifts by immediate count
followed by a MOVSS/MOVSD.

Example
  (v4i32 (srl A, (build_vector < X, Y, Y, Y>)))

Can be rewritten as:
  (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>)))

[with X and Y ConstantInt]

The advantage is that the two new shifts from the example would be lowered into
X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into
four scalar shifts plus four pairs of vector insert/extract.
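
The rewrite can be checked lane by lane with a small model. Vectors are plain Python lists, shift counts are assumed to be below 32, and `movss(dst, src)` models the blend as "lane 0 from the second operand, lanes 1 to 3 from the first"; the helper names are made up for this sketch.

```python
MASK32 = 0xFFFF_FFFF


def srl_v4i32(a, counts):
    """Per-lane logical shift right of a 4 x i32 vector."""
    return [(x & MASK32) >> c for x, c in zip(a, counts)]


def movss(dst, src):
    """Blend: lane 0 comes from src, the remaining lanes from dst."""
    return [src[0]] + dst[1:]


A = [0x8000_0000, 0xF0, 0x1234, 0xFFFF_FFFF]
X, Y = 3, 5

scalarized = srl_v4i32(A, [X, Y, Y, Y])
rewritten = movss(srl_v4i32(A, [Y] * 4), srl_v4i32(A, [X] * 4))
print(scalarized == rewritten)
```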

llvm-svn: 206316
2014-04-15 19:30:48 +00:00
Quentin Colombet 72dad56c53 [ARM64] Set default CPU to generic instead of cyclone.
llvm-svn: 206313
2014-04-15 19:08:46 +00:00
Robert Lougher a9bf2463b9 Revert r191049/r191059 as it can produce wrong code (see PR17975).
It has already been reverted on the 3.4 branch in r196521.

llvm-svn: 206311
2014-04-15 18:34:24 +00:00
Julien Lerouge be4fe32eb8 Add lifetime markers for allocas created to hold byval arguments, make them
appear in the InlineFunctionInfo.

llvm-svn: 206308
2014-04-15 18:06:46 +00:00
Tim Northover bd668872c0 AArch64/ARM64: enable more AArch64 tests on ARM64.
No code changes for this bunch, just some test rejigs.

llvm-svn: 206291
2014-04-15 14:00:29 +00:00
Tim Northover ebb3123a5f AArch64/ARM64: add missing pattern for extending load.
llvm-svn: 206290
2014-04-15 14:00:19 +00:00
Tim Northover cbcb7a37f7 AArch64/ARM64: only mangle MOVZ/MOVN during encoding when needed
Sometimes we need to emit the bits that would actually be a MOVN when producing a
relocated MOVZ instruction (don't ask). But not always, a check which ARM64 got
wrong until now.

llvm-svn: 206289
2014-04-15 14:00:15 +00:00
Tim Northover 6e27b8ded5 AArch64/ARM64: add support for large code-model jump tables.
I've left the MachO CodeGen as it is, there's a reasonable chance it should use
the GOT like ConstPools, but I'm not certain.

llvm-svn: 206288
2014-04-15 14:00:11 +00:00
Tim Northover 221b583951 AArch64/ARM64: add patterns for various commutations of FNMADD.
llvm-svn: 206287
2014-04-15 14:00:06 +00:00
Tim Northover b37cff1ae2 AArch64/ARM64: add half as a storage type on ARM64.
This brings it into line with the AArch64 behaviour and should open the way for
certain OpenCL features.

llvm-svn: 206286
2014-04-15 14:00:03 +00:00
Tim Northover 80a70a265a AArch64/ARM64: copy patterns for fixed-point conversions
Code is mostly copied directly across, with a slight extension of the
ISelDAGToDAG function so that it can cope with the floating-point constants
being behind a litpool.

llvm-svn: 206285
2014-04-15 13:59:57 +00:00
Tim Northover f70577b1cd ARM64: add constraints to various FastISel operations
llvm-svn: 206284
2014-04-15 13:59:53 +00:00
Tim Northover 27010074fb AArch64/ARM64: add more arm64 lines to AArch64 regression tests
llvm-svn: 206282
2014-04-15 13:59:44 +00:00
Tim Northover 20603726ce AArch64/ARM64: add dp tests from AArch64
llvm-svn: 206281
2014-04-15 13:59:40 +00:00
Stepan Dyatkovskiy 95cdac43af Optional hash symbol feature support for ARM64
http://reviews.llvm.org/D3328

llvm-svn: 206276
2014-04-15 11:43:09 +00:00
NAKAMURA Takumi 0ec1918675 vect.omp.persistence.ll REQUIRES asserts due to -debug-only.
llvm-svn: 206271
2014-04-15 10:12:47 +00:00
Alexey Bataev b97f9e8698 D3348 - [BUG] "Rotate Loop" pass kills "llvm.vectorizer.enable" metadata
llvm-svn: 206266
2014-04-15 09:37:30 +00:00
Quentin Colombet 97c05b52b4 [MC] Emit an error if cfi_startproc is used before a symbol is defined.
Currently, we bind those directives with the last symbol, so if none
has been defined, this would lead to a crash of the compiler.

<rdar://problem/15939159>

llvm-svn: 206236
2014-04-15 01:17:45 +00:00
Quentin Colombet c396019837 [Register Coalescer] Add a test case for 206060.
<rdar://problem/16582185>

llvm-svn: 206235
2014-04-15 01:15:32 +00:00
David Blaikie 9027abae53 Change argument order and add explanatory comment to r206130
Changes requested in code review by Eric Christopher of r206130.

llvm-svn: 206219
2014-04-14 22:23:06 +00:00
Matt Arsenault fed3dc8dc6 Revert "Revert r206045, "Fix shift by constants for vector.""
Fix cases where the Value itself is used, and not the constant value.

llvm-svn: 206214
2014-04-14 21:50:37 +00:00
Adrian Prantl 8714aaf0a5 Re-apply r206096 after investigating the gdb buildbot failure.
Thanks to dblaikie for updating the testcase!

Debug info: (bugfix) C++ C/Dtors can be compiled to multiple functions,
therefore, their declaration cannot have one DW_AT_linkage_name.
The specific instances however can and should have that attribute.

This patch reorders the code in DwarfUnit::getOrCreateSubprogramDIE()
to emit linkage names for C/Dtors.

rdar://problem/16362674.

llvm-svn: 206210
2014-04-14 21:16:04 +00:00
Louis Gerbarg cfc05450e5 Fix for codegen bug that could cause illegal cmn instruction generation
In rare cases the dead definition elimination pass code can cause illegal cmn
instructions when it replaces dead registers on instructions that use
unmaterialized frame indexes. This patch disables the dead definition
optimization for instructions which include frame index operands.

rdar://16438284

llvm-svn: 206208
2014-04-14 21:05:05 +00:00
Louis Gerbarg 6d2e3c638f Add a flag to disable the ARM64DeadRegisterDefinitionsPass
This patch adds a -arm64-dead-def-elimination flag so that it is possible to
disable dead definition elimination. Includes test case.

llvm-svn: 206207
2014-04-14 21:05:02 +00:00
Akira Hatanaka 5638b89944 Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly.
Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights.

This fixes PR18705 and <rdar://problem/15991090>.

Differential Revision: http://reviews.llvm.org/D3363

llvm-svn: 206194
2014-04-14 16:56:19 +00:00
Richard Trieu 3df79775c5 Fix 2008-03-05-SxtInRegBug.ll so that the CHECK-NOT will not match the filename.
llvm-svn: 206193
2014-04-14 16:53:50 +00:00
Kaelyn Takata f9d483128c Fix up MCFixup::getAccessVariant to handle unary expressions.
This allows correct relocations to be generated for a symbolic
address that is being adjusted by a negative constant. Since r204294,
such expressions have triggered undefined behavior when LLVM was built
without assertions.

Credit goes to Rafael for this patch; I'm submitting it on his behalf
as he is on vacation this week.

llvm-svn: 206192
2014-04-14 16:50:22 +00:00
Saleem Abdulrasool 13a3f6914b tools: fix heap-buffer-overrun detected via ASAN
Once the auxiliary fields relating to the filename have been inspected, any
following auxiliary fields need not be visited as they have been consumed (the
following fields comprise the filepath as a single unit).

Adjust the test to catch this even if ASAN is not enabled.

llvm-svn: 206190
2014-04-14 16:38:25 +00:00
Daniel Sanders 863c35a358 [mips] Fix fcopysign for MIPS-IV and add the test.
Summary:
This was another incorrect use of hasMips64() vs isGP64bit().

Depends on D3344

Reviewers: matheusalmeida, vmedic

Reviewed By: vmedic

Differential Revision: http://reviews.llvm.org/D3347

llvm-svn: 206187
2014-04-14 16:24:12 +00:00
Daniel Sanders 1d3ae27f01 [mips] MIPS-IV is broadly the same as MIPS64 so duplicate all -mcpu=mips64 tests with -mcpu=mips4 as a starting point
Summary:
Two exceptions to this:
  test/CodeGen/Mips/octeon.ll
  test/CodeGen/Mips/octeon_popcnt.ll
these test extensions to MIPS64

One test is altered for MIPS-IV:
  test/CodeGen/Mips/mips64countleading.ll
    Tests dclo/dclz which were added in MIPS64. The MIPS-IV version tests
    that dclo/dclz are not emitted.

Four tests fail and are not in this patch:
  test/CodeGen/Mips/abicalls.ll
  test/CodeGen/Mips/fcopysign-f32-f64.ll
  test/CodeGen/Mips/fcopysign.ll
  test/CodeGen/Mips/stack-alignment.ll

Depends on D3343

Reviewers: matheusalmeida, vmedic

Reviewed By: vmedic

Differential Revision: http://reviews.llvm.org/D3344

llvm-svn: 206185
2014-04-14 16:00:28 +00:00
Daniel Sanders 3d84935d28 [mips] Fix more incorrect uses of HasMips64 and isMips64()
Summary:
- Conditional moves acting on 64-bit GPR's should require MIPS-IV rather than MIPS64
- ISD::MUL, and ISD::MULH[US] should be lowered on all 64-bit ISA's

Patch by David Chisnall
His work was sponsored by: DARPA, AFRL

I've added additional testcases to cover as much of the codegen changes
affecting MIPS-IV as I can. Where I've been unable to find an existing
MIPS64 testcase that can be re-used for MIPS-IV (mainly tests covering
ISD::GlobalAddress and similar), I at least agree that MIPS-IV should
behave like MIPS64. Further testcases that are fixed by this patch will follow
in my next commit. The testcases from that commit that fail for MIPS-IV without
this patch are:
    LLVM :: CodeGen/Mips/2010-07-20-Switch.ll
    LLVM :: CodeGen/Mips/cmov.ll
    LLVM :: CodeGen/Mips/eh-dwarf-cfa.ll
    LLVM :: CodeGen/Mips/largeimmprinting.ll
    LLVM :: CodeGen/Mips/longbranch.ll
    LLVM :: CodeGen/Mips/mips64-f128.ll
    LLVM :: CodeGen/Mips/mips64directive.ll
    LLVM :: CodeGen/Mips/mips64ext.ll
    LLVM :: CodeGen/Mips/mips64fpldst.ll
    LLVM :: CodeGen/Mips/mips64intldst.ll
    LLVM :: CodeGen/Mips/mips64load-store-left-right.ll
    LLVM :: CodeGen/Mips/sint-fp-store_pattern.ll

Reviewers: dsanders

Reviewed By: dsanders

CC: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3343

llvm-svn: 206183
2014-04-14 15:44:42 +00:00
Tim Northover db2860f49e ARM64: specify full triple in tests to pacify Windows.
llvm-svn: 206175
2014-04-14 13:18:48 +00:00
Tim Northover a89617bd33 AArch64: add newline to end of test files.
Should be no other change.

llvm-svn: 206174
2014-04-14 13:18:40 +00:00
Tim Northover cb9c3cfb58 ARM64: remove buggy REV16 pattern.
The 32-bit pattern is still valid: 0123 -> 3210 -> 1032.
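
The byte shuffle 0123 -> 3210 -> 1032 can be verified with a short model of a 32-bit byte swap followed by a 16-bit rotate. The helper names are chosen for this sketch only.

```python
MASK32 = 0xFFFF_FFFF


def bswap32(x):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & MASK32).to_bytes(4, "big"), "little")


def rotr32(x, n):
    """Rotate a 32-bit value right by n bits (0 <= n < 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rev16_32(x):
    """Swap the two bytes inside each 16-bit half of a 32-bit value."""
    return ((x & 0x00FF_00FF) << 8) | ((x & 0xFF00_FF00) >> 8)


x = 0x0001_0203  # bytes 0, 1, 2, 3 from most to least significant
print(hex(bswap32(x)), hex(rotr32(bswap32(x), 16)), hex(rev16_32(x)))
```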

llvm-svn: 206172
2014-04-14 12:59:52 +00:00
Tim Northover b6abe806c7 AArch64/ARM64: enable directcond.ll test on ARM64.
Code change is because optimizeCompareInstr didn't know how to pull the
condition code out of FCSEL instructions.

llvm-svn: 206171
2014-04-14 12:51:06 +00:00
Tim Northover 0d7bd4f444 ARM64: add patterns for csXYZ with reversed operands.
AArch64 tests for this, and it's obviously a good idea. Have to invert the
condition code, of course.

llvm-svn: 206170
2014-04-14 12:51:02 +00:00
Tim Northover c398cd53aa ARM64: enable more regression tests from AArch64
llvm-svn: 206169
2014-04-14 12:50:58 +00:00
Tim Northover 2f48303436 ARM64: add support for AArch64's addsub_ext.ll
There was one definite issue in ARM64 (the off-by-1 check for whether
a shift could be folded in) and one difference that is probably
correct: ARM64 didn't fold nodes with multiple uses into the
arithmetic operations unless optimising for code size.

llvm-svn: 206168
2014-04-14 12:50:50 +00:00
Tim Northover 23b1f08282 ARM64: optimise (cmp x, (sub 0, y)) to (cmn x, y).
This transformation is only valid when being used for an EQ or NE
comparison since the flags change otherwise.
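
Why only EQ and NE are safe can be seen from a small flag model. The sketch below treats a compare as "x + NOT(m) + 1" and a compare-negative as "x + y", and returns the (N, Z, C, V) flags; it is an illustration of the arithmetic, not code from the patch.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(v):
    return v - (1 << BITS) if v >> (BITS - 1) else v


def add_with_carry(a, b, carry_in):
    """Return the (N, Z, C, V) flags of a + b + carry_in on 32 bits."""
    a &= MASK
    b &= MASK
    unsigned_sum = a + b + carry_in
    signed_sum = to_signed(a) + to_signed(b) + carry_in
    result = unsigned_sum & MASK
    n = result >> (BITS - 1)
    z = int(result == 0)
    c = int(unsigned_sum > MASK)
    v = int(to_signed(result) != signed_sum)
    return (n, z, c, v)


def cmp_flags(x, m):
    return add_with_carry(x, ~m, 1)  # subtraction as x + NOT(m) + 1


def cmn_flags(x, y):
    return add_with_carry(x, y, 0)


x, y = 5, 0
print(cmp_flags(x, (0 - y) & MASK), cmn_flags(x, y))
```

For y = 0 the two forms agree on Z but disagree on the carry flag, so conditions that read C (or V) cannot use the rewritten instruction.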

llvm-svn: 206167
2014-04-14 12:50:47 +00:00
Tim Northover d1719a8f76 ARM64: start porting regression test suite from AArch64
llvm-svn: 206166
2014-04-14 12:50:41 +00:00
Richard Osborne da16ff47cd [XCore] Don't create invalid MKMSK instructions inside loadImmediate().
Summary:
Previously loadImmediate() would produce MKMSK instructions with invalid
immediate values such as mkmsk r0, 9. Fix this by checking the mask size
is valid.
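
A validity check of this kind might look like the sketch below. The set of encodable mask sizes is an assumption made for illustration (the message only says that 9 is invalid), and the function name is hypothetical.

```python
MASK32 = 0xFFFF_FFFF
# Assumed set of encodable mask sizes; verify against the XCore ISA manual.
VALID_MKMSK_SIZES = {1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32}


def mkmsk_size(value):
    """Return n if value == (1 << n) - 1 and n is encodable, else None."""
    value &= MASK32
    size = value.bit_length()
    if value == 0 or value != (1 << size) - 1:
        return None  # not a contiguous low-bit mask
    return size if size in VALID_MKMSK_SIZES else None


print(mkmsk_size(0xFF), mkmsk_size(0x1FF))
```

Under that assumption a 9-bit mask such as 0x1FF is rejected and has to be materialised some other way.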

Reviewers: robertlytton

Reviewed By: robertlytton

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3289

llvm-svn: 206163
2014-04-14 12:30:35 +00:00
NAKAMURA Takumi 58ad0c87f8 Whitespace.
llvm-svn: 206154
2014-04-14 07:03:13 +00:00
NAKAMURA Takumi 26afa982ec Revert r206045, "Fix shift by constants for vector."
It broke some builders, at least, i686.

llvm-svn: 206153
2014-04-14 07:02:57 +00:00
Hal Finkel 56bf297e3a Don't assert in BasicTTI::getMemoryOpCost for non-simple types
BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting
AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for
example, non-power-of-two vector types return non-simple EVTs (not MVT::Other).

llvm-svn: 206150
2014-04-14 05:59:09 +00:00
Saleem Abdulrasool d38c6b1e4b tools: address possible non-null terminated filenames
If a filename is a multiple of 18 characters, there will be no null-terminator.
This will result in an invalid access by the constructed StringRef.  Add a test
case to exercise this and fix that handling.  Address this same vulnerability in
llvm-readobj as well.
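
The safe way to read such a name is to bound the search by the size of the auxiliary data rather than to rely on a terminator. A minimal sketch, with a hypothetical function name and only the 18-byte record size taken from the text:

```python
SYMBOL_SIZE = 18


def read_file_name(aux_data):
    """Recover a filename from the concatenated auxiliary records."""
    if len(aux_data) % SYMBOL_SIZE != 0:
        raise ValueError("aux data must be a whole number of records")
    # Search within the buffer only; a full record has no NUL terminator.
    end = aux_data.find(b"\0")
    return aux_data if end == -1 else aux_data[:end]


print(read_file_name(b"abc" + b"\0" * 15), len(read_file_name(b"a" * 18)))
```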

llvm-svn: 206145
2014-04-14 02:37:23 +00:00
Hal Finkel 0192cbac66 [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC
Implements the various TTI functions to enable constant hoisting on PPC. The
only significant test-suite change is this:

MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
(which essentially reverses the slowdown from r206120).

llvm-svn: 206141
2014-04-13 23:02:40 +00:00
Serge Pavlov 4bb54d51c8 Recognize test for overflow in integer multiplication.
If multiplication involves zero-extended arguments and the result is
compared as in the patterns:

    %mul32 = trunc i64 %mul64 to i32
    %zext = zext i32 %mul32 to i64
    %overflow = icmp ne i64 %mul64, %zext
or
    %overflow = icmp ugt i64 %mul64 , 0xffffffff

then the multiplication may be replaced by call to umul.with.overflow.
This change fixes PR4917 and PR4918.

Differential Revision: http://llvm-reviews.chandlerc.com/D2814

llvm-svn: 206137
2014-04-13 18:23:41 +00:00
Hal Finkel d9963c75da [PowerPC] Fix rlwimi isel when mask is not constant
We had been using the known-zero values of the operand of the or to construct
the mask for an rlwimi; this is not quite correct, but fine when the mask is
constant. When the mask is constant, then the known zeros of the operand must
be a superset of the zeros in the mask. However, when the mask is not a
constant, then there might be bits in the operand that are not known to be zero
that, at runtime, might be zero in the mask. Therefore, we check that any bits
not known to be zero *are* known to be one in the mask. Otherwise, we can't
fold the mask with the or and shift.

This was revealed as a miscompile of
MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
constant hoisting.

llvm-svn: 206136
2014-04-13 17:10:58 +00:00
David Blaikie 269e0fb2e4 Fix instruction debug info location during legalization
I found this from a particular GDB test suite case of inlining
(something similar is provided as a test case) but came across a few
other related cases (other callers of the same functions, and one other
instance of the same coding mistake in a separate function).

I'm not sure what the best way to test this is (let alone to cover the
other cases I discovered), so hopefully this suffices - open to ideas.

llvm-svn: 206130
2014-04-13 06:39:55 +00:00
Saleem Abdulrasool 9ede5c7dd0 tools: teach objdump about FILE aux records
Add support for file auxiliary symbol entries in COFF symbol tables.  A COFF
symbol table with a FILE entry is followed by sizeof(__FILE__) / 18 auxiliary
symbol records which contain the filename.  Read them and form the original
filename that the record contains.  Then display the name in the output.

llvm-svn: 206126
2014-04-13 03:11:08 +00:00
Hal Finkel 34974ed503 [PowerPC] Implement some additional TLI callbacks
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

Unfortunately, this regresses counter-register-based loop formation because
some of the loops now end up in forms where SE cannot compute loop counts.
However, nevertheless, the test-suite results favor committing:

SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

llvm-svn: 206120
2014-04-12 21:52:38 +00:00
David Blaikie ee11f22640 PR13337: Omit DW_TAG_restrict_type when compiling for DWARF2
DWARF3 introduced DW_TAG_restrict_type, so avoid using it in prior
versions.

llvm-svn: 206105
2014-04-12 05:35:59 +00:00
Richard Trieu 97a268d905 Add extra checks to mvn.ll test to prevent the "f1" check from matching
on a directory name instead of the function name.

llvm-svn: 206104
2014-04-12 04:47:04 +00:00
Adrian Prantl d3dd11d628 Revert "Debug info: (bugfix) C++ C/Dtors can be compiled to multiple functions,"
This reverts commit 206096 while I investigate why this broke the gdb
buildbot.

llvm-svn: 206103
2014-04-12 04:25:02 +00:00
Juergen Ributzka cf03068d91 [ARM64] Never hoist the shift value of a shift instruction.
There is no need to check if we want to hoist the immediate value of a
shift instruction. Simply return TCC_Free right away.

llvm-svn: 206101
2014-04-12 02:53:51 +00:00
Juergen Ributzka 6e17aa45a3 [ARM64] Fix the cost model for cheap large constants.
Originally the cost model would give up for large constants and just return the
maximum cost. This is not what we want for constant hoisting, because some of
these constants are large in bitwidth, but are still cheap to materialize.

This commit fixes the cost model to either return TCC_Free if the cost cannot be
determined, or accurately calculate the cost even for large constants
(bitwidth > 128).

This fixes <rdar://problem/16591573>.

llvm-svn: 206100
2014-04-12 02:36:28 +00:00
Adrian Prantl 1f2f3c3434 Debug info: (bugfix) C++ C/Dtors can be compiled to multiple functions,
therefore, their declaration cannot have one DW_AT_linkage_name.
The specific instances however can and should have that attribute.

This patch reorders the code in DwarfUnit::getOrCreateSubprogramDIE()
to emit linkage names for C/Dtors.

rdar://problem/16362674.

llvm-svn: 206096
2014-04-12 01:44:42 +00:00
Hal Finkel 3b48d08f54 Reenable use of TBAA during CodeGen
We had disabled use of TBAA during CodeGen (even when otherwise using AA)
because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to
miss basic type punning that it should catch (and, thus, we'd fail to override
TBAA when we should).

However, when AA is in use during CodeGen, CGP now uses normal GEPs and
bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a
result, BasicAA should be able to make us do the right thing in the face of
type-punning, and it seems safe to enable use of TBAA again. self-hosting seems
fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.

Note: We still don't update TBAA when merging stack slots, although because
BasicAA should now catch all such cases, this is no longer a blocking issue.
Nevertheless, I plan to commit code to deal with this properly in the near
future.

llvm-svn: 206093
2014-04-12 01:26:00 +00:00
Hal Finkel c3998306f4 Add the ability to use GEPs for address sinking in CGP
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be absorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
  1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
  2. In cases where type-punning was used, and BasicAA was used
     to override TBAA, BasicAA may no longer do so. (this had forced us to disable
     all use of TBAA in CodeGen; something which we can now enable again)

This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.

I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).

llvm-svn: 206092
2014-04-12 00:59:48 +00:00
Louis Gerbarg b9a0551862 Add ARM64 CLS patterns
This patch adds patterns to generate the cls instruction on ARM64. Includes tests
for 64 bit and 32 bit operands.

rdar://15611957

llvm-svn: 206079
2014-04-11 22:27:58 +00:00
Quentin Colombet 4344da1c71 [RegAllocGreedy][Last Chance Recoloring] Change the name of the exhaustive search option.
fexhaustive-register-search => exhaustive-register-search
'f' is a Clang thing!

This is related to PR18747.

llvm-svn: 206075
2014-04-11 21:51:09 +00:00
Quentin Colombet 567e30bc2b [RegAllocGreedy][Last Chance Recoloring] Addition of
-fexhaustive-register-search option to allow an exhaustive search during last
chance recoloring.

This is related to PR18747

Patch by MAYUR PANDEY <mayur.p@samsung.com>. 

llvm-svn: 206072
2014-04-11 21:39:44 +00:00
Rafael Espindola 9ef844165a Don't lose the thumb bit by using relocations with sections.
This fixes a regression from r205076.

llvm-svn: 206047
2014-04-11 19:18:01 +00:00
Adrian Prantl 63fb6efd03 Add some CHECKs to this testcase.
llvm-svn: 206046
2014-04-11 18:08:37 +00:00
Matt Arsenault 173a1e577c Fix shift by constants for vector.
ashr <N x iM>, <N x iM> M -> undef

llvm-svn: 206045
2014-04-11 17:57:53 +00:00
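The rule in r206045 can be sketched per lane in Python. This is only a model of the semantics, not the LLVM code: `UNDEF`, `ashr_lane` and `ashr_vector` are hypothetical names, and the shift amount is assumed to be one constant applied to every lane.

```python
# Sentinel standing in for an undefined result (hypothetical, for illustration).
UNDEF = object()


def ashr_lane(value, amount, bits):
    """Arithmetic shift right of one iM lane; undefined once amount >= M."""
    if amount >= bits:
        return UNDEF
    mask = (1 << bits) - 1
    value &= mask
    # Reinterpret the lane as a signed integer so the sign bit is replicated.
    if value >> (bits - 1):
        value -= 1 << bits
    return (value >> amount) & mask


def ashr_vector(values, amount, bits):
    """Apply the same constant shift amount to every lane of the vector."""
    return [ashr_lane(v, amount, bits) for v in values]
```

Shifting a vector of i8 lanes by 8 gives an undefined value in every lane, while a shift by 1 keeps the sign bit.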
Adrian Prantl 3bdcb52dd1 Debug info: Store the DIVariable in DebugLocEntry also for constants,
so DwarfDebug::emitDebugLocEntry can emit them with the correct signedness.

rdar://problem/15928306

llvm-svn: 206042
2014-04-11 17:49:47 +00:00
Tom Stellard a1a5d9aa2e SelectionDAG: Use helper function to improve legalization of ISD::MUL
The TargetLowering::expandMUL() helper contains lowering code extracted
from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more
ISD::MUL patterns without having to use a library call.

llvm-svn: 206037
2014-04-11 16:12:01 +00:00
Daniel Sanders 4b43e91abc Revert: r205182 - llvm/test/MC/Mips/mips64r2/valid-xfail.s: This REQUIRES asserts. Seems it doesn't fail with -Asserts.
This was most likely caused by an uninitialized value and the relevant code was re-written in r205292. Reverting to see if it still fails on any of the buildbots.

llvm-svn: 206033
2014-04-11 15:33:36 +00:00
Simon Atanasyan 42ac0dd3c3 [yaml2obj][ELF] ELF Relocations Support.
The patch implements support for both relocation record formats: Elf_Rel
and Elf_Rela. Currently, relocations can be defined against symbols only.
Relocations against sections will be implemented later. Now yaml2obj
recognizes X86_64, MIPS and Hexagon relocation types.

Example of relocation section specification:
Sections:
- Name: .text
  Type: SHT_PROGBITS
  Content: "0000000000000000"
  AddressAlign: 16
  Flags: [SHF_ALLOC]

- Name: .rel.text
  Type: SHT_REL
  Info: .text
  AddressAlign: 4
  Relocations:
    - Offset: 0x1
      Symbol: glob1
      Type: R_MIPS_32
    - Offset: 0x2
      Symbol: glob2
      Type: R_MIPS_CALL16

The patch was reviewed by Michael Spencer, Sean Silva, Shankar Easwaran.

llvm-svn: 206017
2014-04-11 04:13:39 +00:00
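The two record formats named in r206017 differ only in the trailing addend. A minimal Python sketch of the ELF32 layout follows; it is not yaml2obj's code. The symbol indices (1 and 2), the zero addend and the little-endian byte order are assumptions made for the example. The r_info packing and the R_MIPS_32 / R_MIPS_CALL16 numbers follow the ELF and MIPS ABI definitions.

```python
import struct

# Relocation type numbers from the MIPS ELF ABI.
R_MIPS_32 = 2
R_MIPS_CALL16 = 11


def elf32_r_info(sym_index, rel_type):
    """Pack a symbol index and relocation type the way ELF32_R_INFO does."""
    return (sym_index << 8) | (rel_type & 0xFF)


def pack_rel(offset, sym_index, rel_type):
    """Elf32_Rel: r_offset and r_info, no addend (little-endian assumed)."""
    return struct.pack("<II", offset, elf32_r_info(sym_index, rel_type))


def pack_rela(offset, sym_index, rel_type, addend):
    """Elf32_Rela: the Elf32_Rel fields plus a signed r_addend."""
    return struct.pack("<IIi", offset, elf32_r_info(sym_index, rel_type), addend)


# Hypothetical symbol table positions for glob1 (1) and glob2 (2).
rel_section = pack_rel(0x1, 1, R_MIPS_32) + pack_rel(0x2, 2, R_MIPS_CALL16)
```

Each Elf32_Rel entry takes 8 bytes and each Elf32_Rela entry 12, so the two relocations from the YAML example would occupy 16 bytes of a SHT_REL section.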
Reid Kleckner 9c6582129a Move the segmented stack switch to a function attribute
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.

Patch by Luqman Aden and Alex Crichton!

llvm-svn: 205997
2014-04-10 22:58:43 +00:00
Josh Magee 79ae600818 [stack protector] Refactor and clean-up test. No functionality change.
Refactored stack-protector.ll to use new-style function attributes everywhere
and eliminated unnecessary attributes.

This cleanup is in preparation for an upcoming test change.

llvm-svn: 205996
2014-04-10 22:47:27 +00:00
Kaelyn Takata f7c12fcd83 Remove the use of "%e" as it is not a valid expansion like "%t".
llvm-svn: 205991
2014-04-10 21:55:58 +00:00
David Blaikie 8019bf815d Reimplement debug info compression by compressing the whole section, rather than a fragment.
To support compressing the debug_line section that contains multiple
fragments (due, I believe, to variation in choices of line table
encoding depending on the size of instruction ranges in the actual
program code) we needed to support compressing multiple MCFragments in a
single pass.

This patch implements that behavior by mutating the post-relaxed and
relocated section to be the compressed form of its former self,
including renaming the section.

This is a more flexible (and less invasive, to a degree) implementation
that will allow for other features such as "use compression only if it's
smaller than the uncompressed data".

Compressing debug_frame would be a possible further extension to this
work, but I've left it for now. The hurdle there is alignment sections -
which might require going as far as to refactor
MCAssembler.cpp:writeFragment to handle writing to a byte buffer or an
MCObjectWriter (there's already a virtual call there, so it shouldn't
add substantial compile-time cost) which could in turn involve
refactoring MCAsmBackend::writeNopData to use that same abstraction...
which involves touching all the backends. This would remove the limited
handling of fragment writing seen in
ELFObjectWriter.cpp:getUncompressedData which would be nice - but it's
more invasive.

I did discover that I (perhaps obviously) don't need to handle
relocations when I rewrite the fragments - since the relocations have
already been applied and computed (and stored into
ELFObjectWriter::Relocations) by this stage (necessarily, because we
need to have written any immediate values or assembly-time relocations
into the data already before we compress it, which we have). The test
case doesn't necessarily cover that in detail - I can add more test
coverage if that's preferred.

llvm-svn: 205990
2014-04-10 21:53:53 +00:00
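The whole-section approach of r205990 can be sketched in Python as follows. This is an illustration under assumptions, not the ELFObjectWriter code. The "ZLIB" magic with an 8-byte big-endian size and the ".zdebug_" renaming are assumed to follow the GNU compressed-section convention. The size check models the "only if it's smaller" follow-up mentioned in the message and is not claimed to be part of this commit. `compress_section` is a hypothetical name.

```python
import struct
import zlib


def compress_section(name, fragments):
    """Compress all fragments of one section in a single pass.

    Returns (section_name, contents). The original name and bytes are
    kept when compression would not make the section smaller.
    """
    raw = b"".join(fragments)
    # Assumed layout: "ZLIB" magic, 8-byte big-endian uncompressed size,
    # then the zlib stream.
    packed = b"ZLIB" + struct.pack(">Q", len(raw)) + zlib.compress(raw)
    if len(packed) >= len(raw):
        return name, raw
    # Assumed renaming: ".debug_line" becomes ".zdebug_line".
    return ".z" + name[1:], packed


name, data = compress_section(".debug_line", [b"\x00" * 512, b"\x01" * 512])
```

Joining the fragments before compressing is what lets a multi-fragment section such as debug_line be handled in one pass.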
David Blaikie 4d3b043542 Revert debug info compression support.
To support compression for debug_line and debug_frame a different
approach is required. To simplify review, revert the old implementation
and XFAIL the test case. New implementation to follow shortly.

Reverts r205059 and r204958.

llvm-svn: 205989
2014-04-10 21:53:47 +00:00
Kevin Enderby 488f20b64e For the ARM integrated assembler add checking of the
alignments on vld/vst instructions.  And report errors for
alignments that are not supported.

While this is a large diff and a big test case, the changes
are very straightforward.  But pretty much had to touch
all vld/vst instructions, changing the addrmode to one of the
new ones that were added, which will do the proper checking for
the specific instruction.

FYI, re-committing this with a tweak so MemoryOp's default
constructor is trivial and will work with MSVC 2012. Thanks
to Reid Kleckner and Jim Grosbach for help with the tweak.

rdar://11312406

llvm-svn: 205986
2014-04-10 20:18:58 +00:00
Arnold Schwaighofer b373e01d87 Reapply "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This commit reapplies 205018. After 205855 we should correctly vectorize
intrinsics.

llvm-svn: 205965
2014-04-10 13:41:35 +00:00
Daniel Sanders ca275d2a14 [mips] Switch the MIPS-III and MIPS-IV assembler tests to use -mcpu=mips4.
Summary:
It is now the smallest superset for these ISAs.

FeatureMips4 now contains FeatureFPIdx since [ls][dw]xc1 were added in MIPS-IV.
Made the FPIdx feature bit lowercase so that it can be used in the -mattr option.

Depends on D3274

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3275

llvm-svn: 205964
2014-04-10 13:16:49 +00:00
David Majnemer 7788033be6 YAMLIO: Allow scalars to dictate quotation rules
Introduce ScalarTraits::mustQuote which determines whether or not a
StringRef needs quoting before it is acceptable to output.

llvm-svn: 205955
2014-04-10 07:37:33 +00:00
Juergen Ributzka 48c8c07d0a [ARM64] Fix immediate cost calculation for types larger than i64.
The immediate cost calculation code was hitting an assertion in the included
test case, because APInt was still internally 128-bits. Truncating it to 64-bits
fixed the issue.

Fixes <rdar://problem/16572521>.

llvm-svn: 205947
2014-04-10 01:36:59 +00:00
Reid Kleckner 2d4a69e9c9 Revert "For the ARM integrated assembler add checking of the alignments on vld/vst instructions. And report errors for alignments that are not supported."
It doesn't build with MSVC 2012, because MSVC doesn't allow union
members that have non-trivial default constructors.  This change added
'SMLoc AlignmentLoc' to MemoryOp, which made MemoryOp's default ctor
non-trivial.

This reverts commit r205930.

llvm-svn: 205944
2014-04-10 00:52:14 +00:00
Jim Grosbach 576f8cf19f X86: Tighten up test.
llc CPU autodetection bites again. Speculative fix for bot failures.

llvm-svn: 205940
2014-04-10 00:27:43 +00:00
Jim Grosbach e4fef71981 Add support for load folding of avx1 logical instructions
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.

The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16355124

llvm-svn: 205938
2014-04-09 23:39:25 +00:00
Jim Grosbach cad4cd6c9e SelectionDAG: Don't constant fold target-specific nodes.
FoldConstantArithmetic() only knows how to deal with a few target independent
ISD opcodes. Bail early if it sees a target-specific ISD node. These nodes do
funny things with operand types which may break the assumptions of the code
that follows, and there's no actual folding that can be done anyway. For example,
non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a
128-bit v4i32 vector regardless of what the first operand type is and that breaks
the assumption that the operand types must match.

rdar://16530923

llvm-svn: 205937
2014-04-09 23:28:11 +00:00
Kevin Enderby c296ecd96c For the ARM integrated assembler add checking of the
alignments on vld/vst instructions.  And report errors for
alignments that are not supported.

While this is a large diff and a big test case, the changes
are very straightforward.  But pretty much had to touch
all vld/vst instructions, changing the addrmode to one of the
new ones that were added, which will do the proper checking for
the specific instruction.

rdar://11312406

llvm-svn: 205930
2014-04-09 21:32:59 +00:00
Chad Rosier 5f8d6a6c15 [AArch64] Implement the isZExtFree APIs.
llvm-svn: 205926
2014-04-09 20:51:21 +00:00
Chad Rosier 9ce19fb65c [AArch64] Implement the isTruncateFree API.
In AArch64, an i64 to i32 truncate operation is a subregister access.

This allows more opportunities for LSR optimization to eliminate
variables of different types (i32 and i64).

llvm-svn: 205925
2014-04-09 20:43:40 +00:00
Quentin Colombet 0b1a5584d6 [DAGCombiner] DAG combine does not know how to combine indexed loads with
sign/zero/any extensions. However a few places were not checking properly the
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.

<rdar://problem/16389332>

llvm-svn: 205923
2014-04-09 20:03:05 +00:00
David Majnemer 97d8ee3824 Revert "Revert "YAMLIO: Encode ambiguous hex strings explicitly""
Don't quote octal compatible strings if they are only two wide, they
aren't ambiguous.

This reverts commit r205857 which reverted r205839.

llvm-svn: 205914
2014-04-09 17:04:27 +00:00
David Majnemer 3bb6073919 obj2yaml: Don't crash if the characteristics field is zero
obj2yaml would fail when seeing a Weak External auxiliary record with a
characteristics field holding zero instead of one of
IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY, IMAGE_WEAK_EXTERN_SEARCH_LIBRARY,
or IMAGE_WEAK_EXTERN_SEARCH_ALIAS.

llvm-svn: 205911
2014-04-09 16:38:15 +00:00
Justin Holewinski 30d56a7b86 [NVPTX] Add preliminary intrinsics and codegen support for textures/surfaces
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).

llvm-svn: 205907
2014-04-09 15:39:15 +00:00
Justin Holewinski 9d852a8e08 [NVPTX] Add support for addrspacecast in global variable initializers, including emitting generic() when casting to address space 0.
llvm-svn: 205906
2014-04-09 15:39:11 +00:00
Alp Toker 16f98b255d Fix some doc and comment typos
llvm-svn: 205899
2014-04-09 14:47:27 +00:00
Bradley Smith 246b0b617d [ARM64] Change SYS without a register to an alias to make disassembling more consistent.
llvm-svn: 205898
2014-04-09 14:44:58 +00:00
Bradley Smith 2cef19a2e6 [ARM64] Correctly disassemble ISB operand as ISB not DBarrier.
llvm-svn: 205897
2014-04-09 14:44:54 +00:00
Bradley Smith 239120cada [ARM64] Properly support both apple and standard syntax for FMOV
llvm-svn: 205896
2014-04-09 14:44:49 +00:00
Bradley Smith a2308f47d3 [ARM64] Flag setting logical/add/sub immediate instructions don't use SP.
llvm-svn: 205895
2014-04-09 14:44:44 +00:00
Bradley Smith f280e91849 [ARM64] Conditional branches must always print their condition code, even AL.
llvm-svn: 205894
2014-04-09 14:44:39 +00:00
Bradley Smith a19b7e83dc [ARM64] Fix disassembly logic for extended loads/stores with 32-bit registers.
llvm-svn: 205893
2014-04-09 14:44:36 +00:00
Bradley Smith 70c6acbbfd [ARM64] Add missing shifted register MVN alias to ORN
llvm-svn: 205891
2014-04-09 14:44:26 +00:00
Bradley Smith 403bbf95c0 [ARM64] SXTW/UXTW are only valid aliases for 32-bit operations.
llvm-svn: 205890
2014-04-09 14:44:22 +00:00
Bradley Smith 779238a216 [ARM64] Fix canonicalisation of MOVs. MOV is too complex to be modelled by a dumb alias.
llvm-svn: 205889
2014-04-09 14:44:18 +00:00
Bradley Smith f823079acd [ARM64] Fixup ADR/ADRP parsing such that they accept immediates and all labels types
llvm-svn: 205888
2014-04-09 14:44:12 +00:00
Bradley Smith af2710c96f [ARM64] Ensure sp is decoded as SP, not XZR in LD1 instructions.
llvm-svn: 205887
2014-04-09 14:44:07 +00:00
Bradley Smith a0dce246ed [ARM64] Tighten up the special casing in emitting arithmetic extends. UXTW should only be translated when the instruction uses WSP, not SP. Vice versa for UXTX and 64-bit instructions.
llvm-svn: 205886
2014-04-09 14:44:03 +00:00
Bradley Smith 3971d3dc75 [ARM64] Rename LR to the UAL-compliant 'X30'.
llvm-svn: 205885
2014-04-09 14:43:59 +00:00
Bradley Smith 6f1aa59c31 [ARM64] Rename FP to the UAL-compliant 'X29'.
llvm-svn: 205884
2014-04-09 14:43:50 +00:00
Bradley Smith 5511f08055 [ARM64] Add a PostEncoderMethod to FCMP - the Rm field should canonically be zero but should be decoded/disassembled with any value.
llvm-svn: 205883
2014-04-09 14:43:40 +00:00
Bradley Smith eb4ca04db2 [ARM64] SCVTF and FCVTZS/U are undefined if scale<5> == 0.
llvm-svn: 205882
2014-04-09 14:43:35 +00:00
Bradley Smith db7b9b17eb [ARM64] EXT and EXTR instructions on v8i8 and W regs respectively must have the top bit of their immediate clear.
llvm-svn: 205881
2014-04-09 14:43:31 +00:00
Bradley Smith 7525b47208 [ARM64] UBFM/BFM is undefined on w registers when imms<5> or immr<5> is 1.
llvm-svn: 205879
2014-04-09 14:43:24 +00:00
Bradley Smith 0243aa33fa [ARM64] Floating point to fixed point scaled conversions are only available on fcvtzs and fcvtzu.
llvm-svn: 205878
2014-04-09 14:43:20 +00:00
Bradley Smith 8f906a3c5f [ARM64] Port over the PostEncoderMethod fix for SMULH/UMULH from AArch64.
llvm-svn: 205877
2014-04-09 14:43:15 +00:00
Bradley Smith 9f29b726d5 [ARM64] Add missing tlbi operands and error for extra/missing register on tlbi aliases.
llvm-svn: 205876
2014-04-09 14:43:11 +00:00
Bradley Smith e8b4166acc [ARM64] Rework system register parsing to overcome SPSel clash in MSR variants.
llvm-svn: 205875
2014-04-09 14:43:06 +00:00
Bradley Smith bc35b1f138 [ARM64] Port over the PostEncoderMethod from AArch64 for exclusive loads and stores, so the unused register fields are set to all-ones canonically but are recognised with any value.
llvm-svn: 205874
2014-04-09 14:43:01 +00:00
Bradley Smith 16478c4ccf [ARM64] Add WZR to isGPR32Register, since every use needs to check for this anyway.
llvm-svn: 205871
2014-04-09 14:42:49 +00:00
Bradley Smith fb90df563f [ARM64] Move CPSRField and DBarrier operands over to AArch64-style disassembly and assembly. This removes the last users of namespace ARM64SYS.
llvm-svn: 205869
2014-04-09 14:42:42 +00:00
Bradley Smith 08c391c156 [ARM64] Switch the decoder, disassembler, instprinter and asmparser over to using AArch64-style system registers, and fix up test failures discovered in the process.
llvm-svn: 205868
2014-04-09 14:42:36 +00:00
Bradley Smith 8c0b88c987 [ARM64] Shifted register ALU ops are reserved if sf=0 and imm6<5>=1, and also (for add/sub only) if shift=11.
llvm-svn: 205865
2014-04-09 14:42:11 +00:00
Bradley Smith 527bf86e56 [ARM64] Add support for NV condition code (exists only for valid assembly/disassembly, equivalent to AL)
llvm-svn: 205864
2014-04-09 14:42:07 +00:00
Bradley Smith 6d7af17a3f [ARM64] Add missing 1Q -> 1q vector kind alias
llvm-svn: 205863
2014-04-09 14:42:01 +00:00
Bradley Smith 7d253f29a4 [ARM64] Add parsing for vector lists such as {v0.8b-v3.8b}
llvm-svn: 205862
2014-04-09 14:41:58 +00:00
Bradley Smith 664aa67153 [ARM64] Correctly alias LSL to UXTW for 32bit instruction variants, rather than UXTX
llvm-svn: 205861
2014-04-09 14:41:53 +00:00
Bradley Smith 35cadc58c9 [ARM64] STRHro and STRBro were not being decoded at all.
llvm-svn: 205860
2014-04-09 14:41:49 +00:00
Bradley Smith 87c60e00d5 [ARM64] MOVK with sf=0 and hw<1>=1 is unallocated. Shift amount for ADD/SUB instructions is unallocated if shift > 4.
llvm-svn: 205859
2014-04-09 14:41:45 +00:00
Bradley Smith cd91e5cd0c [ARM64] Register-offset loads and stores with the 'option' field equal to 00x or 10x are undefined.
llvm-svn: 205858
2014-04-09 14:41:38 +00:00
Filipe Cabecinhas 2c4e8ae0fd Revert "YAMLIO: Encode ambiguous hex strings explicitly"
This reverts commit r205839.

It broke several tests in lld.

llvm-svn: 205857
2014-04-09 14:35:17 +00:00
Arnold Schwaighofer fd0bf5d6e5 SLPVectorizer: Only vectorize intrinsics whose operands are widened equally
The vectorizer only knows how to vectorize intrinsics by widening all operands by
the same factor.

Patch by Tyler Nowicki!

llvm-svn: 205855
2014-04-09 14:20:47 +00:00
Elena Demikhovsky cf0b9bafc3 AVX-512: insert element to mask vector; store i1 data
Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors;
Implemented "store" for i1 type

llvm-svn: 205850
2014-04-09 12:37:50 +00:00
Daniel Sanders b282f1fec5 Re-commit: [mips] abs.[ds], and neg.[ds] should be allowed regardless of -enable-no-nans-fp-math
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the processor which are used to select between the 1985 and 2008 versions of IEEE 754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid operation exceptions when given NaN), in 2008 mode they are non-arithmetic (i.e. they are copies).

nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because the ISA spec does not explicitly state that they obey Has2008 and ABS2008.

Fixed the issue with the previous version of this patch (r205628). A pre-existing 'let Predicate =' statement was removing some predicates that were necessary for FP64 to behave correctly.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3274

llvm-svn: 205844
2014-04-09 09:56:43 +00:00
David Majnemer 815433587c YAMLIO: Encode ambiguous hex strings explicitly
YAMLIO would turn a BinaryRef into the string 0000000004000000.
However, the leading zero causes parsers to interpret it as being an
octal number instead of a hexadecimal one.

Instead, escape such strings as needed.

llvm-svn: 205839
2014-04-09 07:56:27 +00:00
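The ambiguity behind r205839 can be shown with a small Python sketch. The quoting rule below (quote any hex dump made up only of decimal digits) is a deliberately simplified assumption for illustration; it is not the rule YAMLIO applies, and `emit_binary_scalar` is a hypothetical name.

```python
def emit_binary_scalar(data):
    """Render bytes as a hex string, quoting it if it could parse as a number."""
    text = data.hex().upper()
    # A digits-only plain scalar may be read as a decimal or, with a
    # leading zero, an octal number instead of a hex dump (simplified check).
    if text and all(c in "0123456789" for c in text):
        return '"' + text + '"'
    return text


ambiguous = emit_binary_scalar(bytes.fromhex("0000000004000000"))
plain = emit_binary_scalar(b"\xde\xad\xbe\xef")
```

Read as octal, the string 0000000004000000 denotes a different value than it does as hexadecimal, which is why the quoted form is the safer output.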
Matt Arsenault 2c33562cd6 R600/SI: Match not instruction.
llvm-svn: 205837
2014-04-09 07:16:16 +00:00
Tim Northover b36d428d27 ARM64: scalarize v1i64 mul operation
This is the second part of fixing PR19367.

llvm-svn: 205836
2014-04-09 07:07:02 +00:00
Tim Northover b430cf6681 ARM64: add pattern for <1 x i64> custom not node.
This should fix PR19367.

llvm-svn: 205835
2014-04-09 06:55:39 +00:00
David Majnemer a9bdb32f04 WinCOFF: Emit common symbols as specified in the COFF spec
Summary:
Local common symbols were properly inserted into the .bss section.
However, putting external common symbols in the .bss section would give
them a strong definition.

Instead, encode them as undefined, external symbols whose symbol value
is equivalent to their size.

Reviewers: Bigcheese, rafael, rnk

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3324

llvm-svn: 205811
2014-04-08 22:33:40 +00:00
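The encoding described in r205811 corresponds to a symbol table record like the one packed below. This is a sketch of the 18-byte COFF symbol layout, not the WinCOFF writer itself. The symbol name `_buf`, the size 64 and the helper `pack_common_symbol` are made up for the example, and only short (8 bytes or fewer) names are handled.

```python
import struct

IMAGE_SYM_UNDEFINED = 0        # section number used for undefined symbols
IMAGE_SYM_CLASS_EXTERNAL = 2   # storage class for external symbols


def pack_common_symbol(name, size):
    """Pack an 18-byte COFF symbol record for an external common symbol."""
    short_name = name.encode("ascii")
    if len(short_name) > 8:
        raise ValueError("sketch only handles names that fit the 8-byte field")
    return struct.pack(
        "<8sIhHBB",
        short_name,                # Name, zero-padded to 8 bytes
        size,                      # Value: the size of the common block
        IMAGE_SYM_UNDEFINED,       # SectionNumber
        0,                         # Type
        IMAGE_SYM_CLASS_EXTERNAL,  # StorageClass
        0,                         # NumberOfAuxSymbols
    )


record = pack_common_symbol("_buf", 64)
```

A linker that sees an undefined external symbol with a non-zero value can treat the value as the requested size, so the symbol does not end up with a strong definition in .bss.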
Sebastian Pop b5b84e0963 in findGCD of multiply expr return the gcd
we used to return 1 instead of the gcd

llvm-svn: 205800
2014-04-08 21:21:05 +00:00
Juergen Ributzka c11e8b67bb [Constant Hoisting][ARM64] Enable constant hoisting for ARM64.
This implements the target-hooks for ARM64 to enable constant hoisting.

This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.

llvm-svn: 205791
2014-04-08 20:39:59 +00:00
Kevin Enderby d88fec3d3a Fix the ARM VLD3 (single 3-element structure to all lanes)
size 16 double-spaced registers instruction printing.

This:
	vld3.16 {d0[], d2[], d4[]}, [r4]!

was being printed as:

	vld3.16	{d0[], d1[], d2[]}, [r4]!

rdar://16531387

llvm-svn: 205779
2014-04-08 18:00:52 +00:00
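The printing difference fixed in r205779 comes down to the register stride. A minimal Python model follows; `format_vld3_dup_list` is a hypothetical helper, not the ARM instruction printer.

```python
def format_vld3_dup_list(first_reg, double_spaced):
    """Format the three all-lanes D registers of a VLD3 operand list."""
    # Double-spaced lists skip every other D register.
    stride = 2 if double_spaced else 1
    regs = ["d%d[]" % (first_reg + i * stride) for i in range(3)]
    return "{" + ", ".join(regs) + "}"


fixed = format_vld3_dup_list(0, double_spaced=True)
buggy = format_vld3_dup_list(0, double_spaced=False)
```

With a stride of two the list reads {d0[], d2[], d4[]}; printing with a stride of one reproduces the incorrect {d0[], d1[], d2[]} form.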
Diego Novillo c6574c1aa3 Add -pass-remarks flag to 'opt'.
Summary:
This adds support in 'opt' to filter pass remarks emitted by
optimization passes. A new flag -pass-remarks specifies which
passes should emit a diagnostic when LLVMContext::emitOptimizationRemark
is invoked.

This will allow the front end to simply pass along the regular
expression from its own -Rpass flag when launching the backend.

Depends on D3227.

Reviewers: qcolombet

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3291

llvm-svn: 205775
2014-04-08 16:42:38 +00:00
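The filtering idea in r205775 can be modelled in a few lines of Python. This is a sketch, not the 'opt' implementation. The pass names are hypothetical, and an unanchored regular-expression search is assumed.

```python
import re


def make_remark_filter(pattern):
    """Build a predicate that tells whether a pass may emit remarks."""
    regex = re.compile(pattern)

    def should_emit(pass_name):
        # Assumed semantics: the pattern may match anywhere in the name.
        return regex.search(pass_name) is not None

    return should_emit


# Hypothetical pattern and pass names, for illustration only.
should_emit = make_remark_filter("inline|vectorize")
enabled = [name for name in ("inline", "loop-vectorize", "gvn") if should_emit(name)]
```

Because the filter is just a regular expression, a front end can forward the text of its own flag unchanged.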
NAKAMURA Takumi 35289340de X86MCAsmInfoGNUCOFF: Set PointerSize as 8 for targeting x64. It caused DW_LNE_set_address was misemitted on x64.
FIXME: I haven't investigated whether CalleeSaveStackSlotSize should be 8.
llvm-svn: 205772
2014-04-08 15:28:50 +00:00
Tim Northover 33d07468bc ARM64: fix fmsub patterns which assumed accum operand was first
Confusingly, the NEON fmla instructions put the accumulator first but the
scalar versions put it at the end (like the fma lib function & LLVM's
intrinsic).

This should fix PR19345, assuming there's only one issue.

llvm-svn: 205758
2014-04-08 12:23:51 +00:00
Elena Demikhovsky 3dcfbdfa54 AVX-512: Added fp_to_uint and uint_to_fp patterns.
llvm-svn: 205754
2014-04-08 07:24:02 +00:00
David Majnemer a1c861d379 obj2yaml: Use the correct relocation type for different machine types
The IO normalizer would essentially lump I386 and AMD64 relocations
together.  Relocation types with the same numeric value would then get
mapped inappropriately.

For example:
IMAGE_REL_AMD64_ADDR64 and IMAGE_REL_I386_DIR16 both have a numeric
value of one.  We would see IMAGE_REL_I386_DIR16 in obj2yaml conversions
of object files with a machine type of IMAGE_FILE_MACHINE_AMD64.

llvm-svn: 205746
2014-04-07 23:12:20 +00:00
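The fix in r205746 amounts to keying the name lookup on the machine type as well as the numeric value. A Python sketch of that idea follows. The table holds only a few entries, and `RELOC_NAMES` and `relocation_name` are hypothetical names, not obj2yaml's data structures. The constants are the values defined by the PE/COFF specification.

```python
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664

# Relocation names per machine type; the same numeric value appears in both.
RELOC_NAMES = {
    IMAGE_FILE_MACHINE_I386: {
        0x0000: "IMAGE_REL_I386_ABSOLUTE",
        0x0001: "IMAGE_REL_I386_DIR16",
    },
    IMAGE_FILE_MACHINE_AMD64: {
        0x0000: "IMAGE_REL_AMD64_ABSOLUTE",
        0x0001: "IMAGE_REL_AMD64_ADDR64",
    },
}


def relocation_name(machine, value):
    """Look up a relocation name using the machine type to disambiguate."""
    return RELOC_NAMES[machine][value]
```

With the machine type in the key, value 1 resolves to IMAGE_REL_AMD64_ADDR64 for an AMD64 object file and to IMAGE_REL_I386_DIR16 for an I386 one.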
Reed Kotler 735da8e015 Reverting commit r205628 due to mips64 issues.
llvm-svn: 205741
2014-04-07 22:11:40 +00:00
Tom Stellard 204e61bbdf R600/SI: Handle INSERT_SUBREG in SIFixSGPRCopies
llvm-svn: 205732
2014-04-07 19:45:45 +00:00
Tom Stellard 50122a5890 R600: Match 24-bit arithmetic patterns in a Target DAGCombine
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.

This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched.  This occasionally
resulted in some instructions being incorrectly deleted from the
program.

v2:
  - Fix bug with 64-bit mul

llvm-svn: 205731
2014-04-07 19:45:41 +00:00
Eric Christopher 5c8c5e5573 Revert the last couple of patches here and go back to something
that at least failed reliably.

llvm-svn: 205711
2014-04-07 13:36:26 +00:00
Eric Christopher beb2cd6b7c Handle vlas during inline cost computation if they'll be turned
into a constant size alloca by inlining.

Ran a run over the testsuite, no results out of the noise, fixes
the testcase in the PR.

PR19115.

llvm-svn: 205710
2014-04-07 13:36:21 +00:00
Eric Christopher edc8cbf9a5 XFAIL this completely at the moment:
cygwin has llvm-dwarfdump problems and isn't paying attention to the
specific xfail there.

s390x isn't matching for an unknown reason.

llvm-svn: 205708
2014-04-07 13:10:27 +00:00
Eric Christopher 57910ac671 Make test run on most platforms and only fail on cygwin/mingw while
it's being investigated for those.

llvm-svn: 205704
2014-04-07 12:32:12 +00:00