Commit Graph

22785 Commits

Author SHA1 Message Date
Tom Stellard 2937cbc005 R600/SI: Add a MUBUF store pattern for Imm offsets
llvm-svn: 200934
2014-02-06 18:36:39 +00:00
Tom Stellard 11624bc577 R600/SI: Add a MUBUF load pattern for Reg+Imm offsets
llvm-svn: 200933
2014-02-06 18:36:38 +00:00
Tom Stellard 044e418f15 R600/SI: Use immediates offsets for SMRD instructions whenever possible
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.

llvm-svn: 200932
2014-02-06 18:36:34 +00:00
Tim Northover f0e21616f3 X86: add costs for 64-bit vector ext/trunc & rebalance
The most important part of this is probably adding any cost at all for
operations like zext <8 x i8> to <8 x i32>. Before they were being
recorded as extremely costly (24, I believe) which made LLVM fall back
on a 4-wide vectorisation of a loop.

It also rebalances the values for sext, zext and trunc. Lacking any
other sane metric that might work across CPU microarchitectures I went
for instructions. This seems to be in reasonable accord with the rest
of the table (sitofp, ...) though no doubt at least one value is
sub-optimal for some bizarre reason.

Finally, separate AVX and AVX2 values are provided where appropriate.
The CodeGen is quite different in many cases.

rdar://problem/15981990

llvm-svn: 200928
2014-02-06 18:18:36 +00:00
Nick Lewycky 993849490e A memcpy out of an fresh alloca is a no-op, delete it. Patch by Patrick Walton!
llvm-svn: 200907
2014-02-06 06:29:19 +00:00
Chandler Carruth bf71a34eb9 [PM] Add a new "lazy" call graph analysis pass for the new pass manager.
The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.

However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:

- Be lazy about scanning function definitions. The existing pass eagerly
  scans the entire module to build the initial graph. This new pass is
  significantly more lazy, and I plan to push this even further to
  maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
  indirect call from functions whose address is taken. This node creates
  a huge choke-point which would preclude good parallelization across
  the fanout of the SCC graph when we got to the point of looking at
  such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
  rather than value handles and tracking call instructions. This will
  require explicit update calls instead of some updates working
  transparently, but should end up being significantly more efficient.
  The explicit update calls ended up being needed in many cases for the
  existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
  for an SCC which is stable across updates. This is essential for the
  new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
  an SCC friendly order. This is a much simpler graph structure and
  should be more memory dense. It does limit the ways in which it is
  appropriate to use this analysis. I wish I had a better name than
  "call graph". I've commented extensively this aspect.

This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problms. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.

Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.

A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.

llvm-svn: 200903
2014-02-06 04:37:03 +00:00
Juergen Ributzka fa0eba6c8b [DAG] Don't pull the binary operation though the shift if the operands have opaque constants.
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.

llvm-svn: 200900
2014-02-06 04:09:06 +00:00
Manman Ren d461244972 Set default of inlinecold-threshold to 225.
225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.

As Chandler points out, even though we don't have code in our testing
suite that uses cold attribute, there are larger applications that do
use cold attribute.

r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.

The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.

Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.

llvm-svn: 200898
2014-02-06 01:59:22 +00:00
Kevin Enderby d6b107136a Update the X86 assembler for .intel_syntax to accept
the << and >> bitwise operators.

rdar://15975725

llvm-svn: 200896
2014-02-06 01:21:15 +00:00
Paul Robinson af4e64d095 Disable most IR-level transform passes on functions marked 'optnone'.
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.

llvm-svn: 200892
2014-02-06 00:07:05 +00:00
Manman Ren e8781b1a36 Inliner uses a smaller inline threshold for callees with cold attribute.
Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.

llvm-svn: 200886
2014-02-05 22:53:44 +00:00
Quentin Colombet 87769713cf [RegAlloc] Add a last chance recoloring mechanism when everything else failed to
find a register.

The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other word, there are two things that may help finding an available color:
- Already assigned variables (RS_Done) can be recolored to different color.
- The recoloring allows to catch solutions that needs to touch more that just
  the neighbors of the current allocated variable.

E.g.,
vA can use {R1, R2    }
vB can use {    R2, R3}
vC can use {R1        }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.

vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.

Last chance recoloring kicks in:
vC does as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA does as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.

<rdar://problem/15947839>

llvm-svn: 200883
2014-02-05 22:13:59 +00:00
Rafael Espindola b4eec1daa1 Remove support for not using .loc directives.
Clang itself was not using this. The only way to access it was via llc.

llvm-svn: 200862
2014-02-05 18:00:21 +00:00
Petar Jovanovic 9725016af3 [mips] Add NaCl target and forbid indexed loads and stores for it
This patch adds NaCl target for Mips. It also forbids indexed loads and
stores if the target is NaCl.

Patch by Sasa Stankovic.

Differential Revision: http://llvm-reviews.chandlerc.com/D2690

llvm-svn: 200855
2014-02-05 17:19:30 +00:00
Petar Jovanovic f387387835 mips: XFAIL non-extern-addend-smallcodemodel test
Small code model (and default reloc model) set Reloc::PIC_ in this test,
and PIC is not yet supported in MCJIT for MIPS.

llvm-svn: 200852
2014-02-05 16:47:59 +00:00
Elena Demikhovsky 0b79be8ab2 AVX-512: optimized icmp -> sext -> icmp pattern
llvm-svn: 200849
2014-02-05 16:17:36 +00:00
Logan Chien d5c48aa3d3 ARM: Resolve thumb_bl fixup in same MCFragment.
In Thumb1 mode, bl instruction might be selected for branches between
basic blocks in the function if the offset is greater than 2KB.
However, this might cause SEGV because the destination symbol
is not marked as thumb function and the execution mode will be reset
to ARM mode.

Since we are sure that these symbols are in the same data fragment, we
can simply resolve these local symbols, and don't emit any relocation
information for this bl instruction.

llvm-svn: 200842
2014-02-05 14:15:16 +00:00
Elena Demikhovsky a38114c45e AVX-512: fixed a bug in EVEX encoding (the bug appeared after r200624)
llvm-svn: 200837
2014-02-05 13:03:01 +00:00
Michel Danzer 5d26fdfcba R600/SI: Add pattern for zero-extending i1 to i32
Fixes opencl-example if_* tests with radeonsi.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200830
2014-02-05 09:48:05 +00:00
Kai Nacke 382c140567 ARM: Enable use of relocation type tlsldo in debug info for tls data.
This fixes PR18554.

Reviewers: Renato Golin, Keith Walker
llvm-svn: 200826
2014-02-05 07:23:09 +00:00
Elena Demikhovsky a30e437659 AVX-512: Added intrinsic for cvtph2ps.
Added VPTESTNM instruction.
Added a pattern to vselect (lit tests will follow).

llvm-svn: 200823
2014-02-05 07:05:03 +00:00
Rafael Espindola 02eac9a270 Add a test for printing absolute symbols in ELF.
llvm-svn: 200818
2014-02-05 04:36:47 +00:00
Rafael Espindola f42c58d2ef Small fix for llvm-nm handling of weak symbols on ELF (print 'v').
llvm-svn: 200808
2014-02-04 23:53:15 +00:00
Manman Ren 81f136f94a Update testing case for r200806.
llvm-svn: 200807
2014-02-04 23:53:12 +00:00
Rafael Espindola fb66ef05f7 Add a test for common symbols in coff.
llvm-svn: 200803
2014-02-04 23:18:52 +00:00
Benjamin Kramer 34f460ed29 SimplifyLibCalls: Push TLI through the exp2->ldexp transform.
For the odd case of platforms with exp2 available but not ldexp.

llvm-svn: 200795
2014-02-04 20:27:23 +00:00
Petar Jovanovic a5da588b2f [mips] Implement %hi(sym1 - sym2) and %lo(sym1 - sym2) expressions
Patch implements %hi(sym1 - sym2) and %lo(sym1 - sym2) expressions for MIPS
by creating target expression class MipsMCExpr.

Patch by Sasa Stankovic.

Differential Revision: http://llvm-reviews.chandlerc.com/D2592

llvm-svn: 200783
2014-02-04 18:41:57 +00:00
David Peixotto b9b7362cdc Fix PR18345: ldr= pseudo instruction produces incorrect code when using in inline assembly
This patch fixes the ldr-pseudo implementation to work when used in
inline assembly.  The fix is to move arm assembler constant pools
from the ARMAsmParser class to the ARMTargetStreamer class.

Previously we kept the assembler generated constant pools in the
ARMAsmParser object. This does not work for inline assembly because
a new parser object is created for each blob of inline assembly.
This patch moves the constant pools to the ARMTargetStreamer class
so that the constant pool will remain alive for the entire code
generation process.

An ARMTargetStreamer class is now required for the arm backend.
There was no existing implementation for MachO, only Asm and ELF.
Instead of creating an empty MachO subclass, we decided to make the
ARMTargetStreamer a non-abstract class and provide default
(llvm_unreachable) implementations for the non constant-pool related
methods.

Differential Revision: http://llvm-reviews.chandlerc.com/D2638

llvm-svn: 200777
2014-02-04 17:22:40 +00:00
Tom Stellard 0ec134f3d6 R600/SI: Custom lower i64 ISD::SELECT
llvm-svn: 200774
2014-02-04 17:18:40 +00:00
Tom Stellard bfebd1fc7e R600: Enable vector fpow.
The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."

Patch by: Jan Vesely

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 200773
2014-02-04 17:18:37 +00:00
Tim Northover 103e648d30 OS X: the correct function is __sincospif_stret, not __sincospi_stretf
rdar://problem/13729466

llvm-svn: 200771
2014-02-04 16:28:20 +00:00
Tim Northover fdbdb4b6d5 ARM & AArch64: merge NEON absolute compare intrinsics
There was an extremely confusing proliferation of LLVM intrinsics to implement
the vacge & vacgt instructions. This combines them all into two polymorphic
intrinsics, shared across both backends.

llvm-svn: 200768
2014-02-04 14:55:42 +00:00
Justin Bogner c6af350698 llvm-cov: Implement the preserve-paths flag
Until now, when a path in a gcno file included a directory, we would
emit our .gcov file in that directory, whereas gcov always emits the
file in the current directory. In doing so, this implements gcov's
strange name-mangling -p flag, which is needed to avoid clobbering
files when two with the same name exist in different directories.

The path mangling is a bit ugly and only handles unix-like paths, but
it's simple, and it doesn't make any guesses as to how it should
behave outside of what gcov documents. If we decide this should be
cross platform later, we can consider the compatibility implications
then.

llvm-svn: 200754
2014-02-04 10:45:02 +00:00
Tim Northover e42fb07618 ARM: fix fast-isel assertion failure
Missing braces on if meant we inserted both ARM and Thumb load for a litpool
entry. This didn't end well.

rdar://problem/15959157

llvm-svn: 200752
2014-02-04 10:38:46 +00:00
Michel Danzer 624b02aa67 R600/SI: Fix fneg for 0.0
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.

Also add a pattern for (fneg (fabs ...)).

Fixes a bunch of bit encoding piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200743
2014-02-04 07:12:38 +00:00
Justin Bogner f69557e670 llvm-cov: Implement the object-directory flag
llvm-svn: 200741
2014-02-04 06:41:43 +00:00
Justin Bogner 93d1edbb7d llvm-cov: Ignore missing .gcda files
When gcov is run without gcda data, it acts as if the counts are all
zero and labels the file as - to indicate that there was no data. We
should do the same.

llvm-svn: 200740
2014-02-04 06:41:39 +00:00
Justin Bogner b5bff69ac6 llvm-cov: Document the llvm-cov tests
llvm-svn: 200739
2014-02-04 06:41:33 +00:00
Kai Nacke ab7ee461c2 Revert: ARM: Enable use of relocation type tlsldo in debug info for tls data.
There seems to be a new problem with the debug info in the test case.
I'll have to investigate this.

llvm-svn: 200737
2014-02-04 06:07:00 +00:00
Kai Nacke a56bb78021 Add strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls
Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls
and remove the ToDo comment.

Reviewer: Duncan P.N. Exan Smith
llvm-svn: 200736
2014-02-04 05:55:16 +00:00
Kai Nacke 5e8c30f192 ARM: Enable use of relocation type tlsldo in debug info for tls data.
This fixes PR18554.

Reviewers: Renato Golin, Keith Walker
llvm-svn: 200735
2014-02-04 05:43:09 +00:00
David Blaikie 5e390e4df7 DebugInfo: Remove some unneeded conditionals now that DIBuilder no longer emits zero-length arrays as {i32 0}
A bunch of test cases needed to be cleaned up for this, many my fault -
when implementid imported modules I updated test cases by simply
duplicating the prior metadata field - which wasn't always the empty
metadata entry.

llvm-svn: 200731
2014-02-04 01:23:52 +00:00
Reid Kleckner d47a59a4f8 inalloca: Don't remove dead arguments in the presence of inalloca args
It disturbs the layout of the parameters in memory and registers,
leading to problems in the backend.

The plan for optimizing internal inalloca functions going forward is to
essentially SROA the argument memory and demote any captured arguments
(things that aren't trivially written by a load or store) to an indirect
pointer to a static alloca.

llvm-svn: 200717
2014-02-03 20:42:49 +00:00
Tim Northover 24979d8e10 AArch64 & ARM: refactor crypto intrinsics to take scalars
Some of the SHA instructions take a scalar i32 as one argument (largely because
they work on 160-bit hash fragments). This wasn't reflected in the IR
previously, with ARM and AArch64 choosing different types (<4 x i32> and <1 x
i32> respectively) which was ugly.

This makes all the affected intrinsics take a uniform "i32", allowing them to
become non-polymorphic at the same time.

llvm-svn: 200706
2014-02-03 17:27:49 +00:00
Hal Finkel 5c968d9440 Expand vector bswap in LegalizeVectorOps
ISD::BSWAP was missing from the list of node types that should be expanded
element-wise.

llvm-svn: 200705
2014-02-03 17:27:25 +00:00
Duncan P. N. Exon Smith 1ff08e389f Lower llvm.expect intrinsic correctly for i1
LowerExpectIntrinsic previously only understood the idiom of an expect
intrinsic followed by a comparison with zero. For llvm.expect.i1, the
comparison would be stripped by the early-cse pass.

Patch by Daniel Micay.

llvm-svn: 200664
2014-02-02 22:43:55 +00:00
Arnold Schwaighofer 17455633c7 LoopVectorizer: Enable unrolling of conditional stores and the load/store
unrolling heuristic per default

Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.

llvm-svn: 200621
2014-02-02 03:12:34 +00:00
Matt Arsenault 6e63dd27a2 Add some xfailed R600 tests for 64-bit private accesses.
llvm-svn: 200620
2014-02-02 00:13:12 +00:00
Matt Arsenault f5958dded4 R600/SI: Fix insertelement with dynamic indices.
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.

llvm-svn: 200619
2014-02-02 00:05:35 +00:00
Venkatraman Govindaraju 52b6473d74 [Sparc] Set %o7 as the return address register instead of %i7 in MCRegisterInfo. Also, add CFI instructions to initialize the frame correctly.
llvm-svn: 200617
2014-02-01 18:54:16 +00:00
Arnold Schwaighofer 445f7fb064 ARMTTI: We don't have 16 allocatable scalar registers
This caused an regression on libquantum after enabling the new loop vectorizer
unroll heuristics.

llvm-svn: 200616
2014-02-01 18:00:25 +00:00
David Woodhouse 6c9a6f9b3d MC: Fix .octa output for APInts with BitWidth > 128
llvm-svn: 200615
2014-02-01 16:52:33 +00:00
David Woodhouse d6de0d99c5 MC: Add support for .octa
This is a minimal implementation which accepts only constants rather than
full expressions, but that should be perfectly sufficient for all known
users for now.

Patch from PaX Team <pageexec@freemail.hu>

llvm-svn: 200614
2014-02-01 16:20:59 +00:00
Chandler Carruth 1665152cce [LPM] Apply a really big hammer to fix PR18688 by recursively reforming
LCSSA when we promote to SSA registers inside of LICM.

Currently, this is actually necessary. The promotion logic in LICM uses
SSAUpdater which doesn't understand how to place LCSSA PHI nodes.
Teaching it to do so would be a very significant undertaking. It may be
worthwhile and I've left a FIXME about this in the code as well as
starting a thread on llvmdev to try to figure out the right long-term
solution.

For now, the PR needs to be fixed. Short of using the promition
SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes,
I don't see a cleaner or cheaper way of achieving this. Fortunately,
LCSSA is relatively lazy and sparse -- it should only update
instructions which need it. We can also skip the recursive variant when
we don't promote to SSA values.

llvm-svn: 200612
2014-02-01 13:35:14 +00:00
Chandler Carruth 6b4cc8b66a [inliner] Skip debug intrinsics even earlier in computing the inline
cost so that they don't impact the vector bonus. Fundamentally, counting
unsimplified instructions is just *wrong*; it will continue to introduce
instability as things which do not generate code bizarrely impact
inlining. For example, sufficiently nested inlined functions could turn
off the vector bonus with lifetime markers just like the debug
intrinsics do. =/

This is a short-term tactical fix. Long term, I think we need to remove
the vector bonus entirely. That's a separate patch and discussion
though.

The patch to fix this provided by Dario Domizioli. I've added some
comments about the planned direction and used a heavily pruned form of
debug info intrinsics for the test case. While this debug info doesn't
work or "do" anything useful, it lets us easily test all manner of
interference easily, and I suspect this will not be the last time we
want to craft a pattern where debug info interferes with the inliner in
a problematic way.

llvm-svn: 200609
2014-02-01 10:38:17 +00:00
David Majnemer 5a67e2b1b8 Update a .fill test to use the updated semantics.
Something funny happened, this should've been part of r200606.

llvm-svn: 200607
2014-02-01 07:36:52 +00:00
David Majnemer 522d3db745 MC: Improve the .fill directive's compatibility with GAS
Per the GAS documentation, .fill should permit pattern widths that
aren't a power of two. While I was in the neighborhood, I added some
sanity checking. This change was motivated by a use of this construct
in the Linux Kernel.

llvm-svn: 200606
2014-02-01 07:19:38 +00:00
Reid Kleckner a04504fe97 Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit r200576.  It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.

llvm-svn: 200602
2014-02-01 01:37:30 +00:00
Josh Magee 24c7f06333 [stackprotector] Implement the sspstrong rules for stack layout.
This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.

The sspstrong layout rules are:
 1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
 2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
 3. Variables that have had their address taken are 3rd closest to the
protector.


Differential Revision: http://llvm-reviews.chandlerc.com/D2546

llvm-svn: 200601
2014-02-01 01:36:16 +00:00
Reid Kleckner f5b76518c9 Implement inalloca codegen for x86 with the new inalloca design
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.

Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work.  As a result these changes are pretty
minimal.

Reviewers: echristo

Differential Revision: http://llvm-reviews.chandlerc.com/D2637

llvm-svn: 200596
2014-01-31 23:50:57 +00:00
Reid Kleckner dfbed59cc2 Don't put non-static allocas in the static alloca map
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block.  Also add an
assertion in x86 fastisel.

llvm-svn: 200593
2014-01-31 23:45:12 +00:00
Lang Hames b0bd489e4a Split out small-code-model MCJIT testcase in order to xfail for AArch64, where
PC-rel relocations aren't yet fully implemented.

llvm-svn: 200592
2014-01-31 23:36:25 +00:00
Rafael Espindola 972e71ab5a Remove another hasRawTextSupport.
To remove this one simply move the end of file logic from the asm printer to
the target mc streamer.

This removes the last call to hasRawTextSupport from lib/Target.

llvm-svn: 200590
2014-01-31 23:10:26 +00:00
Reid Kleckner edb94c70c1 Set -mcpu to make this test pass on atom bots
llvm-svn: 200588
2014-01-31 22:58:10 +00:00
Rafael Espindola e9d1102545 Mark the first dynamic elf symbol as SF_FormatSpecific.
llvm-svn: 200578
2014-01-31 21:40:13 +00:00
Lang Hames 5ec150c967 Replace X86 FMA intrinsic pseduo-instructions with def pats.
It looks like these pseudos were only used for pattern matching. Def pats are
the appropriate way to do that. As a bonus, these intrinsics will now have
memory operands folded properly, and better FMA3 variants selected where
appropriate (see r199933).

<rdar://problem/15611947>

llvm-svn: 200577
2014-01-31 21:29:19 +00:00
Chandler Carruth b3da389e30 [SLPV] Recognize vectorizable intrinsics during SLP vectorization and
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.

Patch by Raul Silvera! (Much delayed due to my not running dcommit)

llvm-svn: 200576
2014-01-31 21:14:40 +00:00
David Blaikie 322d79b4a2 DebugInfo: Flag type unit references as declarations
This ensures DWARF consumers don't confuse these references for
definitions. I'd argue it might be nice to improve debuggers so we don't
need this, but it's just one field in an abbreviation anyway - so it
doesn't seem worth the fight.

llvm-svn: 200569
2014-01-31 19:52:26 +00:00
Reid Kleckner 1c843228f8 [ms-cxxabi] Add a new calling convention that swaps 'this' and 'sret'
MSVC always places the 'this' parameter for a method first.  The
implicit 'sret' pointer for methods always comes second.  We already
implement this for __thiscall by putting sret parameters on the stack,
but __cdecl methods require putting both parameters on the stack in
opposite order.

Using a special calling convention allows frontends to keep the sret
parameter first, which avoids breaking lots of assumptions in LLVM and
Clang.

Fixes PR15768 with the corresponding change in Clang.

Reviewers: ributzka, majnemer

Differential Revision: http://llvm-reviews.chandlerc.com/D2663

llvm-svn: 200561
2014-01-31 17:41:22 +00:00
Matheus Almeida 1ace1f1236 [mips][msa] Add insert.d instruction.
This instruction is only available on Mips64 cores that implement the MSA ASE.

llvm-svn: 200543
2014-01-31 13:31:20 +00:00
Matheus Almeida 8114cf70aa Update FileCheck prefixes in preparation for the addition of Mips64 MSA tests.
No functional changes.

llvm-svn: 200541
2014-01-31 13:05:56 +00:00
Chandler Carruth c12224cb93 [vectorizer] Tweak the way we do small loop runtime unrolling in the
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.

I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.

llvm-svn: 200530
2014-01-31 10:51:08 +00:00
Craig Topper 327f13b43b Move address override handling in X86MCCodeEmitter to a place where it works for VEX encoded instructions too. This allows 32-bit addressing to work in 64-bit mode.
llvm-svn: 200516
2014-01-31 05:33:45 +00:00
Manman Ren 413a6cb42b This patch teaches the DAGCombiner how to fold insert_subvector nodes
when the input is a concat_vectors and the insert replaces one of the
concat halves:

Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)

This can be seen with the following IR:

define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)

The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.

Using AVX, without the patch this generates two vinsertf128 instructions:

vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0

With the patch this is optimized into:

vinsertf128 $1, %xmm1, %ymm2, %ymm0

Patch by Robert Lougher.

llvm-svn: 200506
2014-01-31 01:10:35 +00:00
Manman Ren 4ece7452ba PGO branch weight: update edge weights in SelectionDAGBuilder.
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.

The previous attempt at r200431 was reverted at r200434 because of
two testing case failures. I modified my patch a little, but forgot
to re-run "make check-all".

Testing case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability which causes changes in
spill placement.

llvm-svn: 200502
2014-01-31 00:42:44 +00:00
Matt Arsenault ee364ee729 Allow speculating llvm.sqrt, fma and fmuladd
This doesn't set errno, so this should be OK.
Also update the documentation to explicitly state
that errno are not set.

llvm-svn: 200501
2014-01-31 00:09:00 +00:00
Timur Iskhodzhanov f33d8b979b Add a link to a bug to a couple of FIXMEs
llvm-svn: 200500
2014-01-30 23:14:38 +00:00
David Woodhouse 0b6c94909e [x86] Fix signed relocations for i64i32imm operands
These should end up (in ELF) as R_X86_64_32S relocs, not R_X86_64_32.
Kill the horrid and incomplete special case and FIXME in
EncodeInstruction() and set things up so it can infer the signedness
from the ImmType just like it can the size and whether it's PC-relative.

llvm-svn: 200495
2014-01-30 22:20:41 +00:00
Chad Rosier fe5ab2f5ba [AArch64] Custom lower concat_vector patterns with v4i16, v4i32, v8i8, v8i16, v16i8 types.
llvm-svn: 200491
2014-01-30 21:46:54 +00:00
Timur Iskhodzhanov 3e4ac4eeca Fix PR18381 - print a minimal diagnostic rather than assert on unresolved .secidx target
llvm-svn: 200490
2014-01-30 21:13:05 +00:00
Rafael Espindola 196666cf5f Only ELF has a dynamic symbol table. Remove it from ObjectFile.
COFF has only one symbol table.
MachO has a LC_DYSYMTAB, but that is not a symbol table, just extra info about
the one symbol table (LC_SYMTAB).
IR (coming soon) also has only one table.

llvm-svn: 200488
2014-01-30 20:45:33 +00:00
Rafael Espindola 41186dd288 This has been fixed.
llvm-svn: 200487
2014-01-30 20:29:25 +00:00
Juergen Ributzka fb4d648295 [Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic.
Re-applying the patch, but this time without using AsmPrinter methods.

Reviewed by Andy

llvm-svn: 200481
2014-01-30 18:58:27 +00:00
Timur Iskhodzhanov 4a83bf168f Explicitly specify the CPU to avoid Atom-specific assembly mismatch
llvm-svn: 200473
2014-01-30 17:53:45 +00:00
Evgeniy Stepanov 02bc78b9cd Reenable ARM EHABI on Android.
Broken in r200388.

llvm-svn: 200466
2014-01-30 14:18:25 +00:00
Jakob Stoklund Olesen ef1d59a175 Implement SPARCv9 atomic_swap_64 with a pseudo.
The SWAP instruction only exists in a 32-bit variant, but the 64-bit
atomic swap can be implemented in terms of CASX, like the other atomic
rmw primitives.

llvm-svn: 200453
2014-01-30 04:48:46 +00:00
Saleem Abdulrasool 4c4789be5e ARM IAS: support .object_arch
The .object_arch directive indicates an alternative architecture to be specified
in the object file.  The directive does *not* effect the enabled feature bits
for the object file generation.  This is particularly useful when the code
performs runtime detection and would like to indicate a lower architecture as
the requirements than the actual instructions used.

llvm-svn: 200451
2014-01-30 04:46:41 +00:00
Saleem Abdulrasool 15d16d809b tools: add support for decoding ARM attributes
Enhance the ARM specific parsing support in llvm-readobj to support attributes.
This allows for simpler tests to validate encoding of the build attributes as
specified in the ARM ELF specification.

llvm-svn: 200450
2014-01-30 04:46:33 +00:00
Saleem Abdulrasool 5d962d31f1 ARM IAS: support .movsp
.movsp is an ARM unwinding directive that indicates to the unwinder that a
register contains an offset from the current stack pointer.  If the offset is
unspecified, it defaults to zero.

llvm-svn: 200449
2014-01-30 04:46:24 +00:00
Saleem Abdulrasool 56e06e8640 ARM: suuport .tlsdescseq directive
This enhances the ARMAsmParser to handle .tlsdescseq directives.  This is a
slightly special relocation.  We must be able to generate them, but not consume
them in assembly.  The relocation is meant to assist the linker in generating a
TLS descriptor sequence.  The ELF target streamer is enhanced to append
additional fixups into the current segment and that is used to emit the new
R_ARM_TLS_DESCSEQ relocations.

llvm-svn: 200448
2014-01-30 04:02:47 +00:00
Saleem Abdulrasool a3f12bdeec ARM: support TLS descriptor relocations
Add support for tlsdesc relocations which are part of the ABI, marked as
experimental.  These relocations permit the linker to perform TLS reference
optimizations.

llvm-svn: 200447
2014-01-30 04:02:38 +00:00
Saleem Abdulrasool 6e00ca887e ARM: support tlscall relocations
This adds support for TLS CALL relocations.  TLS CALL relocations are used to
indicate to the linker to generate appropriate entries to resolve TLS references
via an appropriate function invocation (e.g. __tls_get_addr(PLT)).

In order to accomodate the linker relaxation of the TLS access model for the
references (GD/LD -> IE, IE -> LE), the relocation addend must be incomplete.
This requires that the partial inplace value is also incomplete (i.e. 0).  We
simply avoid the offset value calculation at the time of the fixup adjustment in
the ARM assembler backend.

llvm-svn: 200446
2014-01-30 04:02:31 +00:00
Juergen Ributzka f6f0ce903e Revert "[Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic."
This reverts commit r200444 to unbreak buildbots.

llvm-svn: 200445
2014-01-30 03:34:02 +00:00
Juergen Ributzka aece7583a7 [Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic.
Reviewed by Andy

llvm-svn: 200444
2014-01-30 03:06:14 +00:00
Timur Iskhodzhanov f166f6c8d0 Reland r200340 - 'Add line table debug info to COFF files when using a win32 triple'
This incorporates a couple of fixes reviewed at http://llvm-reviews.chandlerc.com/D2651

llvm-svn: 200440
2014-01-30 01:39:17 +00:00
Manman Ren 7407e0e31c Revert r200431 due to bot failures.
llvm-svn: 200434
2014-01-30 00:53:27 +00:00
Rafael Espindola 9fa52155a9 Fix TLS handling in ELF's getAddress and llvm-nm to print 'D' for it.
llvm-svn: 200433
2014-01-30 00:42:30 +00:00
Manman Ren 104e0c80cc PGO branch weight: update edge weights in SelectionDAGBuilder.
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.

llvm-svn: 200431
2014-01-30 00:24:37 +00:00
Manman Ren b681918ddd PGO branch weight: update edge weights in IfConverter.
This commit only handles IfConvertTriangle. To update edge weights
of a successor, one interface is added to MachineBasicBlock:
/// Set successor weight of a given iterator.
setSuccWeight(succ_iterator I, uint32_t weight)

An existing testing case test/CodeGen/Thumb2/v8_IT_5.ll is updated,
since we now correctly update the edge weights, the cold block
is placed at the end of the function and we jump to the cold block.

llvm-svn: 200428
2014-01-29 23:18:47 +00:00
Eric Christopher 8873adaa60 If we use DW_AT_ranges we need to specify a base address that ranges
are relative to in the compile unit. Currently let's just use 0...

Thanks to Greg Clayton for the catch!

llvm-svn: 200425
2014-01-29 22:22:56 +00:00
Eric Christopher fb8dd0085e Turn on CU ranges if we've got multiple compile units in the same
module since there's no range guarantee that we could make given
output order. This also fixes up the testcases that have multiple
CUs to have the correct range offset.

llvm-svn: 200422
2014-01-29 22:06:27 +00:00
Justin Bogner a99a3902a5 llvm-cov: Expect a source file as a positional parameter
Currently, llvm-cov isn't command-line compatible with gcov, which
accepts a source file name as its first parameter and infers the gcno
and gcda file names from that. This change keeps our -gcda and -gcno
options available for convenience in overriding this behaviour, but
adds the required parameter and inference behaviour as a compatible
default.

llvm-svn: 200417
2014-01-29 21:31:34 +00:00
David Majnemer 91fc4c2169 MC: Better management of macro arguments
The linux kernel makes uses of a GAS `feature' which substitutes nothing
for macro arguments which aren't specified.

Proper support for these kind of macro arguments necessitated a cleanup of
differences between `GAS' and `Darwin' dialect macro processing.

Differential Revision: http://llvm-reviews.chandlerc.com/D2634

llvm-svn: 200409
2014-01-29 18:57:46 +00:00
Arnold Schwaighofer 85a26704e9 LoopVectorizer: Add a test case for unrolling of small loops that need a runtime
check.

llvm-svn: 200408
2014-01-29 18:55:44 +00:00
Lang Hames 2cbbdf41c0 Add support for PC-relative non-extern relocations to RuntimeDyldMachO.
Also replaces testcase for r180790 (support for absolute non-externs relocs)
with a more robust version.

<rdar://problem/15864721>

llvm-svn: 200404
2014-01-29 18:31:35 +00:00
Matheus Almeida ec079d9e1d [mips][msa] Add fill.d instruction.
This instruction is only available on Mips64 cores
that implement the MSA ASE.

llvm-svn: 200400
2014-01-29 15:12:02 +00:00
Matheus Almeida 4cb577c614 [mips][msa] CHECK-DAG-ize MSA 2r_vector_scalar.ll test.
This update is a preparation for the addition of Mips64 MSA tests.

No functional changes.

llvm-svn: 200399
2014-01-29 14:32:03 +00:00
Matheus Almeida 74070327b2 [mips][msa] Add copy_{u,s}.d.
These instructions are only available on Mips64 cores
that implement the MSA ASE.

llvm-svn: 200398
2014-01-29 14:05:28 +00:00
Matheus Almeida a64f0600f3 [mips][msa] CHECK-DAG-ize MSA elm_copy.ll test.
This update is a preparation for the addition of Mips64 MSA tests.

No functional changes.

llvm-svn: 200395
2014-01-29 13:51:34 +00:00
Chandler Carruth d4be9dc02d [LPM] Fix PR18643, another scary place where loop transforms failed to
preserve loop simplify of enclosing loops.

The problem here starts with LoopRotation which ends up cloning code out
of the latch into the new preheader it is buidling. This can create
a new edge from the preheader into the exit block of the loop which
breaks LoopSimplify form. The code tries to fix this by splitting the
critical edge between the latch and the exit block to get a new exit
block that only the latch dominates. This sadly isn't sufficient.

The exit block may be an exit block for multiple nested loops. When we
clone an edge from the latch of the inner loop to the new preheader
being built in the outer loop, we create an exiting edge from the outer
loop to this exit block. Despite breaking the LoopSimplify form for the
inner loop, this is fine for the outer loop. However, when we split the
edge from the inner loop to the exit block, we create a new block which
is in neither the inner nor outer loop as the new exit block. This is
a predecessor to the old exit block, and so the split itself takes the
outer loop out of LoopSimplify form. We need to split every edge
entering the exit block from inside a loop nested more deeply than the
exit block in order to preserve all of the loop simplify constraints.

Once we try to do that, a problem with splitting critical edges
surfaces. Previously, we tried a very brute force to update LoopSimplify
form by re-computing it for all exit blocks. We don't need to do this,
and doing this much will sometimes but not always overlap with the
LoopRotate bug fix. Instead, the code needs to specifically handle the
cases which can start to violate LoopSimplify -- they aren't that
common. We need to see if the destination of the split edge was a loop
exit block in simplified form for the loop of the source of the edge.
For this to be true, all the predecessors need to be in the exact same
loop as the source of the edge being split. If the dest block was
originally in this form, we have to split all of the deges back into
this loop to recover it. The old mechanism of doing this was
conservatively correct because at least *one* of the exiting blocks it
rewrote was the DestBB and so the DestBB's predecessors were fixed. But
this is a much more targeted way of doing it. Making it targeted is
important, because ballooning the set of edges touched prevents
LoopRotate from being able to split edges *it* needs to split to
preserve loop simplify in a coherent way -- the critical edge splitting
would sometimes find the other edges in need of splitting but not
others.

Many, *many* thanks for help from Nick reducing these test cases
mightily. And helping lots with the analysis here as this one was quite
tricky to track down.

llvm-svn: 200393
2014-01-29 13:16:53 +00:00
Renato Golin 8cea6e8fc6 Enable EHABI by default
After all hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.

This commit also remove the -arm-enable-ehabi-descriptors, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.

Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and update some tests to avoid using EH tables
when none are needed.

The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.

llvm-svn: 200388
2014-01-29 11:50:56 +00:00
David Majnemer ad5c1734e2 MC: Reorganize macro MC test along dialect lines
This commit seeks to do two things:
 - Run the surfeit of tests under the Darwin dialect.  This ends up
   affecting tests which assumed that spaces could deliminate arguments.
 - The GAS dialect tests should limit their surface area to things that
   could plausibly work under GAS. For example, Darwin style arguments
   have no business being in such a test.

llvm-svn: 200383
2014-01-29 09:18:43 +00:00
Kostya Serebryany 6e15ec1442 [asan] simplify a test
llvm-svn: 200378
2014-01-29 07:35:43 +00:00
Venkatraman Govindaraju 141d0e2221 [Sparc] Use %r_disp32 for pc_rel entries in FDE as well.
This makes MCAsmInfo::getExprForFDESymbol() a virtual function and overrides it in SparcMCAsmInfo.

llvm-svn: 200376
2014-01-29 06:59:20 +00:00
NAKAMURA Takumi b366f01f83 Revert r200340, "Add line table debug info to COFF files when using a win32 triple."
It was incompatible with --target=i686-win32.

llvm-svn: 200375
2014-01-29 06:05:38 +00:00
Venkatraman Govindaraju fd5c1f9497 [Sparc] Use %r_disp32 for pc_rel entries in gcc_except_table and eh_frame.
Otherwise, assembler (gas) fails to assemble them with error message "operation
combines symbols in different segments". This is because MC computes
pc_rel entries with subtract expression between labels from different sections.

llvm-svn: 200373
2014-01-29 04:51:35 +00:00
Chandler Carruth 66f0b16360 [LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered"
because of the inside-out run of LoopSimplify in the LoopPassManager and
the fact that LoopSimplify couldn't be "preserved" across two
independent LoopPassManagers.

Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
node because it thought it was rewriting (via SCEV) the incoming value
to a loop invariant value. While it may well be invariant for the
current loop, it may be rewritten in terms of an enclosing loop's
values. This in and of itself is fine, as the LCSSA PHI node in the
enclosing loop for the inner loop value we're rewriting will have its
own LCSSA PHI node if used outside of the enclosing loop. With me so
far?

Well, the current loop and the enclosing loop may share an exiting
block and exit block, and when they do they also share LCSSA PHI nodes.
In this case, its not valid to RAUW through the LCSSA PHI node.

Expected crazy test included.

llvm-svn: 200372
2014-01-29 04:40:19 +00:00
Rafael Espindola 7a578026ae We do use pipefail these days. Update the test.
llvm-svn: 200370
2014-01-29 04:08:05 +00:00
Venkatraman Govindaraju 50f32d949b [SparcV9] Use correct register class (I64RegClass) to hold the address of _GLOBAL_OFFSET_TABLE_ in sparcv9.
llvm-svn: 200368
2014-01-29 03:35:08 +00:00
Rafael Espindola 310f501ef0 Use a raw_stream to implement the mangler.
This is a bit more convenient for some callers, but more importantly, it is
easier to implement correctly. Doing this removes the patching of already
printed data that was used for fastcall, fixing a crash with private fastcall
symbols.

llvm-svn: 200367
2014-01-29 02:30:38 +00:00
Kevin Qin 92d64d2d56 [AArch64 NEON] Lower SELECT_CC with vector operand.
When the scalar compare is between floating point and operands are
vector, we custom lower SELECT_CC to use NEON SIMD compare for
generating less instructions.

llvm-svn: 200365
2014-01-29 01:57:30 +00:00
David Woodhouse 21bfc71752 [ARM] Remove superfluous inline asm mode switch test
llvm-svn: 200361
2014-01-29 00:49:28 +00:00
David Woodhouse 7db3705f9e Tests for mode switching
1. test that inlineasm works
2. test that relaxable instructions are re-encoded in the correct mode.

llvm-svn: 200351
2014-01-28 23:13:30 +00:00
Timur Iskhodzhanov 7523743075 Disable the COFF tests on non-X86 archs
llvm-svn: 200341
2014-01-28 21:47:33 +00:00
Timur Iskhodzhanov 2c659648b3 Add line table debug info to COFF files when using a win32 triple.
Reviewed at http://llvm-reviews.chandlerc.com/D2232

llvm-svn: 200340
2014-01-28 21:33:27 +00:00
Matheus Almeida 2e03f24301 [mips] Fix ELF header flags.
As opposed to GCC/GAS the default ABI for Mips64 is n64.
Compatibility bit should be set if o32 ABI is used when targeting Mips64.

llvm-svn: 200332
2014-01-28 19:24:11 +00:00
Gautam Chakrabarti 2c283400f9 [NVPTX] Fix emitting aggregate parameters
The code was missing the case for aggregate parameters and
hence was emitting them as .b0 type. Also fixed a couple
of comments.

llvm-svn: 200325
2014-01-28 18:35:29 +00:00
Andrea Di Biagio 2ea61f17ad [X86] Add extra rules for combining vselect dag nodes into movsd.
This improves the fix committed at revision 199683 adding the
following new target specific combine rules:

1) fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))

2) fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
        (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))

3) fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))

4) fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
        (v4f32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))

llvm-svn: 200324
2014-01-28 18:14:21 +00:00
Rafael Espindola ab73c493ea Fix pr14893.
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid with the preconditions changes. In particular, it must drop
the range and tbaa metadata.

The patch implements this with an utility function to drop all metadata not
in a white list.

llvm-svn: 200322
2014-01-28 16:56:46 +00:00
Andrea Di Biagio b6d39afbda [DAGCombiner] Avoid introducing an illegal build_vector when folding a sign_extend.
Make sure that we don't introduce illegal build_vector dag nodes
when trying to fold a sign_extend of a build_vector.

This fixes a regression introduced by r200234.
Added test CodeGen/X86/fold-vector-sext-crash.ll
to verify that llc no longer crashes with an assertion failure
due to an illegal build_vector of type MVT::v4i64.

Thanks to Ilia Filippov for spotting this regression and for
providing a reproducible test case.

llvm-svn: 200313
2014-01-28 12:53:56 +00:00
Chandler Carruth b783628560 [vectorizer] Completely disable the block frequency guidance of the loop
vectorizer, placing it behind an off-by-default flag.

It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.

I'm planning to email the dev list with a summary of why its not really
useful and a couple of ideas about how to better structure these types
of heuristics.

llvm-svn: 200294
2014-01-28 09:10:41 +00:00
Hal Finkel 4e703bcecd Handle spilling the PPC GPRC_NOR0 register class
GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo
register). As a result, we also need to check for it in the spilling code.

llvm-svn: 200288
2014-01-28 05:32:58 +00:00
Michel Danzer bf1a641060 R600/SI: Add pattern for truncating i32 to i1
Fixes half a dozen piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200283
2014-01-28 03:01:16 +00:00
Jakob Stoklund Olesen 83c677353b Fix the DWARF EH encodings for Sparc PIC code.
Also emit the stubs that were generated for references to typeinfo
symbols.

llvm-svn: 200282
2014-01-28 02:52:26 +00:00
Reid Kleckner 26af2cae05 Update optimization passes to handle inalloca arguments
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.

I added tests for any change that would allow an optimization to fire on
inalloca.

Reviewers: nlewycky

Differential Revision: http://llvm-reviews.chandlerc.com/D2449

llvm-svn: 200281
2014-01-28 02:38:36 +00:00
Arnold Schwaighofer 18865db3c1 LoopVectorize: Support conditional stores by scalarizing
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.

  for (i = 0; i < 128; ++i) {
    if (a[i] < 10)
      a[i] += val;
  }

  for (i = 0; i < 128; i+=2) {
    v = a[i:i+1];
    v0 = (extract v, 0) + 10;
    v1 = (extract v, 1) + 10;
    if (v0 < 10)
      a[i] = v0;
    if (v1 < 10)
      a[i] = v1;
  }

The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.

The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled per default for
now.

This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.

I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet so this will not do good at the moment.

rdar://15892953

Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
   Applications/siod/siod         2.18%
 Performance improvements:
   mesa                          -4.42%
   libquantum                    -4.15%

 With a patch that slightly changes the register heuristics (by subtracting the
 induction variable on both sides of the register pressure equation, as the
 induction variable is probably not really unrolled):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2  7.73%
   Applications/siod/siod          1.97%

 Performance Improvements:
   libquantum                    -13.05% (we now also unroll quantum_toffoli)
   mesa                           -4.27%

llvm-svn: 200270
2014-01-28 01:01:53 +00:00
Manman Ren f1cb16e481 PGO branch weight: keep halving the weights until they can fit into
uint32.

When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.

llvm-svn: 200262
2014-01-27 23:39:03 +00:00
Mark Seaborn ba86cf51c9 ARM MC: Fix the initial DWARF CFI unwind info at the start of a function
This brings MC into line with GNU 'as' on ARM, and it brings the ARM
target into line with most other LLVM targets, which declare the
initial CFI state with addInitialFrameState().

Without this, functions generated with .cfi_startproc/endproc on ARM
will tend to cause GDB to abort with:
  gdb/dwarf2-frame.c:1132: internal-error: Unknown CFA rule.

I've also tested this by comparing the output of "readelf -w" on the
object files produced by llvm-mc and gas when given the .s file added
here.

This change is part of addressing PR18636.

Differential Revision: http://llvm-reviews.chandlerc.com/D2597

llvm-svn: 200255
2014-01-27 22:38:14 +00:00
David Peixotto b76f55f74a Fix unsupported addressing mode assertion for pld
Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.

This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.

Differential Revision: http://llvm-reviews.chandlerc.com/D2622

llvm-svn: 200248
2014-01-27 21:39:04 +00:00
Andrea Di Biagio f09a357765 [DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors.
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).

The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.

Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.

llvm-svn: 200234
2014-01-27 18:45:30 +00:00
David Majnemer e035cf9ce4 MC: Add support for .cfi_startproc simple
This commit allows LLVM MC to process .cfi_startproc directives when
they are followed by an additional `simple' identifier. This signals to
elide the emission of target specific CFI instructions that would
normally occur initially.

This fixes PR16587.

Differential Revision: http://llvm-reviews.chandlerc.com/D2624

llvm-svn: 200227
2014-01-27 17:20:25 +00:00
Chandler Carruth e24f3973eb [vectorize] Initial version of respecting PGO in the vectorizer: treat
cold loops as-if they were being optimized for size.

Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.

The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:

1) Don't disable the entire pass when the target is lacking vector
   registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
   non-if-converted loops. This is trivial for the unroller but hard for
   the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
   various places that make cost tradeoffs (very likely only the
   unroller makes sense here, and then only when dealing with loops that
   are small enough for unrolling to not completely blow out the LSD).

I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.

One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.

llvm-svn: 200219
2014-01-27 13:11:50 +00:00
Benjamin Kramer 9e709bce86 ConstantHoisting: We can't insert instructions directly in front of a PHI node.
Insert before the terminating instruction of the dominating block instead.

llvm-svn: 200218
2014-01-27 13:11:43 +00:00
Chandler Carruth edfa37effa [vectorizer] Add an override for the target instruction cost and use it
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.

llvm-svn: 200215
2014-01-27 11:41:50 +00:00
Chandler Carruth 147c23278f [vectorizer] Teach the loop vectorizer's unroller to only unroll by
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.

Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).

While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.

llvm-svn: 200213
2014-01-27 11:12:24 +00:00
Nick Lewycky 629199ccb3 Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else!
llvm-svn: 200210
2014-01-27 10:47:44 +00:00
Nick Lewycky 31eaca5513 Teach SCEV to handle more cases of 'and X, CST', specifically where CST is any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits.
Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary.

Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown.

llvm-svn: 200203
2014-01-27 10:04:03 +00:00
Stepan Dyatkovskiy 55139555c4 Additional fix for 200201: due to dependence on bitwidth test was moved to X86 directory.
llvm-svn: 200202
2014-01-27 09:43:10 +00:00
Stepan Dyatkovskiy 157bb42e27 Fix for PR18102.
Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from
mem-ops sequence sorting.

Consider, how MergeConsequtiveStores works for next example:

store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1]   ; a[1] again.
return   ; DAG starts here

1. Method will collect all the 3 stores.
2. It sorts them by distance from the base pointer (farthest with highest
index).
3. It takes first consecutive non-overlapping stores and (if possible) replaces
them with a single store instruction.

The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.

So after merging we have the next result:

store i16 (1 | 3 << 8), base   ; is a[0] but bit-casted to i16
store i8 2, a[1]

So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1].

Fix: In sort routine just also take into account mem-op sequence number. 
llvm-svn: 200201
2014-01-27 09:18:31 +00:00
Michel Danzer 13736221e3 R600/SI: Add intrinsic for BUFFER_LOAD_DWORD* instructions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
2014-01-27 07:20:51 +00:00