Commit Graph

23244 Commits

Author SHA1 Message Date
Evgeniy Stepanov 7ab838eb56 [msan] Fix handling of byval arguments in VarArg calls.
llvm-svn: 203794
2014-03-13 13:17:11 +00:00
Elena Demikhovsky fd05667276 AVX-512: masked load/store + intrinsics for them.
llvm-svn: 203790
2014-03-13 12:05:52 +00:00
Hal Finkel 27774d9274 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768
2014-03-13 07:58:58 +00:00
Saleem Abdulrasool dadf94ce84 ARM: support emission of complex SO expressions
Support to the IAS was added to actually parse and handle the complex SO
expressions.  However, the object file lowering was not updated to compensate
for the fact that the shift operand may be an absolute expression.

When trying to assemble to an object file, the lowering would fail while
succeeding when emitting purely assembly.  Add an appropriate test.

The test case is inspired by the test case provided by Jiangning Liu who also
brought the issue to light.

llvm-svn: 203762
2014-03-13 07:02:41 +00:00
Saleem Abdulrasool 9b7c0af292 Support: add support to identify WinCOFF/ARM objects
Add the Windows COFF ARM object file magic.  This enables the LLVM tools to
interact with COFF object files for Windows on ARM.

llvm-svn: 203761
2014-03-13 07:02:35 +00:00
Karthik Bhat 294607e122 Fix PR18800. llvm intrinsic memcpy takes 5 arguments void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>, i32 <len>, i32 <align>, i1 <isvolatile>).The test case incorrectly uses the old format resulting in isVolatile function in MemIntrinsic to crash during SROA transformation.Modified the test case to use correct signature of memcpy and memset.
llvm-svn: 203750
2014-03-13 04:50:29 +00:00
NAKAMURA Takumi 477a2f39cb llvm/test/BugPoint/compile-custom.ll.py: Make it py3-compatible. [PR19112]
FIXME: Get rid of invoking this.
I guess it wouldn't run on win32 due to lacking of shell support.

llvm-svn: 203740
2014-03-13 00:10:37 +00:00
NAKAMURA Takumi 8a5a590cd1 decl-derived-member.ll: Try to unbreak. Don't add -mtriple to %llc_dwarf.
llvm-svn: 203732
2014-03-12 23:08:19 +00:00
Rafael Espindola 1cf777bc12 This test need the X86 backend, move it to the X86 sub directory.
llvm-svn: 203725
2014-03-12 22:03:43 +00:00
Justin Bogner ec49f9820c Back out Profile library and dependent commits
Chandler voiced some concern with checking this in without some
discussion first. Reverting for now.

This reverts r203703, r203704, r203708, and 203709.

llvm-svn: 203723
2014-03-12 22:00:57 +00:00
Michael Zolotukhin 66806aef1e PR17473:
Don't normalize an expression during postinc transformation unless it's
invertible.

llvm-svn: 203719
2014-03-12 21:31:05 +00:00
Adam Nemet d4e56073c7 [X86] Add peephole for masked rotate amount
Extend what's currently done for shift because the HW performs this masking
implicitly:

   (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)

I use the newly factored out multiclass that was only supporting shifts so
far.

For testing I extended my testcase for the new rotation idiom.

<rdar://problem/15295856>

llvm-svn: 203718
2014-03-12 21:20:55 +00:00
Rafael Espindola f217e099bb Fix the ocaml test to not create a alias to a declaration.
llvm-svn: 203717
2014-03-12 21:20:42 +00:00
Raul E. Silvera 62f0236d36 Resubmit "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit 86cb795388643710dab34941ddcb5a9470ac39d8.
The problems previously found have been resolved through other CLs.

llvm-svn: 203707
2014-03-12 20:21:50 +00:00
Rafael Espindola b676e72d56 Add a triple to fix the test on OS X.
llvm-svn: 203706
2014-03-12 20:21:35 +00:00
Rafael Espindola f3336bc1d5 Reject alias to undefined symbols in the verifier.
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.

MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.

For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.

llvm-svn: 203705
2014-03-12 20:15:49 +00:00
Justin Bogner bfee8d49c4 llvm-profdata: Use the Profile library, implement show and generate
This replaces the llvm-profdata tool with a version that uses the
recently introduced Profile library. The new tool has the ability to
generate and summarize profdata files as well as merging them.

llvm-svn: 203704
2014-03-12 20:14:17 +00:00
Eric Christopher da6b4f028a Fix two thinkos in testcase and remove XFAIL.
llvm-svn: 203702
2014-03-12 20:12:02 +00:00
Roman Divacky a26f9a6a42 Allow exclamation and tilde to be parsed as a part of the ppc asm operand.
llvm-svn: 203699
2014-03-12 19:25:57 +00:00
Eric Christopher bc82fe338e XFAIL this temporarily.
llvm-svn: 203698
2014-03-12 19:06:03 +00:00
Eric Christopher e9305f037f Move test to X86 only for now.
llvm-svn: 203697
2014-03-12 19:02:44 +00:00
Matt Arsenault e389dd5d68 R600: Fix trunc store from i64 to i1
llvm-svn: 203695
2014-03-12 18:45:52 +00:00
Hans Wennborg b73c0b041d Allow switch-to-lookup table for tables with holes by adding bitmask check
This allows us to generate table lookups for code such as:

  unsigned test(unsigned x) {
    switch (x) {
      case 100: return 0;
      case 101: return 1;
      case 103: return 2;
      case 105: return 3;
      case 107: return 4;
      case 109: return 5;
      case 110: return 6;
      default: return f(x);
    }
  }

Since cases 102, 104, etc. are not constants, the lookup table has holes
in those positions. We therefore guard the table lookup with a bitmask check.

Patch by Jasper Neumann!

llvm-svn: 203694
2014-03-12 18:35:40 +00:00
Eric Christopher 8cc04fc40d When computing the size of a base type be conservative if the type
is a declaration and return the size of the type.

llvm-svn: 203690
2014-03-12 18:18:05 +00:00
Evan Cheng ad6efbfa0f Revert r203488 and r203520.
llvm-svn: 203687
2014-03-12 18:09:37 +00:00
Eric Christopher 7924e0cca2 Turn on hashing by default for split dwarf compile units.
llvm-svn: 203680
2014-03-12 17:14:43 +00:00
Rafael Espindola 3d5d464df8 Try harder to evaluate expressions when printing assembly.
When printing assembly we don't have a Layout object, but we can still
try to fold some constants.

Testcase by Ulrich Weigand.

llvm-svn: 203677
2014-03-12 16:55:59 +00:00
Daniel Sanders df22154579 [mips] BSEL's and BINS[RL] operands are reversed compared to the vselect node used in the pattern.
Summary:
Correct the match patterns and the lowerings that made the CodeGen tests pass despite the mistakes.

The original testcase that discovered the problem was SingleSource/UnitTests/SignlessType/factor.c in test-suite.
During review, we also found that some of the existing CodeGen tests were incorrect and fixed them:
* bitwise.ll: In bsel_v16i8 the IfSet/IfClear were reversed because bsel and bmnz have different operand orders and the test didn't correctly account for this. bmnz goes 'IfClear, IfSet, CondMask', while bsel goes 'CondMask, IfClear, IfSet'.
* vec.ll: In the cases where a bsel is emitted as a bmnz (they are the same operation with a different input tied to the result) the operands were in the wrong order.
* compare.ll and compare_float.ll: The bsel operand order was correct for a greater-than comparison, but a greater-than comparison instruction doesn't exist. Lowering this operation inverts the condition so the IfSet/IfClear need to be swapped to match.

The differences between BSEL, BMNZ, and BMZ and how they map to/from vselect are rather confusing. I've therefore added a note to MSA.txt to explain this in a single place in addition to the comments that explain each case.

Reviewers: matheusalmeida, jacksprat

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3028

llvm-svn: 203657
2014-03-12 11:54:00 +00:00
Tim Northover 3cccc45a9f ARM: correct Dwarf output for non-contiguous VFP saves.
When the list of VFP registers to be saved was non-contiguous (so multiple
vpush/vpop instructions were needed) these were being ordered oddly, as in:
    vpush {d8, d9}
    vpush {d11}

This led to the layout in memory being [d11, d8, d9] which is ugly and doesn't
match the CFI_INSTRUCTIONs we're generating either (so Dwarf info would be
broken).

This switches the order of vpush/vpop (in both prologue and epilogue,
obviously) so that the Dwarf locations are correct again.

rdar://problem/16264856

llvm-svn: 203655
2014-03-12 11:29:23 +00:00
Hans Wennborg 14863418ed [ARM] Use DWARF register numbers for CFI directives in ELF assembly
It seems gas can't handle CFI directives with VFP register names ("d12", etc.).
This broke us trying to build Chromium for Android after 201423.

A gas bug has been filed: https://sourceware.org/bugzilla/show_bug.cgi?id=16694

compnerd suggested making this conditional on whether we're using the integrated
assembler or not. I'll look into that in a follow-up patch.

Differential Revision: http://llvm-reviews.chandlerc.com/D3049

llvm-svn: 203635
2014-03-12 03:52:34 +00:00
David Blaikie adbea1ef9f DebugInfo: Omit pubnames/pubtypes when compiling with -gmlt
llvm-svn: 203634
2014-03-12 03:34:38 +00:00
David Blaikie ce2f1cb918 DebugInfo: Do not emit pubnames/pubtypes sections if they are empty
llvm-svn: 203622
2014-03-11 23:35:06 +00:00
David Blaikie fe04abbc89 Test for empty pubnames/pubtypes
llvm-svn: 203621
2014-03-11 23:35:03 +00:00
David Blaikie 0f55e833a6 DebugInfo: Refactor emitDebugPubNames/Types into a common implementation
I could fold the callers into their one call site, but the indirection
(given how verbose choosing the section is) seemed helpful.

The use of a member function pointer's a bit "tricky", but seems limited
enough, the call sites are simple/clean/clear, and there's only one use.

llvm-svn: 203619
2014-03-11 23:18:15 +00:00
David Blaikie fb6058a455 Clean up test/DebugInfo/empty.ll now that we have an alias for "llc with dwarf output"
llvm-svn: 203616
2014-03-11 22:46:12 +00:00
Sasa Stankovic 8600ebc74d [mips] Implement NaCl sandboxing of function calls:
* Add masking instructions before indirect calls (in MC layer).
  * Align call + branch delay to the bundle end (in MC layer).

Differential Revision: http://llvm-reviews.chandlerc.com/D3032

llvm-svn: 203606
2014-03-11 21:23:40 +00:00
Rafael Espindola 698a5bdbba Don't assume an empty stderr.
GuardMalloc can print info to stderr, causing these tests to fail.
Since FileCheck errors on empty inputs, just add a bit of dummy
data to make it happy.

llvm-svn: 203595
2014-03-11 18:25:33 +00:00
Hans Wennborg 6c37f8b985 X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059)
This fixes the bug where we would bitcast the 64-bit floating point result
of cmpneqsd to a 64-bit integer even on 32-bit targets.

Differential Revision: http://llvm-reviews.chandlerc.com/D3009

llvm-svn: 203581
2014-03-11 15:49:24 +00:00
Saleem Abdulrasool 0d96f3dd6e ARM: honour -f{no-,}optimize-sibling-calls
Use the options in the ARMISelLowering to control whether tail calls are
optimised or not.  Previously, this option was entirely ignored on the ARM
target and only honoured on x86.

This option is mostly useful in profiling scenarios.  The default remains that
tail call optimisations will be applied.

llvm-svn: 203577
2014-03-11 15:09:54 +00:00
Saleem Abdulrasool b720a6bab7 ARM: remove ancient -arm-tail-calls option
This option is from 2010, designed to work around a linker issue on Darwin for
ARM.  According to grosbach this is no longer an issue and this option can
safely be removed.

llvm-svn: 203576
2014-03-11 15:09:49 +00:00
Saleem Abdulrasool ec1ec1b416 ARM: enable tail call optimisation on Thumb 2
Tail call optimisation was previously disabled on all targets other than
iOS5.0+.  This enables the tail call optimisation on all Thumb 2 capable
platforms.

The test adjustments are to remove the IR hint "tail" to function invocation.
The tests were designed assuming that tail call optimisations would not kick in
which no longer holds true.

llvm-svn: 203575
2014-03-11 15:09:44 +00:00
Erik Verbruggen 3f5dcc97e0 Fix crash in PRE.
After r203553 overflow intrinsics and their non-intrinsic (normal)
instruction get hashed to the same value. This patch prevents PRE from
moving an instruction into a predecessor block, and trying to add a phi
node that gets two different types (the intrinsic result and the
non-intrinsic result), resulting in a failing assert.

llvm-svn: 203574
2014-03-11 15:07:32 +00:00
Tim Northover e94a518a22 IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Erik Verbruggen e2d437148a GVN: merge overflow intrinsics with non-overflow instructions.
When an overflow intrinsic is followed by a non-overflow instruction,
replace the latter with an extract. For example:

  %sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %sadd3 = add i32 %a, %b

Here the add statement will be replaced by an extract.

When an overflow intrinsic follows a non-overflow instruction, a clone
of the intrinsic is inserted before the normal instruction, which makes
it the same as the previous case. Subsequent runs of GVN can then clean
up the duplicate instructions and insert the extract.

This fixes PR8817.

llvm-svn: 203553
2014-03-11 09:36:48 +00:00
Jim Grosbach c94d993adf X86: Enable ISel of 16-bit MOVBE instructions.
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.

The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked to 'Expand.'
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).

Patch by Louis Gerbarg <lgg@apple.com>.

rdar://15479984

llvm-svn: 203524
2014-03-11 00:44:14 +00:00
Matt Arsenault 532db69984 Fix undefined behavior in vector shift tests.
These were all shifting the same amount as the bitwidth.

llvm-svn: 203519
2014-03-11 00:01:41 +00:00
Duncan P. N. Exon Smith 56cc990480 Module: Don't rename in getOrInsertFunction()
During LTO, user-supplied definitions of C library functions often
exist.  -instcombine uses Module::getOrInsertFunction() to get a handle
on library functions (e.g., @puts, when optimizing @printf).

Previously, Module::getOrInsertFunction() would rename any matching
functions with local linkage, and create a new declaration.  In LTO,
this is the opposite of desired behaviour, as it skips by the
user-supplied version of the library function and creates a new
undefined reference which the linker often cannot resolve.

After some discussing with Rafael on the list, it looks like it's
undesired behaviour.  If a consumer actually *needs* this behaviour, we
should add new API with a more explicit name.

I added two testcases: one specifically for the -instcombine behaviour
and one for the LTO flow.

<rdar://problem/16165191>

llvm-svn: 203513
2014-03-10 23:42:28 +00:00
Raul E. Silvera ce376c0fcb When analyzing vectors of element type that require legalization,
the legalization cost must be included to get an accurate
estimation of the total cost of the scalarized vector.
The inaccurate cost triggered unprofitable SLP vectorization on
32-bit X86.

Summary:
Include legalization overhead when computing scalarization cost

Reviewers: hfinkel, nadav

CC: chandlerc, rnk, llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2992

llvm-svn: 203509
2014-03-10 22:59:13 +00:00
Diego Novillo 92aa8c220a Use discriminator information in sample profiles.
Summary:
When the sample profiles include discriminator information,
use the discriminator values to distinguish instruction weights
in different basic blocks.

This modifies the BodySamples mapping to map <line, discriminator> pairs
to weights. Instructions on the same line but different blocks, will
use different discriminator values. This, in turn, means that the blocks
may have different weights.

Other changes in this patch:

- Add tests for positive values of line offset, discriminator and samples.
- Change data types from uint32_t to unsigned and int and do additional
  validation.

Reviewers: chandlerc

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2857

llvm-svn: 203508
2014-03-10 22:41:28 +00:00
Benjamin Kramer 3ef5e46b6d MemCpyOpt: When merging memsets also merge the trivial case of two memsets with the same destination.
The testcase is from PR19092, but I think the bug described there is actually a clang issue.

llvm-svn: 203489
2014-03-10 21:05:13 +00:00
Evan Cheng 0e8f4612a9 For functions with ARM target specific calling convention, when simplify-libcall
optimize a call to a llvm intrinsic to something that invovles a call to a C
library call, make sure it sets the right calling convention on the call.

e.g.
extern double pow(double, double);
double t(double x) {
  return pow(10, x);
}

Compiles to something like this for AAPCS-VFP:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
  ret double %0
}

declare double @llvm.pow.f64(double, double) #1

Simplify libcall (part of instcombine) will turn the above into:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %__exp10 = call double @__exp10(double %x) #1
  ret double %__exp10
}

declare double @__exp10(double)

The pre-instcombine code works because calls to LLVM builtins are special.
Instruction selection will chose the right calling convention for the call.
However, the code after instcombine is wrong. The call to __exp10 will use
the C calling convention.

I can think of 3 options to fix this.

1. Make "C" calling convention just work since the target should know what CC
   is being used.

   This doesn't work because each function can use different CC with the "pcs"
   attribute.

2. Have Clang add the right CC keyword on the calls to LLVM builtin.

   This will work but it doesn't match the LLVM IR specification which states
   these are "Standard C Library Intrinsics".

3. Fix simplify libcall so the resulting calls to the C routines will have the
   proper CC keyword. e.g.
   %__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1

   This works and is the solution I implemented here.

Both solutions #2 and #3 would work. After carefully considering the pros and
cons, I decided to implement #3 for the following reasons.

1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.

There are a couple of potential downsides.
1. There could be other places in the optimizer that is broken in the same way
   that's not addressed by this.
2. There could be other calling conventions that need to be propagated by
   simplify-libcall that's not handled.

But for now, this is the fix that I'm most comfortable with.

llvm-svn: 203488
2014-03-10 20:49:45 +00:00
Eli Bendersky d47a5c2d3f Followup to r203483 - add test.
[forgot to 'svn add' before committing r203483]

llvm-svn: 203485
2014-03-10 20:36:04 +00:00
Sasa Stankovic 5fddf61089 [mips] Implement NaCl sandboxing of loads, stores and SP changes:
* Add masking instructions before loads and stores (in MC layer).
  * Add masking instructions after SP changes (in MC layer).
  * Forbid loads, stores and SP changes in delay slots (in MI layer).

Differential Revision: http://llvm-reviews.chandlerc.com/D2904

llvm-svn: 203484
2014-03-10 20:34:23 +00:00
Adam Nemet 47492919c6 [bugpoint] Add testcase for r203343.
llvm-svn: 203472
2014-03-10 16:58:54 +00:00
Reed Kotler 96b7402bac Fix regression with -O0 for mips .
llvm-svn: 203469
2014-03-10 16:31:25 +00:00
JF Bastien 76086c667d Add test for LinkModules warning on triple, modified by r203009. Datalayout is already tested.
llvm-svn: 203468
2014-03-10 15:54:49 +00:00
Matheus Almeida 64459d296b [mips] Assembly parser must invoke the target streamer to handle .set reorder macro.
llvm-svn: 203459
2014-03-10 13:21:10 +00:00
Tim Northover 2a661f3f73 AArch64: fix LowerCONCAT_VECTORS for new CodeGen.
The function was making too many assumptions about its input:

1. The NEON_VDUP optimisation was far too aggressive, assuming (I
think) that the input would always be BUILD_VECTOR.

2. We were treating most unknown concats as legal (by returning Op
rather than SDValue()). I think only concats of pairs of vectors are
actually legal.

http://llvm.org/PR19094

llvm-svn: 203450
2014-03-10 09:34:07 +00:00
Venkatraman Govindaraju f703132b09 [Sparc] Add support for decoding 'swap' instruction.
llvm-svn: 203424
2014-03-09 23:32:07 +00:00
NAKAMURA Takumi 1783e1e984 Revert r203230, "CodeGenPrep: sink extends of illegal types into use block."
It choked i686 stage2.

llvm-svn: 203386
2014-03-09 11:01:07 +00:00
David Majnemer c4ab61cb2f IR: Change inalloca's grammar a bit
The grammar for LLVM IR is not well specified in any document but seems
to obey the following rules:

 - Attributes which have parenthesized arguments are never preceded by
   commas.  This form of attribute is the only one which ever has
   optional arguments.  However, not all of these attributes support
   optional arguments: 'thread_local' supports an optional argument but
   'addrspace' does not.  Interestingly, 'addrspace' is documented as
   being a "qualifier".  What constitutes a qualifier?  I cannot find a
   definition.

 - Some attributes use a space between the keyword and the value.
   Examples of this form are 'align' and 'section'.  These are always
   preceded by a comma.

 - Otherwise, the attribute has no argument.  These attributes do not
   have a preceding comma.

Sometimes an attribute goes before the instruction, between the
instruction and it's type, or after it's type.  'atomicrmw' has
'volatile' between the instruction and the type while 'call' has 'tail'
preceding the instruction.

With all this in mind, it seems most consistent for 'inalloca' on an
'inalloca' instruction to occur before between the instruction and the
type.  Unlike the current formulation, there would be no preceding
comma.  The combination 'alloca inalloca' doesn't look particularly
appetizing, perhaps a better spelling of 'inalloca' is down the road.

llvm-svn: 203376
2014-03-09 06:41:58 +00:00
Adam Nemet 4203039760 Update comment from r203315 based on review
llvm-svn: 203361
2014-03-08 21:51:55 +00:00
David Blaikie 078278fe3a DebugInfo: further improvements to test following up on r203329
llvm-svn: 203337
2014-03-08 02:45:53 +00:00
David Blaikie f528f054d0 DebugInfo: Fix test fallout from r203323
Will fix this harder in a moment.

llvm-svn: 203329
2014-03-08 01:32:51 +00:00
David Blaikie 26ab6c6dd5 DebugInfo: Use DW_FORM_data4 for DW_AT_high_pc in DW_TAG_lexical_blocks
Suggested by Adrian Prantl in code review for r203187

llvm-svn: 203323
2014-03-08 00:58:20 +00:00
Eric Christopher 4f17ee09f9 Add support for hashing location information for CU level hashes.
Add a testcase based on sret.cpp where we can now hash the entire
compile unit.

llvm-svn: 203319
2014-03-08 00:29:41 +00:00
Adam Nemet 5117f5dffc [DAGCombiner] Recognize another rotation idiom
This is the new idiom:

  x<<(y&31) | x>>((0-y)&31)

which is recognized as:

  x ROTL (y&31)

The change refines matchRotateSub.  In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.

llvm-svn: 203315
2014-03-07 23:56:28 +00:00
Arnold Schwaighofer d33e942958 ISel: Make VSELECT selection terminate in cases where the condition type has to
be split and the result type widened.

When the condition of a vselect has to be split it makes no sense widening the
vselect and thereby widening the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.

I ran this over the test suite with i686 and mattr=+sse and saw no regressions.

Fixes PR18036.

llvm-svn: 203311
2014-03-07 23:25:55 +00:00
Adrian Prantl 887e70786a Remove unnecessary test for Darwin and update testcase to be a little less
horrible/fragile.
rdar://problem/16264854

llvm-svn: 203309
2014-03-07 23:07:21 +00:00
Sasa Stankovic 1e50b46bf9 Moved test file from test/MC/Mips to test/CodeGen/Mips.
llvm-svn: 203298
2014-03-07 22:08:46 +00:00
David Blaikie 555e79a304 DebugInfo: Use DW_FORM_data4 for DW_AT_high_pc in inlined functions
Suggested by Adrian Prantl in code review for r203187.

llvm-svn: 203296
2014-03-07 22:00:56 +00:00
David Blaikie 3e4ff7a92a DebugInfo: Update test to cover linux (with a FIXME...) too
llvm-svn: 203295
2014-03-07 22:00:49 +00:00
Tom Stellard e28859f8fa R600/SI: Using SGPRs is illegal for instructions that read carry-out from VCC
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203281
2014-03-07 20:12:39 +00:00
Tom Stellard 1c8788ef5a R600/SI: Custom lower i1 stores
These are sometimes created by the shrink to boolean optimization in the
globalopt pass.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203280
2014-03-07 20:12:33 +00:00
David Blaikie d723f5186e DebugInfo: Restrict DW_AT_high_pc encoding as data4 offset to DWARF 4 as per spec
Code review feedback to r203187 from Oliver Stannard. Thanks!

llvm-svn: 203256
2014-03-07 18:04:24 +00:00
Duncan P. N. Exon Smith 29db0eb855 ARM: Make .unreq directives case-insensitive
Be case-insensitive when processing .unreq directives.

Patch by Lin Zuojian!

llvm-svn: 203251
2014-03-07 16:16:52 +00:00
Tim Northover ad3d81d320 CodeGenPrep: sink extends of illegal types into use block.
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.

Patch by Manuel Jacob.

llvm-svn: 203230
2014-03-07 11:04:30 +00:00
Tim Northover fad2761ca0 InstCombine: form shuffles from wider range of insert/extractelements
Sequences of insertelement/extractelements are sometimes used to build
vectorsr; this code tries to put them back together into shuffles, but
could only produce a completely uniform shuffle types (<N x T> from two
<N x T> sources).

This should allow shuffles with different numbers of elements on the
input and output sides as well.

llvm-svn: 203229
2014-03-07 10:24:44 +00:00
Rafael Espindola b1f25f1b93 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Karthik Bhat b67688a87c Allow constant folding of round function whenever feasible
llvm-svn: 203198
2014-03-07 04:36:21 +00:00
David Blaikie 479323a62b DebugInfo: Limit r203187 to non-darwin as lldb can't handle this yet
llvm-svn: 203192
2014-03-07 02:19:41 +00:00
David Blaikie 48b1bdcf28 DebugInfo: Emit DW_TAG_subprogram's DW_AT_high_pc as an offset from the low_pc
This removes a relocation from each subprogram, reducing link times,
etc.

llvm-svn: 203187
2014-03-07 01:30:55 +00:00
David Blaikie f5040a64bb DebugInfo: Refactor test to not rely on fixed DIE offsets
llvm-svn: 203186
2014-03-07 01:19:31 +00:00
David Blaikie b9a0265cc1 DebugInfo: Improve test to not depend on the specific naming of temporary symbols
llvm-svn: 203184
2014-03-07 00:23:38 +00:00
Rafael Espindola 3b30cb41a9 Remove shouldEmitUsedDirectiveFor.
Clang now uses llvm.compiler.used for these cases.

llvm-svn: 203174
2014-03-06 22:47:08 +00:00
Rafael Espindola 123256a4aa Convert test to FileCheck.
llvm-svn: 203173
2014-03-06 22:21:43 +00:00
Andrea Di Biagio 6292a140ee [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.

The rules are:
  1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
  2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)

The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.

Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.

llvm-svn: 203156
2014-03-06 20:19:52 +00:00
Rafael Espindola 1194e69fe6 Fix the printing of n_type.
Despite the name, n_type contains the type of the symbol, but also if it is
extern or private extern.

llvm-svn: 203154
2014-03-06 20:13:41 +00:00
Matt Arsenault f9a995d68c R600: Fix extloads from i8 / i16 to i64.
This appears to only be working for global loads. Private
and local break for other reasons.

llvm-svn: 203135
2014-03-06 17:34:12 +00:00
Matt Arsenault 9fe669c522 R600/SI: Expand selects on vectors.
llvm-svn: 203134
2014-03-06 17:34:03 +00:00
Matt Arsenault a236ea551c Teach lint about address spaces
llvm-svn: 203132
2014-03-06 17:33:55 +00:00
Richard Osborne 47155af5eb [XCore] Add support for the "m" inline asm constraint.
Summary:
This provides support for CP and DP relative global accesses in inline
asm.

Reviewers: robertlytton

Reviewed By: robertlytton

Differential Revision: http://llvm-reviews.chandlerc.com/D2943

llvm-svn: 203129
2014-03-06 16:37:48 +00:00
Chad Rosier 86a8f72041 [AArch64] This is a work in progress to provide a machine description
for the Cortex-A53 subtarget in the AArch64 backend.

This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.

Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 203125
2014-03-06 16:04:00 +00:00
Elena Demikhovsky f7c1b16591 AVX-512: Added rrk, rrkz, rmk, rmkz, rmbk, rmbkz versions of AVX512 FP packed instructions, added encoding tests for them.
By Robert Khazanov.

llvm-svn: 203098
2014-03-06 08:45:30 +00:00
Elena Demikhovsky 8fae565f08 AVX-512: fixed comressed displacement - by Robert Khazanov
llvm-svn: 203096
2014-03-06 08:15:35 +00:00
David Blaikie 47c254beb7 DebugInfo: Tag units as having been indexed in GNU pubnames by using a DW_AT_GNU_pubnames of DW_FORM_flag(_present) rather than sec_offsets to the pubnames/types sections
This is consistent with GDB ToT and reduces the number of relocations in
(type and compile) units, substantially reducing relocations and debug
size in fission + type units builds.

llvm-svn: 203082
2014-03-06 05:47:39 +00:00
Karthik Bhat daa8cd10d9 Allow constant folding of copysign
llvm-svn: 203076
2014-03-06 05:32:52 +00:00
David Blaikie c3d9e9e55f DebugInfo: Shrink pubnames/pubtypes in the presence of type units by only emitting pub sections for compile units
llvm-svn: 203057
2014-03-06 01:42:00 +00:00
Hal Finkel 7f908e8ef4 Fixup PPC Darwin i1 argument handling
Like on other targets, we need to zero_extend/truncate i1 args before copying
them to GPRs.

llvm-svn: 203045
2014-03-06 00:45:19 +00:00
Hal Finkel 2a9d318e4a When using CR bit registers on PPC32, handle the i1 vaarg case
When copying an i1 value into a GPR for a vaarg call, we need to explicitly
zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
generated).

llvm-svn: 203041
2014-03-06 00:23:33 +00:00
Raul E. Silvera b741b945c5 Change math intrinsic attributes from readonly to readnone. These
are operations that do not access memory but may be sensitive
to floating-point environment changes. LLVM does not attempt
to model FP environment changes, so this was unnecessarily conservative
and was getting on the way of some optimizations, in particular
SLP vectorization.

llvm-svn: 203037
2014-03-06 00:18:15 +00:00
Jack Carter 6b9cf961bd [Mips] Testcase typo fix. No functionality change.
llvm-svn: 203020
2014-03-05 22:54:56 +00:00
Hal Finkel 6a56b21729 With PPC CR bit registers, handle int_to_fp on older cores
On cores without fpcvt support, we cannot promote int_to_fp i1 operations,
because there is nothing to promote them to. The most straightforward
implementation of this uses a select to choose between the two possible
resulting floating-point values (and that's what is done here).

llvm-svn: 203015
2014-03-05 22:14:00 +00:00
JF Bastien d44807ca67 Fix datalayout test that I broke with my previous LinkModules warning improvement.
llvm-svn: 203011
2014-03-05 21:37:08 +00:00
Arnold Schwaighofer ab12363c02 LoopVectorizer: Preserve fast-math flags
Fixes PR19045.

llvm-svn: 203008
2014-03-05 21:10:47 +00:00
Rafael Espindola 8377085657 Always print the implicit .text at the start of an asm file.
Before llvm-mc would print it, but llc was assuming that it would produce
another section changing directive before one was needed. That assumption is
false with inline asm.

Fixes PR19049.

Another option would be to always create the section, but in the asm printer
avoid printing sections changes during initialization. That would work, but
* We do use the fact that llvm-mc prints it in testing. The tests can be changed
  if needed.
* A quick poll on IRC suggest that most developers prefer the implicit .text to
  be printed.

llvm-svn: 203001
2014-03-05 20:09:15 +00:00
Benjamin Kramer 061d147f74 ConstantFolding: Also fold the vector overloads of our math intrinsics.
llvm-svn: 202997
2014-03-05 19:41:48 +00:00
Cameron McInally 791ae9927c Lower AVX v4i64->v4i32 truncate to one shuffle.
llvm-svn: 202996
2014-03-05 19:41:16 +00:00
Oliver Stannard d55e115b58 ARM: Correctly align arguments after a byval struct is passed on the stack
llvm-svn: 202985
2014-03-05 15:25:27 +00:00
Vladimir Medic 27c398e38c This patch implements .set dsp directive and sets appropriate feature bits.This directive is a counterpart of -mattr=dsp command line option with the exception that it does not influence elf header flags. The usage example is gives in test file.
llvm-svn: 202966
2014-03-05 11:05:09 +00:00
Andrew Trick fbb278c541 Make stackmap machineinstrs clobber the scratch regs too.
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs to scratch register to
patch or perform a runtime call thunk.

Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language and target independent calling
convention specific to stackmaps so makes sense.  Although the calling
convention is not currently used to select the scratch registers.

llvm-svn: 202943
2014-03-05 07:08:16 +00:00
Hans Wennborg acb842d523 Check for dynamic allocas and inline asm that clobbers sp before building
selection dag (PR19012)

In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).

The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.

This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.

Differential Revision: http://llvm-reviews.chandlerc.com/D2954

llvm-svn: 202930
2014-03-05 02:43:26 +00:00
Raul E. Silvera 18ebc7cd0a Trivial test commit.
llvm-svn: 202924
2014-03-05 02:09:51 +00:00
Matt Arsenault 8377858c55 Allow constant folding of fma and fmuladd
llvm-svn: 202914
2014-03-05 00:02:00 +00:00
Rui Ueyama 595932f1b0 llvm-objdump: Indent unwind info contents.
Unwind info contents were indented at the same level as function table
contents. That's a bit confusing because the unwind info is pointed by
function table. In other places we usually increment indentation depth
by one when dereferncing a pointer.

This patch also removes extraneous newlines between function tables.

llvm-svn: 202879
2014-03-04 19:23:56 +00:00
Rui Ueyama 5aa88fe1e7 llvm-objdump: Fix typo in output.
llvm-svn: 202875
2014-03-04 19:03:42 +00:00
Richard Osborne 1b5fc39710 [XCore] Fix call of absolute address.
Previously for:

tail call void inttoptr (i64 65536 to void ()*)() nounwind

We would emit:

bl 65536

The immediate operand of the bl instruction is a relative offset so it is
wrong to use the absolute address here.

llvm-svn: 202860
2014-03-04 16:50:30 +00:00
NAKAMURA Takumi afd8d16bce [CMake] check-llvm: Include "bugpoint" in dependent list.
llvm-svn: 202858
2014-03-04 16:13:30 +00:00
Daniel Sanders d920770add [mips][msa] Correct the behaviour of the COPY_FW pseudo on lanes 2 and 3.
Summary:
Previously, attempting to extract lanes 2 and 3 would actually extract lane 1.
The MSA CodeGen tests only covered lanes 0 and 1.

Differential Revision: http://llvm-reviews.chandlerc.com/D2935

llvm-svn: 202848
2014-03-04 13:54:30 +00:00
Vladimir Medic 615b26e1cd This patch implements .set mips32r2 directive and sets appropriate feature bits. It also introduces helper functions that are used to set and clear feature bits as necessary. This directive is a counterpart of -mips32r2 command line options with the exception that it does not influence elf header flags. The usage example is gives in test file.
llvm-svn: 202807
2014-03-04 09:54:09 +00:00
Rui Ueyama 9c674e6851 llvm-objdump: Print x64 unwind info in executable.
The original code does not work correctly on executable files because the
code is written in such a way that only object files are assumed to be given
to llvm-objdump.

Contents of RuntimeFunction are different between executables and objects. In
executables, fields in RuntimeFunction have actual addresses to unwind info
structures. On the other hand, in object files, the fields have zero value,
but instead there are relocations pointing to the fields, so that Linker will
fill them at link-time.

So, when we are reading an object file, we need to use relocation info to
find the location of unwind info. When executable, we should just look at the
values in RuntimeFunction.

llvm-svn: 202785
2014-03-04 04:00:55 +00:00
Rui Ueyama 432bc1048f Make a test for llvm-objdump a little bit more readable.
llvm-svn: 202783
2014-03-04 03:23:19 +00:00
Kevin Qin b08c6746c4 [AArch64]Fix improper diagnostics about offset range of load/store instructions.
llvm-svn: 202775
2014-03-04 02:05:13 +00:00
Reid Kleckner d84e70ea1b MC: Fix Intel assembly parser for [global + offset]
We were dropping the displacement on the floor if we also had some
immediate offset.

Should fix PR19033.

llvm-svn: 202774
2014-03-04 00:33:17 +00:00
Chad Rosier 70cb2311ab Revert "[AArch64] This is a work in progress to provide a machine description"
This reverts commit ff717c8fc786a0cfa1602982b91895fa09e514fc.

llvm-svn: 202773
2014-03-04 00:32:07 +00:00
Chad Rosier fe45290566 [AArch64] This is a work in progress to provide a machine description
for the Cortex-A53 subtarget in the AArch64 backend.

This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.

Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 202767
2014-03-03 23:32:47 +00:00
Diego Novillo f5041ce558 Pass to emit DWARF path discriminators.
DWARF discriminators are used to distinguish multiple control flow paths
on the same source location. When this happens, instructions across
basic block boundaries will share the same debug location.

This pass detects this situation and creates a new lexical scope to one
of the two instructions. This lexical scope is a child scope of the
original and contains a new discriminator value. This discriminator is
then picked up from MCObjectStreamer::EmitDwarfLocDirective to be
written on the object file.

This fixes http://llvm.org/bugs/show_bug.cgi?id=18270.

llvm-svn: 202752
2014-03-03 20:06:11 +00:00
Diego Novillo 282450d94c Add DWARF discriminator support to DILexicalBlocks.
This adds support for emitting discriminators from DILexicalBlocks.

llvm-svn: 202736
2014-03-03 18:53:17 +00:00
Daniel Sanders fa961d76f0 [mips] Prevent %lo relocation being used on MSA loads and stores.
Summary:
Parts of the compiler still believed MSA load/stores have a 16-bit offset when
it is actually 10-bit. Corrected this, and fixed a closely related issue this
uncovered where load/stores with 10-bit and 12-bit offsets (MSA and microMIPS
respectively) could not load/store using offsets from the stack/frame pointer.
They accepted frameindex+offset, but not frameindex by itself.

Reviewers: jacksprat, matheusalmeida

Reviewed By: jacksprat

Differential Revision: http://llvm-reviews.chandlerc.com/D2888

llvm-svn: 202717
2014-03-03 14:31:21 +00:00
Ed Maste 2a710d0a5b [mips] support FK_Data_2 and FK_Data_8 to fix big-endian debug data
This fixes invalid lengths in .debug_aranges on big-endian mips64
(lengths appear to be left-shifted by 32 bits) and in .debug_loc.

Differential Revision: http://llvm-reviews.chandlerc.com/D2517

llvm-svn: 202716
2014-03-03 14:27:49 +00:00
Evgeniy Stepanov 77be532f71 [msan] Handle X86 SIMD bitshift intrinsics.
llvm-svn: 202712
2014-03-03 13:47:42 +00:00
Vladimir Medic 43e978234a This patch implements jalx instruction for Mips architecture.This instruction executes a procedure call within the current 256 MB-aligned region and change the ISA Mode from MIPS32 to microMIPS32 or MIPS16e. Usage samples for assembler and dissasembler are provided as well.
llvm-svn: 202706
2014-03-03 13:12:59 +00:00
Saleem Abdulrasool 19dcc312ee AsmParser: add missed tests
The diagnostics tests were missing from the previous introduction of ifeqs.

llvm-svn: 202674
2014-03-03 06:35:00 +00:00
Venkatraman Govindaraju 925ec9b11e [Sparc] Add trap on integer condition codes (Ticc) instructions to Sparc backend.
llvm-svn: 202670
2014-03-02 23:39:07 +00:00
Venkatraman Govindaraju 07d3af2821 [Sparc] Add return/rett instruction to Sparc backend.
llvm-svn: 202666
2014-03-02 22:55:53 +00:00
Venkatraman Govindaraju 4fa2ab26f5 [Sparc] Add support for decoding jmpl/retl/ret instruction.
llvm-svn: 202663
2014-03-02 21:17:44 +00:00
Venkatraman Govindaraju c3084ad294 [Sparc] Add fcmpe* instructions to Sparc backend.
llvm-svn: 202661
2014-03-02 19:56:19 +00:00
Venkatraman Govindaraju f9a202a9ac [Sparc] Add VIS instructions to sparc backend.
llvm-svn: 202660
2014-03-02 19:31:21 +00:00
Hal Finkel 6aca2373f2 Add a PPC inline asm constraint type for single CR bits
Now that the PowerPC backend can track individual CR bits as first-class
registers, we should also have a way of allocating them for inline asm
statements. Because these registers are only one bit, if an output variable is
implicitly cast to a larger integer size, we'll get an any_extend to that
larger type (this is part of the existing target-independent logic). As a
result, regardless of the size of the output type, only the first bit is
meaningful.

The constraint identifier "wc" has been chosen for this purpose. Although gcc
does not currently support allocating individual CR bits, this identifier
choice has been coordinated with the gcc PowerPC team, and will be marked as
reserved for this purpose in the gcc constraints.md file.

llvm-svn: 202657
2014-03-02 18:23:39 +00:00
Michael Kuperstein 661e288a70 Ensure bitcode encoding of instructions and their operands stays stable.
This includes instructions that relate to memory access (load/store/GEP), comparison instructions and calls.

Work was done by lama.saba@intel.com.

llvm-svn: 202647
2014-03-02 15:26:36 +00:00
Venkatraman Govindaraju b745e67a64 [SparcV9] Adds support for branch on integer register instructions (BPr) and conditional moves on integer register (MOVr/FMOVr).
llvm-svn: 202628
2014-03-02 09:46:56 +00:00
Elena Demikhovsky 9737e3886b AVX-512: Fixed extract_vector_elt for v8i1 vector
llvm-svn: 202624
2014-03-02 09:19:44 +00:00
Venkatraman Govindaraju 600f390bb9 [Sparc] Add support for parsing branches and conditional move instructions with %fcc1-%fcc3 conditional registers.
llvm-svn: 202616
2014-03-02 06:28:15 +00:00
Venkatraman Govindaraju 81aae57282 [Sparc] Add support for parsing fcmp with %fcc registers.
llvm-svn: 202610
2014-03-02 03:39:39 +00:00
Venkatraman Govindaraju c86e0f3873 [SparcV9] Add support for parsing branch instructions with prediction.
llvm-svn: 202602
2014-03-01 22:03:07 +00:00
Matt Arsenault 2430958182 R600: Add failing control flow tests.
Simple cases hit a variety of problems at -O0.

llvm-svn: 202601
2014-03-01 21:45:41 +00:00
Hal Finkel 46043edc56 Remove extra truncs/exts around i32 bit operations on PPC64
This generalizes the code to eliminate extra truncs/exts around i1 bit
operations to also do the same on PPC64 for i32 bit operations. This eliminates
a fairly prevalent code wart:

int foo(int a) {
  return a == 5 ? 7 : 8;
}

On PPC64, because of the extension implied by the ABI, this would generate:

	cmplwi 0, 3, 5
	li 12, 8
	li 4, 7
	isel 3, 4, 12, 2
	rldicl 3, 3, 0, 32
	blr

where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At
least for the single-BB case (which is all that the DAG combine mechanism can
handle), this unnecessary extension is no longer generated.

llvm-svn: 202600
2014-03-01 21:36:57 +00:00
Venkatraman Govindaraju 2286874119 [Sparc] Add support for parsing annulled branch instructions.
llvm-svn: 202599
2014-03-01 20:08:48 +00:00
Venkatraman Govindaraju e0c5bff720 [Sparc] Add support for parsing sparcv9 instructions addc/subc/addccc/subccc.
llvm-svn: 202598
2014-03-01 18:54:52 +00:00
Venkatraman Govindaraju 2a9c430677 [Sparc] Add missing ALU instruction patterns.
llvm-svn: 202597
2014-03-01 17:51:00 +00:00
Sasa Stankovic 075e339373 Add missing FileCheck in test command line.
llvm-svn: 202594
2014-03-01 16:14:29 +00:00
Venkatraman Govindaraju 256735d485 [Sparc] Add support to decode unimp instruction.
llvm-svn: 202581
2014-03-01 09:28:18 +00:00
Venkatraman Govindaraju 484ca1a030 [Sparc] Add support to decode negative simm13 operands in the sparc disassembler.
llvm-svn: 202578
2014-03-01 09:11:57 +00:00
Venkatraman Govindaraju 78df2dec0c [Sparc] Add support for decoding call instructions in the sparc disassembler.
llvm-svn: 202577
2014-03-01 08:30:58 +00:00
Venkatraman Govindaraju fb54821398 [Sparc] Add support to disassemble sparc memory instructions.
llvm-svn: 202575
2014-03-01 07:46:33 +00:00
Venkatraman Govindaraju bf70566a45 Add support for parsing sun-style section flags in ELFAsmParser.
llvm-svn: 202573
2014-03-01 06:21:00 +00:00
Venkatraman Govindaraju 2b1682bcd4 [Sparc] Implement writeNopData. Emit actual NOP instruction instead of just filling with zeroes.
llvm-svn: 202572
2014-03-01 05:45:09 +00:00
Venkatraman Govindaraju 9fc29098df [Sparc] Teach SparcAsmParser to emit correct relocations for PIC code.
llvm-svn: 202571
2014-03-01 05:07:21 +00:00
Venkatraman Govindaraju 6f2e08c8e1 [Sparc] Add support for parsing directives in SparcAsmParser.
llvm-svn: 202564
2014-03-01 02:18:04 +00:00
Venkatraman Govindaraju f7eecf80c4 [Sparc] Emit 'restore' instead of 'restore %g0, %g0, %g0'. This improves the readability of the generated code.
llvm-svn: 202563
2014-03-01 01:04:26 +00:00
Manman Ren 709c951b42 SpillPlacement: fix a bug in iterate.
Inside iterate, we scan backwards then scan forwards in a loop. When iteration
is not zero, the last node was just updated so we can skip it. But when
iteration is zero, we can't skip the last node.

For the testing case, fixing this will save a spill and move register copies
from hot path to cold path.

llvm-svn: 202557
2014-02-28 23:05:31 +00:00
Tom Stellard d61a1c3360 R600/SI: Expand all v16[if]32 operations
llvm-svn: 202543
2014-02-28 21:36:37 +00:00
Justin Bogner 02b958422c CommandLine: Exit successfully for -version and -help
Tools that use the CommandLine library currently exit with an error
when invoked with -version or -help. This is unusual and non-standard,
so we'll fix them to exit successfully instead.

I don't expect that anyone relies on the current behaviour, so this
should be a fairly safe change.

llvm-svn: 202530
2014-02-28 19:08:01 +00:00
Adam Nemet 6586e5d6ac Test commit
llvm-svn: 202528
2014-02-28 18:44:39 +00:00
Zoran Jovanovic 285cc289e8 Fixed operand of SC microMIPS instruction.
llvm-svn: 202526
2014-02-28 18:22:56 +00:00
Zoran Jovanovic 7c6c36d92d Fixed encoding of SYSCALL microMIPS instruction.
llvm-svn: 202523
2014-02-28 18:17:08 +00:00
Zoran Jovanovic d0a289003d Revert revision 202518 because of wrong commit message.
llvm-svn: 202521
2014-02-28 18:14:16 +00:00
Zoran Jovanovic 9874a2b1ef Fix operand of SC instruction.
llvm-svn: 202518
2014-02-28 18:02:17 +00:00
Rafael Espindola 11ac853774 With rpaths being set correctly, SHLIBPATH_VAR is not needed anymore.
llvm-svn: 202510
2014-02-28 16:16:51 +00:00
Sasa Stankovic 8c5736b921 [mips] Implement NaCl sandboxing of indirect jumps:
* Align targets of indirect jumps to instruction bundle boundaries (in MI layer).
  * Add masking instructions before indirect jumps (in MC layer).

Differential Revision: http://llvm-reviews.chandlerc.com/D2847

llvm-svn: 202479
2014-02-28 10:00:38 +00:00
Hal Finkel b998915ee1 Swap PPC isel operands to allow for 0-folding
The PPC isel instruction can fold 0 into the first operand (thus eliminating
the need to materialize a zero-containing register when the 'true' result of
the isel is 0). When the isel is fed by a bit register operation that we can
invert, do so as part of the bit-register-operation peephole routine.

llvm-svn: 202469
2014-02-28 06:11:16 +00:00
Rafael Espindola a51f0f8367 Now that it is possible, use the mangler in IRObjectFile.
A really simple patch marks the end of a lot of yak shaving :-)

llvm-svn: 202463
2014-02-28 02:17:23 +00:00
Hal Finkel 940ab934d4 Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

 - Reduction in register pressure (because we no longer need GPRs to store
   boolean values).

 - Logical operations on booleans can be handled more efficiently; we used to
   have to move all results from comparisons into GPRs, perform promoted
   logical operations in GPRs, and then move the result back into condition
   register bits to be used by conditional branches. This can be very
   inefficient, because the throughput of these CR <-> GPR moves have high
   latency and low throughput (especially when other associated instructions
   are accounted for).

 - On the POWER7 and similar cores, we can increase total throughput by using
   the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
  trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451
2014-02-28 00:27:01 +00:00
Roman Divacky 7a9c6549ba Lower FNEG just like FABS to fneg[ds] and fmov[ds], thus avoiding
expensive libcall. Also, Qp_neg is not implemented on at least
FreeBSD. This is also what gcc is doing.

llvm-svn: 202422
2014-02-27 19:26:29 +00:00
Adrian Prantl 7072073cc9 Debug info: Remove ARMAsmPrinter::EmitDwarfRegOp(). AsmPrinter can now
scan the register file for sub- and super-registers.
No functionality change intended.

(Tests are updated because the comments in the assembler output are
different.)

llvm-svn: 202416
2014-02-27 17:56:08 +00:00
Richard Osborne 521bdf211d [XCore] Support functions returning more than 4 words.
If a function returns a large struct by value return the first 4 words
in registers and the rest on the stack in a location reserved by the
caller. This is needed to support the xC language which supports
functions returning an arbitrary number of return values. This is
r202397 reapplied with a fix to avoid an uninitialized read of a member.

llvm-svn: 202414
2014-02-27 17:47:54 +00:00
Richard Osborne 527aa5052d Revert r202396, r202397.
These are causing test failures, revert for now.

llvm-svn: 202398
2014-02-27 14:24:13 +00:00
Richard Osborne e82bf0988e [XCore] Support functions returning more than 4 words.
Summary:
If a function returns a large struct by value return the first 4 words
in registers and the rest on the stack in a location reserved by the
caller. This is needed to support the xC language which supports
functions returning an arbitrary number of return values.

Reviewers: robertlytton

Reviewed By: robertlytton

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2889

llvm-svn: 202397
2014-02-27 14:00:40 +00:00
Richard Osborne a283d24ad9 [XCore] Target optimized library function __memcpy_4()
Summary:
If the src, dst and size of a memcpy are known to be 4 byte aligned we
can call __memcpy_4() instead of memcpy().

Reviewers: robertlytton

Reviewed By: robertlytton

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2871

llvm-svn: 202395
2014-02-27 13:39:07 +00:00
Richard Osborne d6e85018c5 [XCore] Add dag combines for instructions that ignore some input bits.
These instructions ignore the high bits of one of their input operands -
try and use this to simplify the code.

llvm-svn: 202394
2014-02-27 13:20:11 +00:00
Richard Osborne 2d3a2bee41 [XCore] Provide information about known zero bits of resource instructions.
llvm-svn: 202393
2014-02-27 13:20:06 +00:00
Daniel Sanders 9f088ba322 Stop test/CodeGen/X86/v4i32load-crash.ll targeting non-X86-64 targets.
Summary:
Fixes an issue where a test attempts to use -mcpu=x86-64 on non-X86-64 targets.
This triggers an assertion in the MIPS backend since it doesn't know what ABI to
use by default for unrecognized processors.

CC: llvm-commits, rafael

Differential Revision: http://llvm-reviews.chandlerc.com/D2877

llvm-svn: 202369
2014-02-27 09:24:31 +00:00
Eric Christopher a9a1d27677 Don't emit anything into the debug_ranges section if we aren't emitting
any ranges - this includes CU ranges where we were previously emitting an
end list marker even if we didn't have a list.

Testcase includes a test for line table only code emission as the problem
was noticed while writing this test.

llvm-svn: 202357
2014-02-27 07:44:45 +00:00
Juergen Ributzka 95d11dee8b Revert "Use count 0."
This reverts commit r202283, because when we use GuardMalloc the test will fail
due to additional output to std err.

llvm-svn: 202341
2014-02-27 03:10:10 +00:00
Michel Danzer 9e61c4b6cd R600/SI: Optimize SI_KILL for constant operands
If the SI_KILL operand is constant, we can either clear the exec mask if
the operand is negative, or do nothing otherwise.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202337
2014-02-27 01:47:09 +00:00
Michel Danzer 6f273c57db R600/SI: Allow SI_KILL for geometry shaders
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202336
2014-02-27 01:47:02 +00:00
Eric Christopher 740a833a3b If we're only emitting line tables for a particular CU then don't add
any ranges to the list of ranges for the CU as we don't want to emit
them anyway. This ensures that we will still emit ranges if we have
a compile unit compiled with only line tables and one compiled with
full debug info requested (we'll emit for the one with full debug info).

Update testcase metadata accordingly to continue emitting ranges.

llvm-svn: 202333
2014-02-27 01:25:00 +00:00
Eric Christopher 75d49db19b Add a debug info code generation level to the compile unit metadata
and update everything accordingly. This can be used to conditionalize
the amount of output in the backend based on the amount of debug
requested/metadata emission scheme by a front end (e.g. clang).

Paired with a commit to clang.

llvm-svn: 202332
2014-02-27 01:24:56 +00:00
Andrew Trick 9f240f742b Use regnum regex in an XCore test case.
llvm-svn: 202315
2014-02-26 23:22:49 +00:00
Andrew Trick 2560d11e72 Very temporarily XFAILing a test. Will be fixed shortly.
llvm-svn: 202310
2014-02-26 22:39:59 +00:00
Nico Rieck 0a0c674b7a Fix broken FileCheck prefixes
llvm-svn: 202308
2014-02-26 22:29:11 +00:00
Andrew Trick 52a00936b4 Add a limit to the heuristic that register allocates instructions in local order.
This handles pathological cases in which we see 2x increase in spill
code for large blocks (~50k instructions). I don't have a unit test
for this behavior.

Fixes rdar://16072279.

llvm-svn: 202304
2014-02-26 22:07:26 +00:00
Quentin Colombet 85c9e16291 Lower unsigned vsetcc to psubus in certain cases
The current approach to lower a vsetult is to flip the sign bit of the
operands, swap the operands and then use a (signed) pcmpgt.  psubus (unsigned
saturating subtract) can be used to emulate a vsetult more efficiently:

+    case ISD::SETULT: {
+      // If the comparison is against a constant we can turn this into a
+      // setule.  With psubus, setule does not require a swap.  This is
+      // beneficial because the constant in the register is no longer
+      // destructed as the destination so it can be hoisted out of a loop.

I also enable lowering via psubus in a few other cases where it's clearly
beneficial: setule and setuge if minu/maxu cannot be used.
    
rdar://problem/14338765

Patch by Adam Nemet <anemet@apple.com>.

llvm-svn: 202301
2014-02-26 21:39:12 +00:00
Reid Kleckner 22869378d9 GlobalOpt: Apply fastcc to internal x86_thiscallcc functions
We should apply fastcc whenever profitable.  We can expand this list,
but there are lots of conventions with performance implications that we
don't want to change.

Differential Revision: http://llvm-reviews.chandlerc.com/D2705

llvm-svn: 202293
2014-02-26 19:57:30 +00:00
Nico Rieck 773a57958c Relax COFF string table check
COFF object files with 0 as string table size are currently rejected. This
prevents us from reading object files written by tools like cvtres that
violate the PECOFF spec and write 0 instead of 4 for the size of an empty
string table.

llvm-svn: 202292
2014-02-26 19:51:44 +00:00
Nico Rieck 5645b36306 Fix broken FileCheck prefix
llvm-svn: 202291
2014-02-26 19:51:08 +00:00
Rafael Espindola b556fcbdb5 Use count 0.
Thanks to Roman Divacky for the suggestion.

llvm-svn: 202283
2014-02-26 17:57:35 +00:00
Rafael Espindola ae593f1563 Compare DataLayout by Value, not by pointer.
This fixes spurious warnings in llvm-link about the datalayout not matching.

Thanks to Zalman Stern for reporting the bug!

llvm-svn: 202276
2014-02-26 17:02:08 +00:00
Andrew Trick 429e9edd08 Fix PR18165: LSR must avoid scaling factors that exceed the limit on truncated use.
Patch by Michael Zolotukhin!

llvm-svn: 202273
2014-02-26 16:31:56 +00:00
Alexey Samsonov a5f0768f5e llvm-symbolizer: use dynamic symbol table if the regular one is stripped.
llvm-svn: 202265
2014-02-26 13:10:01 +00:00