Commit Graph

11748 Commits

Author SHA1 Message Date
Tom Stellard 021053f500 R600/SI: Fix simple-loop.ll test
llvm-svn: 226596
2015-01-20 19:33:02 +00:00
Tom Stellard 8255af45cb R600/SI: Add kill flag when copying scratch offset to a register
This allows us to re-use the same register for the scratch offset
when accessing large private arrays.

llvm-svn: 226585
2015-01-20 17:49:45 +00:00
Tom Stellard 8058069529 R600/SI: Don't store scratch buffer frame index in MUBUF offset field
We don't have a good way of legalizing this if the frame index offset
is more than 12 bits, which is the size of MUBUF's offset field, so
now we store the frame index in the vaddr field.

llvm-svn: 226584
2015-01-20 17:49:43 +00:00
Kai Nacke e7a647886a [mips] Add registers and ALL check prefix to octeon test case.
No functional change.

Reviewed by D. Sanders

llvm-svn: 226574
2015-01-20 16:14:02 +00:00
Kai Nacke 63072f81b3 [mips] Add octeon branch instructions bbit0/bbit032/bbit1/bbit132
This commit adds the octeon branch instructions bbit0/bbit032/bbit1/bbit132.
It also includes patterns for instruction selection and test cases.

Reviewed by D. Sanders

llvm-svn: 226573
2015-01-20 16:10:51 +00:00
Simon Pilgrim 20bc37c7db [X86][AVX] Missing AVX1 memory folding float instructions
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for, including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).

Now that scalar folding is working, it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll). This test appears to make no sense, as it's trying to ensure that a scalar subtraction isn't folded because it 'would zero the top elts of the loaded vector'; the test just appears to be wrong to me.

Differential Revision: http://reviews.llvm.org/D7055

llvm-svn: 226513
2015-01-19 22:40:45 +00:00
Colin LeMahieu 0ee02fc9fe [Hexagon] Updating muxir/ri/ii intrinsics. Setting predicate registers as compatible with i32 rather than doing custom type conversion.
llvm-svn: 226500
2015-01-19 20:31:18 +00:00
Colin LeMahieu fcd4569af6 [Hexagon] Converting intrinsics combine imm/imm, simple shifts and extends.
llvm-svn: 226483
2015-01-19 18:56:19 +00:00
Colin LeMahieu 9327bdad2f [Hexagon] Converting remaining ALU32/ALU intrinsics.
llvm-svn: 226480
2015-01-19 18:33:58 +00:00
Colin LeMahieu 663419b008 [Hexagon] Converting ALU32/ALU intrinsics to new patterns.
llvm-svn: 226478
2015-01-19 18:22:19 +00:00
Greg Fitzgerald fa78d08675 [AArch64] Implement GHC calling convention
Original patch by Luke Iannini.  Minor improvements and test added by
Erik de Castro Lopo.

Differential Revision: http://reviews.llvm.org/D6877

From: Erik de Castro Lopo <erikd@mega-nerd.com>
llvm-svn: 226473
2015-01-19 17:40:05 +00:00
Colin LeMahieu 310bad8b7e [Hexagon] Converting halfword to double accumulating multiply intrinsics.
llvm-svn: 226472
2015-01-19 17:36:32 +00:00
Rafael Espindola 12ca34f53f Bring r226038 back.
No change in this commit, but clang was changed to also produce trivial comdats when
needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226467
2015-01-19 15:16:06 +00:00
Michael Kuperstein 54c61edee7 [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local
This fixes PR21792.

Differential Revision: http://reviews.llvm.org/D6823

llvm-svn: 226433
2015-01-19 07:30:47 +00:00
Hal Finkel af51993ee1 [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones do. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.

llvm-svn: 226432
2015-01-19 07:20:27 +00:00
Matt Arsenault 4843f193ad R600: Remove redundant test
This is already covered in ftrunc.ll

llvm-svn: 226412
2015-01-18 19:30:32 +00:00
Simon Pilgrim 4cf275eab2 [X86][SSE] Added scalar min/max folding tests. NFC.
llvm-svn: 226406
2015-01-18 18:06:23 +00:00
Simon Pilgrim 1d6dcdcacb [X86][SSE] Added float extract and xmm extract/insert stack folding tests. NFC.
llvm-svn: 226405
2015-01-18 17:04:32 +00:00
Simon Pilgrim cd26d0b611 [X86][SSE] Added scalar conversion stack folding tests. NFC.
llvm-svn: 226404
2015-01-18 16:22:15 +00:00
Simon Pilgrim fe3bfb80c9 AVX1 stack folding tests. NFC.
Began adding more exhaustive tests - all floating point instructions should now be either tested or have placeholders. We do seem to have a number of missing instructions; I will add a patch for review once the remaining working instructions are added.

I'll then move on to SSE tests and then the integer instructions.

llvm-svn: 226400
2015-01-18 12:56:39 +00:00
Hal Finkel f81b6dd7a2 [PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.

GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.

When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.
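
As a hypothetical illustration (not part of the patch), this is the kind of
function GlobalOpt can switch to the 'fast' convention, after which its
floating-point argument no longer reserves a GPR or stack space:

  // Hypothetical example: file-local, non-varargs, address never taken, so
  // GlobalOpt::OptimizeFunctions may give it the 'fast' calling convention.
  static double scale(double x) { return x * 2.0; }
  double caller(double x) { return scale(x); }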

llvm-svn: 226399
2015-01-18 12:08:47 +00:00
Hal Finkel c19805a75d [PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't; we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.

llvm-svn: 226369
2015-01-17 03:57:34 +00:00
Mehdi Amini 37f316afaf Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 226360
2015-01-17 01:35:56 +00:00
Matt Arsenault 76723d733b R600: Clean up floor tests
These were using different naming schemes,
not using multiple check prefixes and not using
-LABEL.

llvm-svn: 226333
2015-01-16 22:11:00 +00:00
Colin LeMahieu 823415b881 [Hexagon] Converting halfword to doubleword multiply intrinsics.
llvm-svn: 226326
2015-01-16 21:41:57 +00:00
Colin LeMahieu cd9b276966 [Hexagon] Converting accumulating halfword multiply intrinsics to patterns.
llvm-svn: 226324
2015-01-16 21:36:34 +00:00
Colin LeMahieu 3b047e0ee5 [Hexagon] Beginning converting intrinsics to patterns instead of duplicated definitions. Converting halfword multiply intrinsics.
llvm-svn: 226318
2015-01-16 20:38:54 +00:00
Adam Nemet 3e8b22bc1b [AVX512] Add intrinsics for masked aligned FP loads and stores
Similar to the unaligned cases.

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>
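
As a rough C-level sketch (not taken from the tests; assumes AVX-512F and a
64-byte-aligned pointer), the new aligned variants correspond to operations
like:

  #include <immintrin.h>

  // Masked aligned load: lanes with a clear mask bit keep the passthru value.
  __m512 masked_load_ps(const float *aligned_p, __mmask16 k, __m512 passthru) {
    return _mm512_mask_load_ps(passthru, k, aligned_p);
  }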

llvm-svn: 226296
2015-01-16 18:50:09 +00:00
Adam Nemet 9b8cfa212c [AVX512] Remove trailing whitespaces in this test
llvm-svn: 226295
2015-01-16 18:50:07 +00:00
Andrea Di Biagio ae47bc6ab9 [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.
This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.

The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.

In the example below:

//
__m128 test(__m128 a, __m128 *b) {
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}
//

Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3].

llvm-svn: 226277
2015-01-16 14:55:26 +00:00
Simon Pilgrim 367db8eec6 [X86] Refactored stack memory folding tests to explicitly force register spilling
The current 'big vectors' stack folded reload testing pattern is very bulky and makes it difficult to test all instructions as big vectors will tend to use only the ymm instruction implementations.

This patch changes the tests to use a nop call that lists explicit xmm registers as sideeffects; with this we can force a partial register spill of the relevant registers and then check that the reload is correctly folded. The asm generated only adds the forced spill, a nop instruction and a couple of extra labels (a fraction of the current approach).

More exhaustive tests will follow shortly, I've added some extra tests (the xmm versions of some of the existing folding tests) as a starting point.

Differential Revision: http://reviews.llvm.org/D6932

llvm-svn: 226264
2015-01-16 09:32:54 +00:00
Timur Iskhodzhanov 60b721363c Revert r226242 - Revert Revert Don't create new comdats in CodeGen
This breaks AddressSanitizer (ninja check-asan) on Windows

llvm-svn: 226251
2015-01-16 08:38:45 +00:00
Hal Finkel 52f7c018d3 [PowerPC] Adjust PatchPoints for ppc64le
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.

llvm-svn: 226247
2015-01-16 04:40:58 +00:00
Rafael Espindola 67a79e72f5 Revert "Revert Don't create new comdats in CodeGen"
This reverts commit r226173, adding r226038 back.

No change in this commit, but clang was changed to also produce trivial comdats for
constructors, destructors and vtables when needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226242
2015-01-16 02:22:55 +00:00
Matt Arsenault eeb2a7e688 R600/SI: Add patterns for v_cvt_{flr|rpi}_i32_f32
llvm-svn: 226230
2015-01-15 23:58:35 +00:00
Matt Arsenault 268757ba60 R600/SI: Fix trailing comma with modifiers
Instructions with 1 operand can still use source modifiers,
so make sure we don't print an extra comma afterwards.

llvm-svn: 226226
2015-01-15 23:17:03 +00:00
Hal Finkel e2ab0f17cf [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls
Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.

Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).

Consider this simple test:
  $ cat call.c

  typedef void (*fp)();
  void bar(fp x) {
    for (int i = 0; i < 1600000000; ++i)
      x();
  }

  $ cat main.c

  typedef void (*fp)();
  void bar(fp x);
  void foo() {}
  int main() {
    bar(foo);
  }

On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.

The difference on the POWER7 is smaller. Compiling with:

  gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]

  clang -O3 -mcpu=native call.c main.c : ~5.3 seconds

  clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first
   guess, because this is faster with the extra loads)

The -mno-invariant-function-descriptors will be added to Clang shortly.

llvm-svn: 226207
2015-01-15 21:17:34 +00:00
Colin LeMahieu f87697f05e [Hexagon] Updating indexed load-extend patterns and changing test to new expected output.
llvm-svn: 226206
2015-01-15 21:07:52 +00:00
Hal Finkel 5ef58eb86d Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers""
Reapply r226071 with fixes. Two fixes:

 1. We need to manually remove the old and create the new 'dead defs'
    associated with physical register definitions when we move the definition of
    the physical register from the copy point to the point of the original vreg def.

    This problem was picked up by the MachineInstr verifier, and could trigger a
    verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
    turned on the verifier in the tests.

 2. When moving the def point of the phys reg up, we need to make sure that it
    is neither defined nor read in between the two instructions. We don't, however,
    extend the live ranges of phys reg defs to cover uses, so just checking for
    live-range overlap between the pair interval and the phys reg aliases won't
    pick up reads. As a result, we manually iterate over the range and check for
    reads.

    A test soon to be committed to the PowerPC backend will test this change.

Original commit message:

[RegisterCoalescer] Remove copies to reserved registers

This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

llvm-svn: 226200
2015-01-15 20:32:09 +00:00
Matt Arsenault 59b09ab9ef R600/SI: Improve fpext / fptrunc test coverage
llvm-svn: 226197
2015-01-15 19:39:42 +00:00
Marek Olsak c536850526 R600/SI: Use 64-bit encoding by default for opcodes that are VOP3-only on VI
llvm-svn: 226190
2015-01-15 18:43:01 +00:00
Ramkumar Ramachandra bb406c0b9a statepoint tests: use statepoint-example gc
Mechanical conversion of statepoint tests to use the statepoint-example
gc.

llvm-svn: 226183
2015-01-15 18:10:44 +00:00
Colin LeMahieu 2d1c14563e [Hexagon] Deleting old float comparison instruction and updating references to new ones.
llvm-svn: 226179
2015-01-15 17:28:14 +00:00
Colin LeMahieu 7959cac725 [Hexagon] Replacing old fadd/fsub instructions and updating references.
llvm-svn: 226176
2015-01-15 16:30:07 +00:00
Timur Iskhodzhanov f5adf13fac Revert Don't create new comdats in CodeGen
It breaks AddressSanitizer on Windows.

llvm-svn: 226173
2015-01-15 16:14:34 +00:00
Daniel Sanders 023c806109 [mips] Fix a typo in the compare patterns for MIPS32r6/MIPS64r6.
Summary: The patterns intended for the SETLE node were actually matching the SETLT node.

Reviewers: atanasyan, sstankovic, vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6997

llvm-svn: 226171
2015-01-15 15:41:03 +00:00
Hal Finkel dd669615dd Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"
Reverting this while I investigate some bad behavior this is causing. As a
possibly-related issue, adding -verify-machineinstrs to one of the test cases
now fails because of this change:

  llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs

*** Bad machine code: No instruction at def index ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
Valno #3 is defined at 624r

*** Bad machine code: Live segment doesn't end at a valid instruction ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
[624r,624d:3)
LLVM ERROR: Found 2 machine code errors.

where 624r corresponds exactly to the interval combining change:

624B    %RSP<def> = COPY %vreg16; GR64:%vreg16
        Considering merging %vreg16 with %RSP
                RHS = %vreg16 [608r,624r:0)  0@608r
                updated: 608B   %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1]
        Success: %vreg16 -> %RSP
        Result = %RSP

llvm-svn: 226086
2015-01-15 03:08:59 +00:00
Hal Finkel 8299646236 [RegisterCoalescer] Remove copies to reserved registers
This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

llvm-svn: 226071
2015-01-15 01:25:28 +00:00
Philip Reames 8d5d68f1aa getMangledTypeStr: clarify how it mangles types, and add tests
"Write a set of tests that show how name mangling is done for overloaded intrinsics."  These happen to use gc.relocates to exercise the codepath in question, but is not a GC specific test.

Patch by: artagnon@gmail.com
Differential Revision: http://reviews.llvm.org/D6915

llvm-svn: 226056
2015-01-14 23:05:17 +00:00
Duncan P. N. Exon Smith 9885469922 IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

llvm-svn: 226048
2015-01-14 22:27:36 +00:00
Rafael Espindola fad1639a12 Don't create new comdats in CodeGen.
This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226038
2015-01-14 20:55:48 +00:00
Chandler Carruth e3288147f0 [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.
Some benchmarks have shown that this could lead to a potential
performance benefit, so this adds some flags to help measure the
difference.

A possible explanation. In diamond-shaped CFGs (A followed by either
B or C both followed by D), putting B and C both in between A and
D leads to the code being less dense than it could be. Always either
B or C have to be skipped increasing the chance of cache misses etc.
Moving either B or C to after D might be beneficial on average.

In the long run, we should probably do a better job of analyzing the
basic block and branch probabilities to move the correct one of B or
C to after D. But even if we don't use this in the long run, it is
a good baseline for benchmarking.

Original patch authored by Daniel Jasper with test tweaks and a second
flag added by me.

Differential Revision: http://reviews.llvm.org/D6969

llvm-svn: 226034
2015-01-14 20:19:29 +00:00
Bill Schmidt 082cfc05f1 [PPC64] Add support for the ICBT instruction on POWER8.
Patch by Kit Barton.

Support for the ICBT instruction is currently present, but limited to
embedded processors. This change adds a new FeatureICBT that can be used
to identify whether the ICBT instruction is available on a specific processor.

Two new tests are added:
 * Positive test to ensure the icbt instruction is present when using
-mcpu=pwr8
 * Negative test to ensure the icbt instruction is not generated when
using -mcpu=pwr7

Both test cases use the Prefetch opcode in LLVM. They are based on the
ppc64-prefetch.ll test case.

llvm-svn: 226033
2015-01-14 20:17:10 +00:00
Olivier Sallenave d657321aef Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding.
llvm-svn: 225987
2015-01-14 15:36:28 +00:00
Kai Nacke 755b6e8a42 [mips] Refine octeon instructions seq/seqi/sne/snei
This commit refines the pattern for the octeon seq/seqi/sne/snei instructions.
The target register is set to 0 or 1 according to the result of the comparison.
In C, this is something like

rd = (unsigned long)(rs == rt)

This commit adds a zext to bring the result to i64. With this change the
instruction is selected for this type of code. (gcc produces the same code for
the above C code.)
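
A complete, hypothetical version of the snippet above; with the added zext,
the i64 result of the comparison now selects seq on Octeon:

  // Hypothetical test-style source: the i1 comparison result is zero-extended
  // to 64 bits, which matches the refined seq/sne patterns.
  unsigned long is_equal(unsigned long rs, unsigned long rt) {
    return rs == rt;
  }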

llvm-svn: 225968
2015-01-14 10:19:09 +00:00
Brad Smith dd6675cef9 Use the integrated assembler by default on SPARC.
llvm-svn: 225957
2015-01-14 07:53:39 +00:00
JF Bastien eeea8970b4 Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit:
http://reviews.llvm.org/D3392

llvm-svn: 225948
2015-01-14 05:24:33 +00:00
NAKAMURA Takumi c16c427ebc Disable a couple of tests, CodeGen/X86/noop-insert.ll and CodeGen/X86/noop-insert-percentage.ll, in r225908, to unbreak tests.
llvm-svn: 225940
2015-01-14 04:21:33 +00:00
Tim Northover a203ca61af ARM: add test for crc32 instructions in CodeGen.
Somehow we seem to have ended up without any actual tests of the
CodeGen side. Easy enough to fix.

llvm-svn: 225930
2015-01-14 01:43:33 +00:00
Hal Finkel 2307a2f088 [PowerPC] Fix the noop-insert test
The form of nops used is CPU-specific (some CPUs, such as the POWER7, have
special group-terminating nops). We probably want a different callback for this
kind of nop insertion (something more like MCAsmBackend::writeNopData), or for
PPC to use a different mechanism for scheduling nops, but this will stop the
test from failing for now.

llvm-svn: 225928
2015-01-14 01:37:21 +00:00
Matt Arsenault edb6f03852 R600/SI: Remove some redundant load testcases.
This reduces coverage for Evergreen, since the more
complete tests have those run lines disabled.

llvm-svn: 225927
2015-01-14 01:35:26 +00:00
Matt Arsenault e698663687 R600/SI: Fix bad code with unaligned byte vector loads
Don't do the v4i8 -> v4f32 combine if the load will need to
be expanded due to alignment. This stops adding instructions
to repack into a single register that the v_cvt_ubyteN_f32
instructions read.

llvm-svn: 225926
2015-01-14 01:35:22 +00:00
Matt Arsenault bd22342322 Implement new way of expanding extloads.
Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.

This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.

This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.

llvm-svn: 225925
2015-01-14 01:35:17 +00:00
Hal Finkel 934361a4b8 Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.

Original commit message:

This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225909
2015-01-14 01:07:51 +00:00
JF Bastien dcdd5ad252 Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.

Command line options:
  -noop-insertion // Enable noop insertion.
  -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
  -max-noops-per-instruction=X // Randomly generate X noops per instruction, i.e. roll the dice X times with the probability set above (default: 1). This doesn't guarantee X noop instructions.

In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
  -fdiversify

This is the llvm part of the patch.
clang part: D3393

http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)

llvm-svn: 225908
2015-01-14 01:07:26 +00:00
Reid Kleckner 0a57f65514 CodeGen support for x86_64 SEH catch handlers in LLVM
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.

The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.

Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
  outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH

In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6300

llvm-svn: 225904
2015-01-14 01:05:27 +00:00
Adam Nemet d23c88db15 [AVX512] Add 16x32 unpck tests as well
Forgot this from r225838.

llvm-svn: 225850
2015-01-13 23:27:55 +00:00
Adam Nemet f07a0ba96c Fix function names in tests from r225838.
llvm-svn: 225840
2015-01-13 22:40:15 +00:00
Adam Nemet e5dbcb7fd0 [AVX512] Unpack support in new shuffle lowering
This now handles both 32 and 64-bit element sizes.

In this version, the test are in vector-shuffle-512-v8.ll, canonicalized by
Chandler's update_llc_test_checks.py.

Part of <rdar://problem/17688758>

llvm-svn: 225838
2015-01-13 22:20:18 +00:00
Matt Arsenault e93d06a579 R600: Implement getRsqrtEstimate
Only do this for f32 since I'm unclear on both what this is expecting
for the refinement steps in terms of accuracy, and what the
f64 instruction actually provides.

llvm-svn: 225827
2015-01-13 20:53:18 +00:00
Matt Arsenault b56d843348 R600: Make cttz / ctlz cheap to speculate
Speculating things is generally good. SI+ has instructions for these
for 32-bit values. This is still probably better even with the expansion
for 64-bit values, although it is odd that this callback doesn't have
the size as a parameter.

llvm-svn: 225822
2015-01-13 19:46:48 +00:00
Ulrich Weigand bd039299c0 Use the integrated assembler as default on SystemZ
This was already done in clang, this commit now uses the integrated
assembler as default when using LLVM tools directly.

A number of test cases deliberately using an invalid instruction in
inline asm now have to use -no-integrated-as.

llvm-svn: 225820
2015-01-13 19:45:16 +00:00
Ulrich Weigand 6b577e26f0 Use the integrated assembler as default on PowerPC
This was already done in clang, this commit now uses the integrated
assembler as default when using LLVM tools directly.

A number of test cases using inline asm had to be adapted, either by
updating the expected output, or by using -no-integrated-as (for such
tests that deliberately use an invalid instruction in inline asm).

llvm-svn: 225819
2015-01-13 19:43:45 +00:00
Hal Finkel 63fb928109 Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"
Reverting this while I investigate buildbot failures (segfaulting in
GetCostForDef at ScheduleDAGRRList.cpp:314).

llvm-svn: 225811
2015-01-13 18:25:05 +00:00
Hal Finkel 821befd52b [PowerPC] Add StackMap/PatchPoint support
This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225808
2015-01-13 17:48:12 +00:00
Jozef Kolek e7cad7a1df [mips][microMIPS] Fix issue with 16b instructions in jr instruction delay slot
16 bit instructions are not allowed in jr delay slot. Same stands for
PseudoIndirectBranch and PseudoReturn.

Differential Revision: http://reviews.llvm.org/D6815

llvm-svn: 225798
2015-01-13 15:59:17 +00:00
Reid Kleckner 3542ace6ef Rename llvm.recoverframeallocation to llvm.framerecover
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.

llvm-svn: 225752
2015-01-13 01:51:34 +00:00
Reid Kleckner e9b8931873 Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.

Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.

These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.

Reviewers: echristo, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D6493

llvm-svn: 225746
2015-01-13 00:48:10 +00:00
Matt Arsenault a982e4f82b Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Also require unsafe FP math for now since there isn't a way to
test for signed zeros.
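
A minimal sketch of the source pattern this combine targets (hypothetical,
and only valid under the no-NaNs/unsafe-FP conditions described above):

  // fcmp + select on floats; with no NaNs and legal fminnum/fmaxnum these can
  // now become single min/max nodes.
  float select_min(float a, float b) { return a < b ? a : b; }
  float select_max(float a, float b) { return a > b ? a : b; }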

llvm-svn: 225744
2015-01-13 00:43:00 +00:00
Reid Kleckner bba20f06de musttail: Only set the inreg flag for fastcall and vectorcall
Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and
stdcall thunks, leaving us with no scratch registers for indirect call
targets.

Fixes PR22052.

llvm-svn: 225729
2015-01-12 23:28:23 +00:00
Adrian Prantl b16d9ebb0c Debug info: Factor out the creation of DWARF expressions from AsmPrinter
into a new class DwarfExpression that can be shared between AsmPrinter
and DwarfUnit.

This is the first step towards unifying the two entirely redundant
implementations of dwarf expression emission in DwarfUnit and AsmPrinter.

Almost no functional change — Testcases were updated because asm comments
that used to be on two lines now appear on the same line, which is
actually preferable.

llvm-svn: 225706
2015-01-12 22:19:22 +00:00
Ahmed Bougacha 291833b959 [X86] Also create+widen FMIN/FMAX nodes for v2f32.
This happens in the HINT benchmark, where the SLP-vectorizer created
v2f32 fcmp/select code.  The "correct" solution would have been to
teach the vectorizer cost model that v2f32 isn't legal (because really,
it isn't), but if we can vectorize we might as well do so.

We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on.
v3f32 were already widened to v4f32 by the generic unroll-and-build-vector
legalization.
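
Roughly the kind of scalar source the SLP vectorizer turns into v2f32
fcmp/select here (a hypothetical stand-in, not the HINT code itself):

  // Two adjacent compare-and-select pairs; the SLP vectorizer can merge them
  // into a v2f32 fcmp/select, which is then widened to v4f32.
  void min_pair(float *restrict a, const float *restrict b) {
    a[0] = a[0] < b[0] ? a[0] : b[0];
    a[1] = a[1] < b[1] ? a[1] : b[1];
  }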

rdar://15763436
Differential Revision: http://reviews.llvm.org/D6557

llvm-svn: 225691
2015-01-12 20:31:30 +00:00
Ahmed Bougacha 66fde538ee [X86] Make SSE min/max testcases more explicit. NFC.
llvm-svn: 225687
2015-01-12 20:15:47 +00:00
Tom Stellard b6550529a6 R600/SI: Use RegisterOperands to specify which operands can accept immediates
There are some operands which can take either immediates or registers
and we were previously using different register classes to distinguish
between operands that could take immediates and those that could not.

This patch switches to using RegisterOperands which should simplify the
backend by reducing the number of register classes and also make it
easier to implement the assembler.

llvm-svn: 225662
2015-01-12 19:33:18 +00:00
Hal Finkel 87deb0b8e3 [PowerPC] Fix calls to non-function objects
Looking at r225438 inspired me to see how the PowerPC backend handled the
situation (calling a bitcasted TLS global), and it turns out we also produced
an error (cannot select ...). What it means to "call" something that is not a
function is implementation and platform specific, but in the name of doing
something (besides crashing), this makes sure we do what GCC does (treat all
such calls as calls through a function pointer -- meaning that the pointer is
assumed, as is the convention on PPC, to point to a function descriptor
structure holding the actual code address along with the function's TOC pointer
and environment pointer). As GCC does, we now do the same for calling regular
(non-TLS) non-function globals too.

I'm not sure whether this is the most useful way to define the behavior, but at
least we won't be alone.

llvm-svn: 225617
2015-01-12 04:34:47 +00:00
David Majnemer 14141f941a Revert most of r225597
We can't rely on a DataLayout enlightened constant folder.

llvm-svn: 225599
2015-01-11 07:29:51 +00:00
David Majnemer 292d0c796b X86: Properly decode shuffle masks when the constant pool type is weird
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation.  This occurs when Constants have
the same bit pattern but have different types.

Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.

This fixes PR22188.

llvm-svn: 225597
2015-01-11 05:08:57 +00:00
Saleem Abdulrasool 9cf2679d3b X86: teach X86TargetLowering about L,M,O constraints
Teach the ISelLowering for X86 about the L,M,O target specific constraints.
Although, for the moment, clang performs constraint validation and prevents
passing along inline asm whose immediate constant constraints are violated,
the backend should be able to cope with the invalid inline asm a bit better.

llvm-svn: 225596
2015-01-11 04:39:24 +00:00
Chandler Carruth c491f72e7a [x86] Remove some windows line endings that snuck into the tests here.
Folks on Windows, remember to set up your subversion to strip these when
submitting...

llvm-svn: 225593
2015-01-11 01:36:20 +00:00
Sanjoy Das 81401d4b19 Fix PR22179.
We were incorrectly inferring nsw for certain SCEVs. We can be more
aggressive here (see Richard Smith's comment on
http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just
focuses on correctness.

Differential Revision: http://reviews.llvm.org/D6914

llvm-svn: 225591
2015-01-10 23:41:24 +00:00
Simon Pilgrim 94a4cc027a [X86][SSE] Improved (v)insertps shuffle matching
In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable.

This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced.

We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function.

Differential Revision: http://reviews.llvm.org/D6879

llvm-svn: 225589
2015-01-10 19:45:33 +00:00
Hal Finkel 5d5d1539cc [PowerPC] Mark zext of a small scalar load as free
This initial implementation of PPCTargetLowering::isZExtFree marks zexts of
small scalar loads (that are not sign-extending) as free. This callback is
used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to
determine whether a zext or an anyext is used to lower illegally-typed PHIs.
Because later truncates of zero-extended values are nops, this allows for the
elimination of later unnecessary truncations.

Fixes the initial complaint associated with PR22120.
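
A hypothetical illustration of the effect: on PPC64 an lwz already
zero-extends into the full 64-bit register, so the extension below costs
nothing and later truncations of the value fold away:

  // The zext of the 32-bit load is free; no separate instruction is needed to
  // clear the upper 32 bits.
  unsigned long widen(const unsigned int *p) { return *p; }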

llvm-svn: 225584
2015-01-10 08:21:59 +00:00
Justin Hibbits 654346e6f9 Fully fix Bug #22115.
Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty results.

Test Plan: Tests updated

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6906

llvm-svn: 225573
2015-01-10 01:57:21 +00:00
Simon Pilgrim ec1f2c2cab [X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zeros
pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and OR the result when the used inputs from a vector are all zeroable.

Differential Revision: http://reviews.llvm.org/D6878

llvm-svn: 225551
2015-01-09 22:03:19 +00:00
Daniel Sanders 1440bb2a26 [mips] Add support for accessing $gp as a named register.
Summary:
Mips Linux uses $gp to hold a pointer to the thread info structure and accesses
it with a named register. This change makes that work with LLVM.

The N32 ABI doesn't quite work yet since the frontend generates incorrect IR
for this case. It neglects to truncate the 64-bit GPR to a 32-bit value before
converting to a pointer. Given correct IR (as in the testcase in this patch),
it works correctly.
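
As a rough, hypothetical sketch of the kernel-style usage this is meant to
support ($gp is $28 on MIPS; the names below are made up):

  // Hypothetical sketch: read the thread-info pointer that Linux keeps in $gp
  // through a global register variable.
  register unsigned long current_thread_info asm("$28");
  unsigned long read_thread_info(void) { return current_thread_info; }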

Reviewers: sstankovic, vmedic, atanasyan

Reviewed By: atanasyan

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6893

llvm-svn: 225529
2015-01-09 17:21:30 +00:00
Matthias Braun 7e87384592 RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness
The code that eliminated additional coalescable copies in
removeCopyByCommutingDef() used MergeValueNumberInto() which internally
may merge A into B or B into A. In this case A and B had different Def
points, so we have to reset ValNo.Def to the intended one after merging.

llvm-svn: 225503
2015-01-09 03:01:31 +00:00
Hal Finkel 6c39269a4c [PowerPC] Fold [sz]ext with fp_to_int lowering where possible
On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32
to i64 into the load necessary for an i64 -> fp conversion.

llvm-svn: 225493
2015-01-09 01:34:30 +00:00
Hal Finkel 3c0952b072 [PowerPC] Mark all instructions as non-cheap for MachineLICM
MachineLICM uses a callback named hasLowDefLatency to determine if an
instruction def operand has a 'low' latency. If all relevant operands have a
'low' latency, the instruction is considered too cheap to hoist out of loops
even in low-register-pressure situations. On PowerPC cores, both the embedded
cores and the others, there is no reason to believe that this is a good choice:
all instructions have a cost inside a loop, and hoisting them when not limited
by register pressure is a reasonable default.

llvm-svn: 225471
2015-01-08 22:11:49 +00:00
Akira Hatanaka 442b40c2eb [ARM] Fix a bug in constant island pass that was triggering an assertion.
The assert was being triggered when the distance between a constant pool entry
and its user exceeded the maximally allowed distance after thumb2 branch
shortening. A padding was inserted after a thumb2 branch instruction was shrunk,
which caused the user to be out of range. This is wrong as the padding should
have been inserted by the layout algorithm so that the distance between two
instructions doesn't grow later during thumb2 instruction optimization.

This commit fixes the code in ARMConstantIslands::createNewWater to call
computeBlockSize and set BasicBlock::Unalign when a branch instruction is
inserted to create new water after a basic block. A non-zero Unalign causes
the worst-case padding to be inserted when adjustBBOffsetsAfter is called to
recompute the basic block offsets.

rdar://problem/19130476

llvm-svn: 225467
2015-01-08 20:44:50 +00:00
Justin Hibbits 98a532dd8e Add saving and restoring of r30 to the prologue and epilogue, respectively
Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register).  This does that.

Test Plan: Tests updated.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6876

llvm-svn: 225450
2015-01-08 15:47:19 +00:00
Kristof Beyls 933de7aa06 Fix large stack alignment codegen for ARM and Thumb2 targets
This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment): for ARM and Thumb2 targets, but not for Thumb1, as it
seems stack alignment for Thumb1 targets hasn't been supported at
all.

Producing an aligned stack pointer is done by zeroing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction only allows
encoding an immediate that can zero out at most the 8 lower
bits. When a larger alignment is requested, a BIC instruction cannot
be used; llvm was silently producing incorrect code in this case.

This commit fixes code generation for large stack alignments by
using the BFC instruction instead, when the BFC instruction is
available.  When not, it uses 2 instructions: a right shift,
followed by a left shift to zero out the lower bits.

The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I wasn't able to
produce a test case for that. This commit adds an assert so that
the compiler will fail the assert instead of silently generating
wrong code if this is ever reached.
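
A minimal, hypothetical reproducer of the kind of request involved: any stack
alignment above 256 bytes cannot be expressed in BIC's 8-bit immediate:

  // The over-aligned local forces stack realignment beyond what BIC encodes.
  void use(char *);
  void f(void) {
    __attribute__((aligned(512))) char buf[512];
    use(buf);
  }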

llvm-svn: 225446
2015-01-08 15:09:14 +00:00
Tom Stellard 654d669e56 R600/SI: Remove SIISelLowering::legalizeOperands()
Its functionality has been replaced by calling
SIInstrInfo::legalizeOperands() from
SIISelLowering::AdjustInstrPostInstrSelection() and running the
SIFoldOperands and SIShrinkInstructions passes.

llvm-svn: 225445
2015-01-08 15:08:17 +00:00
Elena Demikhovsky 285fbd551a Masked Load/Store - fixed a bug in type legalization.
llvm-svn: 225441
2015-01-08 12:29:19 +00:00
Michael Kuperstein 381dc08bc1 Fix a think-o in the test for r225438.
llvm-svn: 225440
2015-01-08 12:05:02 +00:00
Michael Kuperstein 46f7d525c3 [X86] Don't try to generate direct calls to TLS globals
The call lowering assumes that if the callee is a global, we want to emit a direct call.
This is correct for regular globals, but not for TLS ones.
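
A hedged illustration of the failing pattern (hypothetical, and not meaningful
to execute; it merely reproduces a call whose callee is a bitcast TLS global):

  __thread int tls_obj;

  // The callee is a TLS global cast to a function type; emitting a direct
  // call here would be wrong, so an indirect call must be used instead.
  int call_tls(void) { return ((int (*)(void))&tls_obj)(); }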

Differential Revision: http://reviews.llvm.org/D6862

llvm-svn: 225438
2015-01-08 11:50:58 +00:00
Craig Topper 0c4d51b779 Fix test case I missed in r225432.
llvm-svn: 225434
2015-01-08 07:57:27 +00:00
Quentin Colombet a799e2e014 [RegAllocGreedy] Introduce a late pass to repair broken hints.
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.


** Context **

Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.

This usually shows up in the assembly with pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B

Whereas we could simply have done:
def B
...
def A
use A
...
use B


** Proposed Solution **

A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evicts another variable. Indeed, if no eviction
happens, it is pointless to look for recoloring opportunities, as it means the
situation was the same as the initial allocation problem where we had to break
the hint.

Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple, unlike the last chance recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it, otherwise it gives up on that hint
even if a more complex coloring would have worked.

The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) with the target color. If a) is greater than
or equal to b), then it is profitable and the recoloring happens.


** Example **

Consider the following example:
BB1:
  a =
  b =
BB2:
  ...
   = b
   = a
Let us assume b gets split:
BB1:
  a =
  b =
BB2:
  c = b
  ...
  d = c
  = d
  = a
Because of how the allocation works, b, c, and d may be assigned different
colors. Now, suppose a gets evicted to make room for c, assuming b and d were
assigned to something different from a. We end up with:
We end up with:
BB1:
  a =
  st a, SpillSlot
  b =
BB2:
  c = b
  ...
  d = c
  = d
  e = ld SpillSlot
  = e
It is likely that we can assign the same register to b, c, and d,
getting rid of 2 copies.


** Performances **

Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are a few regressions too that
come from the (in)accuracy of the block frequency estimate.

<rdar://problem/18312047>

llvm-svn: 225422
2015-01-08 01:16:39 +00:00
Matthias Braun d55e6ddacf RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases.
I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a
subregister index in SrcReg/DstReg, but they are actually subregister
indices of the coalesced register that get you back to SrcReg/DstReg
when applied.

Fixed the bug, improved comments and simplified code accordingly.

Testcase by Tom Stellard!

llvm-svn: 225415
2015-01-07 23:58:38 +00:00
Philip Reames 76ebd15437 [GC] improve testing around gc.relocate and fix a test
Patch by: Ramkumar Ramachandra <artagnon@gmail.com>

"This patch started out as an exploration of gc.relocate, and an attempt
to write a simple test in call-lowering. I then noticed that the
arguments of gc.relocate were not checked fully, so I went in and fixed
a few things. Finally, the most important outcome of this patch is that
my new error handling code caught a bug in a callsite in
stackmap-format."

Differential Revision: http://reviews.llvm.org/D6824

llvm-svn: 225412
2015-01-07 22:48:01 +00:00
Tom Stellard 0599297cb4 R600/SI: Commute instructions to enable more folding opportunities
llvm-svn: 225410
2015-01-07 22:44:19 +00:00
Tom Stellard 26cc18df43 R600/SI: Only fold immediates that have one use
Folding the same immediate into multiple instructions will increase
program size, which can hurt performance.

llvm-svn: 225405
2015-01-07 22:18:27 +00:00
Olivier Sallenave 0451532996 More FMA folding opportunities.
llvm-svn: 225380
2015-01-07 20:54:17 +00:00
Tom Stellard 4842c05216 R600/SI: Add a V_MOV_B64 pseudo instruction
This is used to simplify the SIFoldOperands pass and make it easier to
fold immediates.

llvm-svn: 225373
2015-01-07 20:27:25 +00:00
Tom Stellard ef3b864a07 R600/SI: Teach SIFoldOperands to split 64-bit constants when folding
This allows folding of sequences like:

s[0:1] = s_mov_b64 4
v_add_i32 v0, s0, v0
v_addc_u32 v1, s1, v1

into

v_add_i32 v0, 4, v0
v_add_i32 v1, 0, v1

llvm-svn: 225369
2015-01-07 19:56:17 +00:00
Philip Reames 4ac17a3026 Introduce an example statepoint GC strategy
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generating correct stackmaps, that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)

Most of the validation logic added here as proof of concept will soon move in to the Verifier.  That move is dependent on http://reviews.llvm.org/D6811

There was discussion in the review thread about addrspace(1) being reserved for something.  I'm going to follow up on a separate llvmdev thread.  If needed, I'll update all the code at once.

Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.

Differential Revision: http://reviews.llvm.org/D6808

llvm-svn: 225365
2015-01-07 19:07:50 +00:00
David Majnemer 4d77fdf311 X86: Allow the stack probe size to be configurable per function
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed.  However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.

It is desirable to have this be configurable so that a function might
opt-out of stack probes.  Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.

Patch by Andrew H!

N.B.  The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.

llvm-svn: 225360
2015-01-07 18:14:07 +00:00
Ahmed Bougacha aa2d290997 [X86] Teach FCOPYSIGN lowering to recognize constant magnitudes.
For code like:
    float foo(float x) { return copysign(1.0, x); }
We used to generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    movss  <1.000000e+00>, %xmm1
    andps  <nan>, %xmm1
    orps   %xmm0, %xmm1
Basically doing an abs(1.0f) in the two middle instructions.

We now generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    orps   <1.000000e+00,0,0,0>, %xmm0

Builds on cleanups r223415, r223542.
rdar://19049548
Differential Revision: http://reviews.llvm.org/D6555

llvm-svn: 225357
2015-01-07 17:33:03 +00:00
Charlie Turner 06f22f4678 [ARM] Add missing Tag_DIV_use tests.
llvm-svn: 225348
2015-01-07 11:37:40 +00:00
Karthik Bhat 9ba55334dc Revert r225165 and r225169
Even though gcc produces similar instructions, as Owen pointed out the two
patterns aren't equivalent in the case where the original subtraction could
have caused an overflow. Reverting the same.

llvm-svn: 225341
2015-01-07 06:34:34 +00:00
Matt Arsenault d0101a2dfd R600/SI: Add combine for isinfinite pattern
llvm-svn: 225310
2015-01-06 23:00:46 +00:00
Matt Arsenault 6f6233dc58 R600/SI: Pattern match isinf to v_cmp_class instructions
llvm-svn: 225307
2015-01-06 23:00:41 +00:00
Matt Arsenault f2290336b7 R600/SI: Add basic DAG combines for fp_class
llvm-svn: 225306
2015-01-06 23:00:39 +00:00
Matt Arsenault 4831ce5491 R600/SI: Add class intrinsic
llvm-svn: 225305
2015-01-06 23:00:37 +00:00
Rafael Espindola 83a362cde8 Change the .ll syntax for comdats and add a syntactic sugar.
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.

Just dropping the $name causes problems for

@foo = global i32 0, comdat
$bar = comdat ...

and

declare void @foo() comdat
$bar = comdat ...

So the syntax is changed to

@g1 = global i32 0, comdat($c1)
@g2 = global i32 0, comdat

and

declare void @foo() comdat($c1)
declare void @foo() comdat

llvm-svn: 225302
2015-01-06 22:55:16 +00:00
Hal Finkel ed844c4ad1 [PowerPC] Reuse a load operand in int->fp conversions
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).
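
An illustrative IR pattern (a sketch, not from the patch's tests) where
the conversion source is itself a load:

  define double @load_to_fp(i32* %p) {
    ; the integer is already in memory at %p, so the FP load used for the
    ; conversion can read %p directly instead of storing %i to a new slot
    %i = load i32, i32* %p
    %f = sitofp i32 %i to double
    ret double %f
  }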

This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.

Also, remove a related entry from the README.txt for which we now generate
reasonable code.

llvm-svn: 225301
2015-01-06 22:31:02 +00:00
Tom Stellard 9d6797ae58 R600/SI: Insert s_waitcnt before s_barrier instructions.
This ensures that all memory operations are complete when all threads
reach the barrier.

llvm-svn: 225290
2015-01-06 19:52:07 +00:00
Tom Stellard 49f8bfdcb7 R600/SI: Add a stub GCNTargetMachine
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.

It is recommended that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.

llvm-svn: 225277
2015-01-06 18:00:21 +00:00
Andrea Di Biagio f807a6f297 [CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz.
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.

The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow graph modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.

With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.
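
A sketch of the kind of guarded pattern this now speculates (the exact
shape is assumed, not copied from the new tests); the cttz and the trunc
share a block, so both can be hoisted when the trunc is free:

  declare i64 @llvm.cttz.i64(i64, i1)

  define i32 @trunc_cttz(i64 %x) {
  entry:
    %is_zero = icmp eq i64 %x, 0
    br i1 %is_zero, label %done, label %nonzero

  nonzero:
    %c = call i64 @llvm.cttz.i64(i64 %x, i1 true)   ; zero is ruled out by the guard
    %t = trunc i64 %c to i32                        ; trunc in the same block as the cttz
    br label %done

  done:
    %r = phi i32 [ %t, %nonzero ], [ 64, %entry ]
    ret i32 %r
  }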

Added new test cases to CodeGen/X86/cttz-ctlz.ll

Differential Revision: http://reviews.llvm.org/D6853

llvm-svn: 225274
2015-01-06 17:41:18 +00:00
Hal Finkel ef35ed0892 [PowerPC] Add a regression test for r225251
In r225251, I removed an old entry from the README.txt file. While there are
several contributing factors (including pieces in Clang's ABI code), upon
further reflection, the backend part deserves a regression test.

llvm-svn: 225268
2015-01-06 16:46:37 +00:00
Colin LeMahieu 1445553474 [Hexagon] Adding dealloc_return encoding and absolute address stores.
llvm-svn: 225267
2015-01-06 16:15:15 +00:00
David Majnemer 29c52f7449 X86: Don't make illegal GOTTPOFF relocations
"ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF
relocations target a movq or addq instruction.

Prohibit the truncation of such loads to movl or addl.

This fixes PR22083.

Differential Revision: http://reviews.llvm.org/D6839

llvm-svn: 225250
2015-01-06 07:12:52 +00:00
Hal Finkel 5efb918844 [PowerPC] Improve int_to_fp(fp_to_int(x)) combining
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.
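
The combined pattern, for reference (an illustrative sketch):

  define float @round_trip(float %x) {
    ; the fp_to_int/int_to_fp pair below is combined without a load/store pair,
    ; and without an extra rounding step on the newer cores
    %i = fptosi float %x to i32
    %f = sitofp i32 %i to float
    ret float %f
  }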

llvm-svn: 225248
2015-01-06 06:01:57 +00:00
Hal Finkel f93f56d619 [PowerPC] Fix test to pass on Darwin hosts
llvm-svn: 225220
2015-01-05 23:17:43 +00:00
Hal Finkel 9187711f08 [PowerPC] Convert a README.txt entry into a better test
We now produce the desired code as noted in the README.txt file (no spurious
or). Remove the README entry and improve the regression test.

llvm-svn: 225214
2015-01-05 21:53:52 +00:00
Hal Finkel c7d35bb5b1 [PowerPC] Add a test for truncating a shifted load
We now produce the desired code as noted in the README.txt file. Remove the
README entry and add a regression test.

llvm-svn: 225209
2015-01-05 21:33:14 +00:00
Hal Finkel a4750dec99 [PowerPC] Add another test for load/store with update
We now produce the desired code as noted in the README.txt file. Remove the
README entry and add a regression test.

llvm-svn: 225205
2015-01-05 21:22:42 +00:00
Hal Finkel 200d2ad188 [PowerPC] Fold i1 extensions with other ops
Consider this function from our README.txt file:

  int foo(int a, int b) { return (a < b) << 4; }

We now explicitly track CR bits by default, so the comment in the README.txt
about not really having a SETCC is no longer accurate, but we did generate this
somewhat silly code:

        cmpw 0, 3, 4
        li 3, 0
        li 12, 1
        isel 3, 12, 3, 0
        sldi 3, 3, 4
        blr

which generates the zext as a select between 0 and 1, and then shifts the
result by a constant amount. Here we preprocess the DAG in order to fold the
results of operations on an extension of an i1 value into the SELECT_I[48]
pseudo instruction when the resulting constant can be materialized using one
instruction (just like the 0 and 1). This was not implemented as a DAGCombine
because the resulting code would have been anti-canonical and depends on
replacing chained user nodes, which does not fit well into the lowering
paradigm. Now we generate:

        cmpw 0, 3, 4
        li 3, 0
        li 12, 16
        isel 3, 12, 3, 0
        blr

which is less silly.

llvm-svn: 225203
2015-01-05 21:10:24 +00:00
Hal Finkel 49557f1b42 [PowerPC] Remove zexts after i32 ctlz
The 64-bit semantics of cntlzw are not special: the 32-bit leading-zero count is
stored as a 64-bit value in the range [0,32]. As a result, it is always zero
extended, and it can be added to the PPCISelDAGToDAG peephole optimization as a
frontier instruction for the removal of unnecessary zero extensions.
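
An illustrative pattern (a sketch, not from the patch's tests) where the
zero extension becomes removable:

  declare i32 @llvm.ctlz.i32(i32, i1)

  define i64 @ctlz_zext(i32 %x) {
    %c = call i32 @llvm.ctlz.i32(i32 %x, i1 false)
    %e = zext i32 %c to i64        ; removable: cntlzw already yields a value in [0,32]
    ret i64 %e
  }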

llvm-svn: 225192
2015-01-05 18:52:29 +00:00
Hal Finkel 4e2c78228a [PowerPC] Remove zexts after byte-swapping loads
lhbrx and lwbrx not only load their data with byte swapping, but also clear the
upper 32 bits (at least). As a result, they can be added to the PPCISelDAGToDAG
peephole optimization as frontier instructions for the removal of unnecessary
zero extensions.
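
The analogous illustrative pattern (again a sketch, not from the patch's
tests):

  declare i32 @llvm.bswap.i32(i32)

  define i64 @bswap_load_zext(i32* %p) {
    %v = load i32, i32* %p
    %s = call i32 @llvm.bswap.i32(i32 %v)   ; selected as a byte-swapping load (lwbrx)
    %e = zext i32 %s to i64                 ; removable: lwbrx already clears the upper bits
    ret i64 %e
  }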

llvm-svn: 225189
2015-01-05 18:09:06 +00:00
Ahmed Bougacha d54c448d34 [AArch64] Improve codegen of store lane instructions by avoiding GPR usage.
We used to generate code similar to:

  umov.b        w8, v0[2]
  strb  w8, [x0, x1]

because the STR*ro* patterns were preferred to ST1*.
Instead, we can avoid going through GPRs, and generate:

  add   x8, x0, x1
  st1.b { v0 }[2], [x8]

This patch increases the ST1* AddedComplexity to achieve that.

rdar://16372710
Differential Revision: http://reviews.llvm.org/D6202

llvm-svn: 225183
2015-01-05 17:10:26 +00:00
Ahmed Bougacha f964df3640 [AArch64] Improve codegen of store lane 0 instructions by directly storing the subregister.
For 0-lane stores, we used to generate code similar to:

  fmov w8, s0
  str w8, [x0, x1, lsl #2]

instead of:

  str s0, [x0, x1, lsl #2]

To correct that: for store lane 0 patterns, directly match to STR <subreg>0.

Byte-sized instructions don't have the special case for a 0 index,
because FPR8s are defined to have untyped content.

rdar://16372710
Differential Revision: http://reviews.llvm.org/D6772

llvm-svn: 225181
2015-01-05 17:02:28 +00:00
Karthik Bhat 93f27ce886 Select lower fsub,fabs pattern to fabd on AArch64
This patch lowers patterns such as-
  fsub   v0.4s, v0.4s, v1.4s
  fabs   v0.4s, v0.4s
to
  fabd  v0.4s, v0.4s, v1.4s
on AArch64.
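
At the IR level the matched shape is an fsub feeding the fabs intrinsic;
a sketch (assumed form, not taken from the patch's tests):

  declare <4 x float> @llvm.fabs.v4f32(<4 x float>)

  define <4 x float> @fabd_pattern(<4 x float> %a, <4 x float> %b) {
    %sub = fsub <4 x float> %a, %b
    %abs = call <4 x float> @llvm.fabs.v4f32(<4 x float> %sub)   ; fabs(a - b) -> fabd
    ret <4 x float> %abs
  }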

Review: http://reviews.llvm.org/D6791
llvm-svn: 225169
2015-01-05 13:57:59 +00:00
Charlie Turner 8b2caa458f Emit the build attribute Tag_conformance.
Claim conformance to version 2.09 of the ARM ABI.

This build attribute must be emitted first amongst the build attributes when
written to an object file. This is to simplify conformance detection by
consumers.

Change-Id: If9eddcfc416bc9ad6e5cc8cdcb05d0031af7657e
llvm-svn: 225166
2015-01-05 13:12:17 +00:00
Karthik Bhat 8ec742c2f9 Select lower sub,abs pattern to sabd on AArch64
This patch lowers patterns such as-
  sub	v0.4s, v0.4s, v1.4s
  abs	v0.4s, v0.4s
to
  sabd	v0.4s, v0.4s, v1.4s
on AArch64.

Review: http://reviews.llvm.org/D6781
llvm-svn: 225165
2015-01-05 13:11:07 +00:00
Hal Finkel 9bb61de1be [PowerPC] Enable speculation of cttz/ctlz
PPC has an instruction for ctlz with defined zero behavior, and our lowering of
cttz (provided by DAGCombine) is also efficient and branchless, so speculating
these makes sense.

llvm-svn: 225150
2015-01-05 05:24:42 +00:00
Hal Finkel 2f61879ff4 [PowerPC] Materialize i64 constants using rotation with masking
r225135 added the ability to materialize i64 constants using rotations in order
to reduce the instruction count. Sometimes we can use a rotation only with some
extra masking, so that we take advantage of the fact that generating a bunch of
extra higher-order 1 bits is easy using li/lis.

llvm-svn: 225147
2015-01-05 03:41:38 +00:00
Simon Pilgrim b65a6ee831 [X86][SSE] Added vector packing test for pr12412
llvm-svn: 225138
2015-01-04 19:08:03 +00:00
Simon Pilgrim a1540c11ec [X86][SSE] Added vector integer truncation tests - based off pr15524
llvm-svn: 225137
2015-01-04 17:52:00 +00:00
Hal Finkel 241ba79f95 [PowerPC] Materialize i64 constants using rotation
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing a rotated constant, and then applying the inverse rotation, requires
fewer instructions than the direct method. If so, do that instead.
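
For example (an illustrative constant, not one from the patch's tests),
0x8000000000000001 is just 0x3 rotated left by 63 bits, so it can be
built from a one-instruction small immediate plus a single rotate rather
than the general five-instruction sequence.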

In r225132, I added support for forming constants using bit inversion. In
effect, this reverts that commit and replaces it with rotation support. The bit
inversion is useful for turning constants that are mostly ones into ones that
are mostly zeros (thus enabling a more-efficient shift-based materialization),
but the same effect can be obtained by using negative constants and a rotate,
and that is at least as efficient, if not more.

llvm-svn: 225135
2015-01-04 15:43:55 +00:00
Hal Finkel ca6375fb75 [PowerPC] Materialize i64 constants using bit inversion
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing the bit-reversed constant, and then flipping the bits, requires
fewer instructions than the direct method. If so, do that instead.

llvm-svn: 225132
2015-01-04 12:35:03 +00:00