Commit Graph

7283 Commits

Author SHA1 Message Date
Hal Finkel a4d074863a Add the PPC64 popcntd instruction
PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and
tell TTI about it so that popcount-loop recognition will know about it.

llvm-svn: 178233
2013-03-28 13:29:47 +00:00
Hal Finkel 035b4825ce Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions
There were a few places where kill flags were not being set correctly, and
where 32-bit instruction variants were being used with 64-bit registers. After
r178180, this code was being triggered causing llc to assert.

llvm-svn: 178220
2013-03-28 03:38:16 +00:00
David Blaikie 5692e72f30 Revert "Adding DIImportedModules to DIScopes."
This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7.

Turns out we're going with a different schema design to represent
DW_TAG_imported_modules so we won't need this extra field.

llvm-svn: 178215
2013-03-28 02:44:59 +00:00
Preston Gurd d6be4bf87f This patch follows is a follow up to r178171, which uses the register
form of call in preference to memory indirect on Atom.

In this case, the patch applies the optimization to the code for reloading
spilled registers.

The patch also includes changes to sibcall.ll and movgs.ll, which were
failing on the Atom buildbot after the first patch was applied.

This patch by Sriram Murali.

llvm-svn: 178193
2013-03-27 23:16:18 +00:00
Preston Gurd 663e6f9558 For the current Atom processor, the fastest way to handle a call
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.

This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.

Patch by Sriram Murali.

llvm-svn: 178171
2013-03-27 19:14:02 +00:00
Christian Konig 08f5929942 R600/SI: add SETO/SETUO patterns
6 more piglit tests.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178145
2013-03-27 15:27:31 +00:00
Hal Finkel 687143557d Print PPC ZERO as 0 (not r0) even on Darwin
It seems that the Darwin PPC assembler requires r0 to be written as 0 when it
means 0 (at least in lwarx/stwcx.). Fixes PR15605.

llvm-svn: 178142
2013-03-27 13:20:52 +00:00
Silviu Baranga dc45336d09 Enabling the generation of dependency breakers for partial updates on Cortex-A15. Also fixing a small bug in getting the update clearence for VLD1LNd32.
llvm-svn: 178134
2013-03-27 12:38:44 +00:00
Christian Konig 3c14580acb R600/SI: add cummuting of rev instructions
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178127
2013-03-27 09:12:59 +00:00
Christian Konig 70a5032c1b R600/SI: add mulhu/mulhs patterns
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178126
2013-03-27 09:12:51 +00:00
Christian Konig 20a7e6b764 R600/SI: add srl/sha patterns for SI
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178125
2013-03-27 09:12:44 +00:00
Hal Finkel 0f77861d9f Allocate r0 on PPC
The R0 register can now be allocated because instructions
that cannot use R0 as a GPR have been appropriately marked.

llvm-svn: 178123
2013-03-27 06:52:27 +00:00
Bill Schmidt a1b72d0f6a Remove the link register from the GPR classes on PowerPC.
Some implementation detail in the forgotten past required the link
register to be placed in the GPRC and G8RC register classes.  This is
just wrong on the face of it, and causes several extra intersection
register classes to be generated.  I found this was having evil
effects on instruction scheduling, by causing the wrong register class
to be consulted for register pressure decisions.

No code generation changes are expected, other than some minor changes
in instruction order.  Seven tests in the test bucket required minor
tweaks to adjust to the new normal.

llvm-svn: 178114
2013-03-27 02:40:14 +00:00
David Blaikie a26d70358f Adding DIImportedModules to DIScopes.
This is just the basic groundwork for supporting DW_TAG_imported_module but I
wanted to commit this before pushing support further into Clang or LLVM so that
this rather churny change is isolated from the rest of the work. The major
churn here is obviously adding another field (within the common DIScope prefix)
to all DIScopes (files, classes, namespaces, lexical scopes, etc). This should
be the last big churny change needed for DW_TAG_imported_module/using directive
support/PR14606.

llvm-svn: 178099
2013-03-27 00:07:26 +00:00
Hal Finkel a7b0630ba8 Don't spill PPC VRSAVE on non-Darwin (even in SjLj)
As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore
VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've
added some asserts to make sure that we're not).

As it turns out, we're not currently handling the Darwin case correctly (I've
added a FIXME in the test case). I've tried adding various implied register
definitions/uses to force the spill without success, so I'll need to address
this later.

llvm-svn: 178096
2013-03-27 00:02:20 +00:00
Michael Liao 03f9ad0e67 Add XTEST codegen support
llvm-svn: 178083
2013-03-26 22:47:01 +00:00
Jakob Stoklund Olesen 1ac7e662d4 Enable SandyBridgeModel for all modern Intel P6 descendants.
All Intel CPUs since Yonah look a lot alike, at least at the granularity
of the scheduling models. We can add more accurate models for
processors that aren't Sandy Bridge if required. Haswell will probably
need its own.

The Atom processor and anything based on NetBurst is completely
different. So are the non-Intel chips.

llvm-svn: 178080
2013-03-26 22:19:12 +00:00
Hal Finkel 0dfbb05aff Use multiple virtual registers in PPC CR spilling
Now that the register scavenger can support multiple spill slots, and PEI can
use virtual-register-based scavenging for multiple simultaneous registers, we
can use a virtual register for the transfer register in the CR spilling code.

This should eliminate the last place (outside of the prologue/epilogue) where
we depend on the unconditional availability of the r0 register. We will soon be
able to allocate it (in a somewhat restricted sense) as a GPR.

llvm-svn: 178060
2013-03-26 18:57:22 +00:00
Hal Finkel 4e05788cc3 Update PEI's virtual-register-based scavenging to support multiple simultaneous mappings
The previous algorithm could not deal properly with scavenging multiple virtual
registers because it kept only one live virtual -> physical mapping (and
iterated through operands in order). Now we don't maintain a current mapping,
but rather use replaceRegWith to completely remove the virtual register as
soon as the mapping is established.

In order to allow the register scavenger to return a physical register killed
by an instruction for definition by that same instruction, we now call
RS->forward(I) prior to eliminating virtual registers defined in I. This
requires a minor update to forward to ignore virtual registers.

These new features will be tested in forthcoming commits.

llvm-svn: 178058
2013-03-26 18:56:54 +00:00
Michael Liao 4a44e556cc Fix PRFCHW test on non-x86 builds
- 'prefetch' intrinsics are only lowered when SSE is available. On non-X86
  builds, 'generic' CPU is used and stops lowering any prefetch intrinsics.

llvm-svn: 178046
2013-03-26 18:15:45 +00:00
Michael Liao 5173ee03af Add PREFETCHW codegen support
- Add 'PRFCHW' feature defined in AVX2 ISA extension

llvm-svn: 178040
2013-03-26 17:47:11 +00:00
Jyotsna Verma 15957b129f Hexagon: Use multiclass for aslh, asrh, sxtb, sxth, zxtb and zxth.
llvm-svn: 178032
2013-03-26 15:43:57 +00:00
Christian Konig 727d06de1d R600/SI: mark most intrinsics as readnone v2
They read from constant register space anyway.

v2: fix lit tests

Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178020
2013-03-26 14:03:57 +00:00
Michael Liao 5fbcd81793 Revise alignment checking/calculation on 256-bit unaligned memory access
- It's still considered aligned when the specified alignment is larger
  than the natural alignment;
- The new alignment for the high 128-bit vector should be min(16,
  alignment) as the pointer is advanced by 16, a power-of-2 offset.

llvm-svn: 177947
2013-03-25 23:50:10 +00:00
Michael Liao bb05a1d7b5 Enhance folding of (extract_subvec (insert_subvec V1, V2, IIdx), EIdx)
- Handle the case where the result of 'insert_subvect' is bitcasted
  before 'extract_subvec'. This removes the redundant insertf128/extractf128
  pair on unaligned 256-bit vector load/store on vectors of non 64-bit integer.

llvm-svn: 177945
2013-03-25 23:47:35 +00:00
Jakob Stoklund Olesen 52576719e0 Add an -mcpu option to a test that is apparently scheduler-sensitive.
This should fix the clang-atom-d2700-ubuntu-rel buildbot.

llvm-svn: 177943
2013-03-25 23:43:23 +00:00
Shuxin Yang 93b1f12ac1 Disable some unsafe-fp-math DAG-combine transformation after legalization.
For instance, following transformation will be disabled:
    x + x + x => 3.0f * x;

The problem of these transformations is that it introduces a FP constant, which
following Instruction-Selection pass cannot handle.

Reviewed by Nadav, thanks a lot!

rdar://13445387

llvm-svn: 177933
2013-03-25 22:52:29 +00:00
NAKAMURA Takumi 8c0d63c120 llvm/test/CodeGen/X86/atomic{32|64}.ll: Unmark them out of XFAIL:win32.
I know it is incorrect and they'd fail with +Asserts for win32 targets, though.
I'll try to fix them tonight.

llvm-svn: 177914
2013-03-25 21:07:53 +00:00
Jyotsna Verma 18ca8101ee XFAIL some of the generic CodeGen tests for Hexagon.
test/CodeGen/Generic/2008-02-20-MatchingMem.ll: Test contains inline assembly not supported by Hexagon.

Following tests are XFAILed due to multiple return values which Hexagon doesn't support.

test/CodeGen/Generic/multiple-return-values-cross-block-with-invoke.ll
test/CodeGen/Generic/select-cc.ll
test/CodeGen/Generic/vector.ll

llvm-svn: 177912
2013-03-25 21:04:16 +00:00
Chad Rosier 1ad494d35b Remove unnecessary attributes from test case.
llvm-svn: 177882
2013-03-25 18:36:19 +00:00
Yiannis Tsiouris dbb4adf134 Add a GC plugin for Erlang
llvm-svn: 177867
2013-03-25 13:47:46 +00:00
Justin Holewinski e988409423 [NVPTX] Fix handling of vector arguments
llvm-svn: 177847
2013-03-24 21:17:47 +00:00
Owen Anderson c81616b0a9 Remove the type legality check from the SelectionDAGBuilder when it lowers @llvm.fmuladd to ISD::FMA nodes.
Performing this check unilaterally prevented us from generating FMAs when the incoming IR contained illegal vector types which would eventually be legalized to underlying types that *did* support FMA.
For example, an @llvm.fmuladd on an OpenCL float16 should become a sequence of float4 FMAs, not float4 fmul+fadd's.

NOTE: Because we still call the target-specific profitability hook, individual targets can reinstate the old behavior, if desired, by simply performing the legality check inside their callback hook.  They can also perform more sophisticated legality checks, if, for example, some illegal vector types can be productively implemented as FMAs, but not others.
llvm-svn: 177820
2013-03-23 08:26:53 +00:00
Jyotsna Verma fdc660bf2e Hexagon: Add and enable memops setbit, clrbit, &,|,+,- for byte, short, and word.
llvm-svn: 177747
2013-03-22 18:41:34 +00:00
David Blaikie 30ce0788e7 Refactor out the DIFile parameter to DILexicalBlock to refer to the raw file/directory pair
llvm-svn: 177742
2013-03-22 17:33:20 +00:00
Michel Danzer 3de8ae38e6 R600: Fix up test/CodeGen/R600/llvm.pow.ll for r177730
llvm-svn: 177736
2013-03-22 15:24:16 +00:00
David Blaikie f333dc9571 Reorder the DIFile field in DILexicalBlock to become a prefix common with other DIScopes
llvm-svn: 177703
2013-03-22 05:47:44 +00:00
Hal Finkel 891671afe5 Fix a register-class comparison bug in PPCCTRLoops
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).

llvm-svn: 177679
2013-03-21 23:23:34 +00:00
David Blaikie 0d7d62e4b2 Move the DIFile in DISubprogram to the beginning to be a common prefix along with other DIScopes
llvm-svn: 177674
2013-03-21 22:29:36 +00:00
Hal Finkel 756810fe36 Implement builtin_{setjmp/longjmp} on PPC
This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.

Benchmarking the speed at -O3 of:

static jmp_buf env_sigill;

void foo() {
                __builtin_longjmp(env_sigill,1);
}

main() {
	...

        for (int i = 0; i < c; ++i) {
                if (__builtin_setjmp(env_sigill)) {
                        goto done;
                } else {
                        foo();
                }

done:;
        }

	...
}

vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.

llvm-svn: 177666
2013-03-21 21:37:52 +00:00
Renato Golin a4d563526e Fix Darwin NEON FP and increase coverage
llvm-svn: 177664
2013-03-21 21:30:49 +00:00
David Blaikie cc8d090163 Remove unused field in DISubprogram
llvm-svn: 177661
2013-03-21 20:28:52 +00:00
Hal Finkel a1431df540 Add support for spilling VRSAVE on PPC
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.

llvm-svn: 177654
2013-03-21 19:03:21 +00:00
Hal Finkel aa03c03a2d Correct PPC FRAMEADDR lowering using a pseudo-register
The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as through the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).

It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.

llvm-svn: 177653
2013-03-21 19:03:19 +00:00
Renato Golin b4dd6c5945 Avoid NEON SP-FP unless unsafe-math or Darwin
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.

llvm-svn: 177651
2013-03-21 18:47:47 +00:00
David Blaikie efb0d65ed7 Debug info: refactor the first field of DICompileUnit to be a raw file/directory pair
This removes the DICompileUnit special case from DIScope.

llvm-svn: 177610
2013-03-20 23:58:12 +00:00
Nadav Rotem 4536d582fd When computing the demanded bits of Load SDNodes, make sure that we are looking at the loaded-value operand and not the ptr result (in case of pre-inc loads).
rdar://13348420

llvm-svn: 177596
2013-03-20 22:53:44 +00:00
David Blaikie 3b88852a2d Debug Info: Swap the 2nd and 3rd parameters to DICompileUnit to match the common DIScope prefix
llvm-svn: 177595
2013-03-20 22:52:54 +00:00
David Blaikie 43a729d165 Remove unused field in DICompileUnit
llvm-svn: 177590
2013-03-20 22:34:33 +00:00
Hao Liu a7521131da Add a test case for PR15318 fixed in r177472
llvm-svn: 177489
2013-03-20 06:18:06 +00:00
Michael Liao 0f4ea0c4a9 Fix PR15296
- Move SRA/SRL/SHL lowering support from DAG combination to DAG lowering
  to support extended 256-bit integer in AVX but not AVX2.

llvm-svn: 177478
2013-03-20 02:33:21 +00:00
David Blaikie 200b6ed80f Refactor the DIFile (2nd) parameter to DITypes to be an MDNode reference to a raw directory/file pair
This makes DIType's first non-tag parameter the same as DIFile's, allowing them
to both share the common implementation of getFilename/getDirectory in DIScope.

llvm-svn: 177467
2013-03-20 00:26:26 +00:00
Justin Holewinski d068943809 Propagate DAG node ordering during type legalization and instruction selection
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.

llvm-svn: 177465
2013-03-20 00:10:32 +00:00
David Blaikie abec74b38e Move the DIFile operand to DITypes from the 4th operand to the 2nd.
This is another step along the way to making all DIScopes have a common prefix
which can be added to in a general manner to support using directives
(DW_TAG_imported_module).

llvm-svn: 177462
2013-03-19 23:25:22 +00:00
Hal Finkel 559754a482 Add a comment to the CodeGen/PowerPC/asym-regclass-copy.ll test
llvm-svn: 177434
2013-03-19 20:22:32 +00:00
Ulrich Weigand d850167a19 Rewrite pre-increment store patterns to use standard memory operands.
Currently, pre-increment store patterns are written to use two separate
operands to represent address base and displacement:

  stwu $rS, $ptroff($ptrreg)

This causes problems when implementing the assembler parser, so this
commit changes the patterns to use standard (complex) memory operands
like in all other memory access instruction patterns:

  stwu $rS, $dst

To still match those instructions against the appropriate pre_store
SelectionDAG nodes, the patch uses the new feature that allows a Pat
to match multiple DAG operands against a single (complex) instruction
operand.

Approved by Hal Finkel.

llvm-svn: 177429
2013-03-19 19:52:04 +00:00
Hal Finkel 638a9fa43e Prepare to make r0 an allocatable register on PPC
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:

 1. r0 is treated specially (as the constant 0) by certain instructions, and so
    cannot be used with those instructions as a regular register.

 2. r0 is used as a temporary register in the CR-register spilling process
    (where, under some circumstances, we require two GPRs).

This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict will real uses of the r0 register.

Once the CR spilling code is improved, we'll be able to allocate r0.

Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.

As r0 is still reserved, no functionality change intended.

llvm-svn: 177423
2013-03-19 18:51:05 +00:00
Nadav Rotem 0f1bc60d51 Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com>

llvm-svn: 177421
2013-03-19 18:38:27 +00:00
Hal Finkel 6681486375 Cleanup PPC64 unaligned i64 load/store
Remove an accidentally-added instruction definition and add a comment in the
test case. This is in response to a post-commit review by Bill Schmidt.

No functionality change intended.

llvm-svn: 177404
2013-03-19 15:23:39 +00:00
Renato Golin 227eb6fc5f Improve long vector sext/zext lowering on ARM
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.

This partially addresses PR14867.

Patch by Pete Couperus

llvm-svn: 177380
2013-03-19 08:15:38 +00:00
Hal Finkel d9e10d51fa Don't reserve R31 on PPC64 unless the frame pointer is needed
llvm-svn: 177379
2013-03-19 08:09:38 +00:00
Hal Finkel fc9aad6436 Fix a sign-extension bug in PPCCTRLoops
Don't sign extend the immediate value from the OR instruction in
an LIS/OR pair.

llvm-svn: 177361
2013-03-18 23:58:28 +00:00
Hal Finkel b09680b0f7 Fix PPC unaligned 64-bit loads and stores
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.

In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).

llvm-svn: 177338
2013-03-18 23:00:58 +00:00
Quentin Colombet 8fc340976d Extend global merge pass to optionally consider global constant variables.
Also add some checks to not merge globals used within landing pad instructions or marked as "used".

llvm-svn: 177331
2013-03-18 22:30:07 +00:00
Bill Schmidt 45bdbe7ec0 Change test cases to handle unaligned references.
Hal Finkel recently added code to allow unaligned memory references
for PowerPC.  Two tests were temporarily modified with
-disable-ppc-unaligned to keep them from failing.  This patch adjusts
the expected code generation for the unaligned references.

llvm-svn: 177328
2013-03-18 22:12:04 +00:00
David Blaikie 3053029310 Remove unnecessary leading comment characters in lit-only file
llvm-svn: 177327
2013-03-18 22:08:16 +00:00
David Blaikie 99067af791 Include '.test' suffix in target specific lit configs that need it
Apparently my final cleanup to use a relevant suffix for these tests before
committing r176831 caused them to stop running since lit wasn't configured to
run tests with that suffix in those directories (why don't we just have a
global suffix list?). So, add the suffix to the relevant directories & fix the
test that has bitrotted over the last week due to my debug info schema changes.

llvm-svn: 177315
2013-03-18 20:31:44 +00:00
Hal Finkel 21f2a43ab4 Fix large count and negative constant count handling in PPCCTRLoops
This commit fixes an assert that would occur on loops with large constant counts
(like looping for ((uint32_t) -1) iterations on PPC64). The existing code did
not handle counts that it computed to be negative (asserting instead), but
these can be created with valid inputs.

This bug was discovered by bugpoint while I was attempting to isolate a
completely different problem.

Also, in writing test cases for the negative-count problem, I discovered that
the ori/lsi handling was broken (there was a typo which caused the logic that
was supposed to detect these pairs and extract the iteration count to always
fail). This has now also been corrected (and is covered by one of the new test
cases).

llvm-svn: 177295
2013-03-18 17:40:44 +00:00
Hal Finkel 12337e4e7d Cleanup initial-value constants in PPCCTRLoops
Because the initial-value constants had not been added to the list
of instructions considered for DCE the resulting code had redundant
constant-materialization instructions.

llvm-svn: 177294
2013-03-18 17:40:27 +00:00
David Blaikie 8fb8224578 Split out filename & directory from DIFile to start generalizing over DIScopes
This is the first step to making all DIScopes have a common metadata prefix (so
that things (using directives, for example) that can appear in any scope can be
added to that common prefix). DIFile is itself a DIScope so the common prefix
of all DIScopes cannot be a DIFile - instead it's the raw filename/directory
name pair.

llvm-svn: 177239
2013-03-17 21:13:55 +00:00
Hal Finkel fcc51d4ff1 Improve PPC VR (Altivec) register spilling
This change cleans up two issues with Altivec register spilling:

  1. The spilling code was inefficient (using two instructions, and add and a
     load, when just one would do)

  2. The code assumed that r0 would always be available (true for now, but this
     will change)

The new code handles VR spilling just like GPR spills but forced into r+r mode.
As a result, when any VR spills are present, we must now always allocate the
register-scavenger spill slot.

llvm-svn: 177231
2013-03-17 04:43:44 +00:00
Hal Finkel 57080382e6 Remove FIXMEs in PPC test cases related to unaligned loads/stores
As pointed out by Bill in response to r177160, these two FIXMEs
can also be removed.

llvm-svn: 177229
2013-03-16 23:02:31 +00:00
Craig Topper 612f7bfa4d Add X86 code emitter support AVX encoded MRMDestReg instructions.
Previously we weren't skipping the VVVV encoded register. Based on patch by Michael Liao.

llvm-svn: 177221
2013-03-16 03:44:31 +00:00
Arnold Schwaighofer 9d7a3827e4 ARM cost model: Fix costs for some vector selects
I was too pessimistic in r177105. Vector selects that fit into a legal register
type lower just fine. I was mislead by the code fragment that I was using. The
stores/loads that I saw in those cases came from lowering the conditional off
an address.

Changing the code fragment to:

%T0_3 = type <8 x i18>
%T1_3 = type <8 x i1>

define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
                         %T1_3* %blend, %T0_3* %storeaddr) {
  %v0 = load %T0_3* %loadaddr
  %v1 = load %T0_3* %loadaddr2
==> FROM:
  ;%c = load %T1_3* %blend
==> TO:
  %c = icmp slt %T0_3 %v0, %v1
==> USE:
  %r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1

  store %T0_3 %r, %T0_3* %storeaddr
  ret void
}

revealed this mistake.

radar://13403975

llvm-svn: 177170
2013-03-15 18:31:01 +00:00
Silviu Baranga 82dd6ac3bc Adding an A15 specific optimization pass for interactions between S/D/Q registers. The pass handles all the required transformations pre-regalloc.
llvm-svn: 177169
2013-03-15 18:28:25 +00:00
Benjamin Kramer 2f5457141a ARM: Fix an old refacto.
Fixes PR15520.

llvm-svn: 177167
2013-03-15 17:27:39 +00:00
Hal Finkel 8d7fbc9dad Enable unaligned memory access on PPC for scalar types
Unaligned access is supported on PPC for non-vector types, and is generally
more efficient than manually expanding the loads and stores.

A few of the existing test cases were using expanded unaligned loads and stores
to test other features (like load/store with update), and for these test cases,
unaligned access remains disabled.

llvm-svn: 177160
2013-03-15 15:27:13 +00:00
Hal Finkel b0fac42987 Protect PPC Altivec patterns with a predicate
In preparation for the addition of other SIMD ISA extensions (such as QPX) we
need to make sure that all Altivec patterns are properly predicated on having
Altivec support.

No functionality change intended (one test case needed to be updated b/c it
assumed that Altivec intrinsics would be supported without enabling Altivec
support).

llvm-svn: 177152
2013-03-15 13:21:21 +00:00
Hal Finkel bb420f10e9 Allocate the RS spill slot for any PPC function with spills and a large stack frame
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.

I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.

llvm-svn: 177140
2013-03-15 05:06:04 +00:00
Nadav Rotem 4a4827ce21 Add a triple to the test.
llvm-svn: 177131
2013-03-15 00:10:23 +00:00
Nadav Rotem adfa5eaf8c Unaligned loads should use the VMOVUPS opcode.
llvm-svn: 177130
2013-03-14 23:49:44 +00:00
Chad Rosier 4b54f594b4 [fast-isel] The X86FastISel::FastLowerArguments function doesn't properly handle
the win64 calling convention.
rdar://13423768

llvm-svn: 177113
2013-03-14 21:25:04 +00:00
Hal Finkel e987a311ba Not all PPC functions with a frame pointer need a RS spill slot
We used to add a spill slot for the register scavenger whenever the function
has a frame pointer. This is unnecessarily conservative: We may need the spill
slot for dynamic stack allocations, and functions with dynamic stack
allocations always have a FP, but we might also have a FP for other reasons
(such as the user explicitly disabling frame-pointer elimination), and we don't
necessarily need a spill slot for those functions.

The structsinregs test needed adjustment because it disables FP elimination.

llvm-svn: 177106
2013-03-14 19:34:32 +00:00
Arnold Schwaighofer 8070b382ec ARM cost model: Increase cost of some vector selects we do terrible on
By terrible I mean we store/load from the stack.

This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.

LV: Found an estimated cost of 2 for VF 8 For instruction:   icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction:   select i1, i32, i32

The bug that tracks the CodeGen part is PR14868.

radar://13403975

llvm-svn: 177105
2013-03-14 19:17:02 +00:00
Jyotsna Verma ec613665c2 Hexagon: Removed asserts regarding alignment and offset.
We are warning the user about the alignment, so we should not assert.

llvm-svn: 177103
2013-03-14 19:08:03 +00:00
Vincent Lejeune 0a22bc4156 R600: Factorize code handling Const Read Port limitation
llvm-svn: 177078
2013-03-14 15:50:45 +00:00
Michael Liao 20d287044c Fix PR15309
- Fix the typo on type checking

llvm-svn: 177010
2013-03-14 06:57:42 +00:00
Jiong Wang 5bbb96d7df test commit: remove blank line.
llvm-svn: 177009
2013-03-14 05:43:59 +00:00
David Blaikie aabfe4f997 Simplify file/directory name handling in DILexicalBlock
llvm-svn: 176993
2013-03-13 22:52:59 +00:00
David Blaikie 0d221159a0 Remove the unused 4th operand for DIFile debug info metadata
llvm-svn: 176983
2013-03-13 22:05:21 +00:00
Arnold Schwaighofer 9f2e0fa52e ARM cost model: Add test case to make sure we would notice a change in CodeGen
In r176898 I updated the cost model to reflect the fact that sext/zext/cast on
v8i32 <-> v8i8 and v16i32 <-> v16i8 are expensive.

This test case is so that we make sure to update the cost model once we fix
CodeGen.

llvm-svn: 176955
2013-03-13 16:25:55 +00:00
David Blaikie 1ca2f36289 Refactor filename/directory in DICompileUnit into a DIFile
This is the next step towards making the metadata for DIScopes have a common
prefix rather than having to delegate based on their tag type.

llvm-svn: 176913
2013-03-13 00:01:35 +00:00
David Blaikie 452c3ff649 Remove unused "isMain" field from DICompileUnit
llvm-svn: 176910
2013-03-12 22:43:04 +00:00
David Blaikie a4f770d51c Update debug info test cases with empty SplitDebugFilename field.
This could be 'null' or the empty string, DIDescriptor::getStringField
coalesces the two cases anyway so it's just a matter of legible/efficient
representation.

The change in behavior of the DICompileUnit::get* functions could be
subsumed by the full verification check - but ideally that should just be an
assertion if we could front-load the actual debug info metadata failure paths.

llvm-svn: 176907
2013-03-12 22:25:36 +00:00
Jan Wen Voung 6dc3076080 Revert the test moves from 176733. Use "REQUIRES: asserts" instead.
llvm-svn: 176873
2013-03-12 16:27:52 +00:00
Hal Finkel 01271c6022 Don't reserve R2 on Darwin/PPC
Now that only the register-scavenger version of the CR spilling code remains,
we no longer need the Darwin R2 hack. Darwin can use R0 as a spare register in
any case where the System V ABI uses it (R0 is special architecturally, and so
is reserved under all common ABIs).

A few test cases needed to be updated to reflect the register-allocation changes.

llvm-svn: 176868
2013-03-12 15:18:14 +00:00
NAKAMURA Takumi e781913ac4 llvm/test/CodeGen/R600/schedule-*.ll: Let them require +Asserts.
llvm-svn: 176835
2013-03-11 23:16:30 +00:00
David Blaikie 47922fb006 Upgrading debug info test cases to be (more) compatible with the current debug info format.
These cases were found by further work to remove support for debug info
versioning. Common cleanups (other than changing the version info in the tag
field) included adding the last parameter to compile_units (recently added for
fission support) and other cases of trailing fields in lexical blocks, compile
units, and subprograms.

llvm-svn: 176834
2013-03-11 22:37:40 +00:00
David Blaikie 789beb5300 Remove duplicate test contents.
llvm-svn: 176831
2013-03-11 22:10:14 +00:00
Nick Lewycky 48beb21185 Fix a crasher newly introduced in r176659/r176649, where fast-isel tries to
lower an expect intrinsic that is a constant expression.

llvm-svn: 176830
2013-03-11 21:44:37 +00:00
Vincent Lejeune e5ecf10a02 R600: Fix JUMP handling so that MachineInstr verification can occur
This allows R600 Target to use the newly created -verify-misched llc flag

llvm-svn: 176819
2013-03-11 18:15:06 +00:00
NAKAMURA Takumi a60c7a0f4b llvm/test/CodeGen/X86/handle-move.ll: Mark it as XFAIL:cygming. Investigating.
llvm-svn: 176808
2013-03-11 16:30:26 +00:00
NAKAMURA Takumi 1e02e73c30 Suppress atomic(32|64).ll as XFAIL on win32 codegen. Investigating.
llvm-svn: 176798
2013-03-11 08:39:48 +00:00
Lang Hames 82d48e7fb0 Remove date from test case file name. The PR number provides a unique ID already.
llvm-svn: 176796
2013-03-11 03:49:23 +00:00
Lang Hames be3d971143 Don't glue users to extract_subreg when selecting the llvm.arm.ldrexd
intrinsic - it can cause impossible-to-schedule subgraphs to be
introduced.

PR15053.

llvm-svn: 176777
2013-03-09 22:56:09 +00:00
Benjamin Kramer 01b75cc0f2 Test case hygiene.
llvm-svn: 176772
2013-03-09 18:25:40 +00:00
Jan Wen Voung 7857a64909 Disable statistics on Release builds and move tests that depend on -stats.
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.

Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.

Many tests depend on grepping "-stats" output.  Move those into
a orig_dir/Stats/. so that they can be marked as unsupported
when building without statistics.

Differential Revision: http://llvm-reviews.chandlerc.com/D486

llvm-svn: 176733
2013-03-08 22:56:31 +00:00
Jakob Stoklund Olesen 8d1aaf21cf Rewrite the physreg part of findLastUseBefore().
To find the last use of a register unit, start from the bottom and scan
upwards until a user is found.

<rdar://problem/13353090>

llvm-svn: 176706
2013-03-08 18:08:57 +00:00
Tom Stellard 5e524897ed R600: Optimize another selectcc case
fold selectcc (selectcc x, y, a, b, cc), b, a, b, setne ->
     selectcc x, y, a, b, cc

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176700
2013-03-08 15:37:11 +00:00
Tom Stellard 2add82de09 R600: Improve custom lowering of select_cc
Two changes:
1. Prefer SET* instructions when possible
2. Handle the CND*_INT case with floating-point args

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176699
2013-03-08 15:37:09 +00:00
Tom Stellard 492ebeabe9 R600: Change operation action from Custom to Expand for BR_CC
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176698
2013-03-08 15:37:07 +00:00
Tom Stellard e8f9f2877b R600: Change operation action from Custom to Expand for SETCC
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176697
2013-03-08 15:37:05 +00:00
Tom Stellard d93ef7afaa LegalizeDAG: Respect the result of TLI.getBooleanContents() when expanding SETCC
llvm-svn: 176695
2013-03-08 15:37:02 +00:00
Vincent Lejeune 2bc2730765 R600: Change addresspace in fold-kcache.ll
AddressSpace definition has changed in a previous commit, reflect it
to avoid false failure.

llvm-svn: 176693
2013-03-08 15:34:07 +00:00
Tim Northover c8f1a5de9f AArch64: specify full triple in test as only Linux works for now.
llvm-svn: 176692
2013-03-08 15:27:30 +00:00
Christian Konig 21442994a7 R600/SI: adjust test to recent changes
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176691
2013-03-08 14:44:00 +00:00
Jyotsna Verma 7825e064b9 Hexagon: Add patterns for zero extended loads from i1->i64.
llvm-svn: 176689
2013-03-08 14:15:15 +00:00
Tim Northover 95f4892d4c AArch64: expand sincos operations, we don't support them.
Patch based on Mans Rullgard's.

llvm-svn: 176688
2013-03-08 13:55:07 +00:00
David Blaikie e7838a4c53 Another test fix for r176671.
llvm-svn: 176679
2013-03-08 02:27:40 +00:00
David Blaikie d17cfedf8b Couple of test fixes for r176671.
Not sure why these aren't failing on my linux machine, but this should cover
it.

llvm-svn: 176678
2013-03-08 02:26:16 +00:00
Bill Wendling 2d915e2c15 Revert r176154 in favor of a better approach.
Code generation makes some basic assumptions about the IR it's been given. In
particular, if there is only one 'invoke' in the function, then that invoke
won't be going away. However, with the advent of the `llvm.donothing' intrinsic,
those invokes may go away. If all of them go away, the landing pad no longer has
any users. This confuses the back-end, which asserts.

This happens with SjLj exceptions, because that's the model that modifies the IR
based on there being invokes, etc. in the function.

Remove any invokes of `llvm.donothing' during SjLj EH preparation. This will
give us a CFG that the back-end won't be confused about. If all of the invokes
in a function are removed, then the SjLj EH prepare pass won't insert the bogus
code the relies upon the invokes being there.
<rdar://problem/13228754&13316637>

llvm-svn: 176677
2013-03-08 02:21:08 +00:00
David Blaikie e5a2f704a4 Upgrade tests to the latest debug info format.
Mostly this is just changing the named metadata (llvm.dbg.sp, llvm.dbg.gv,
llvm.dbg.<func>.lv, etc -> llvm.dbg.cu), adding a few fields to older records
(DIVariable: flags/inlined-at, DICompileUnit: sp/gv/types,
DISubprogram: local variables list)

The tests to update were discovered by a change I'm working on to remove debug
info version support - so any tests using old debug info versions I haven't
updated probably are bad tests or just not actually designed to test debug
info.

llvm-svn: 176671
2013-03-08 00:23:31 +00:00
Chad Rosier 9c1796f877 [fast-isel] Add support for the expect intrinsic.
rdar://13370942

llvm-svn: 176649
2013-03-07 20:42:17 +00:00
Jyotsna Verma c7dcc2fbc5 Hexagon: Handle i8, i16 and i1 Var Args.
llvm-svn: 176647
2013-03-07 20:28:34 +00:00
Jyotsna Verma 2ba0c0b927 Hexagon: Add support to lower block address.
llvm-svn: 176637
2013-03-07 19:10:28 +00:00
Benjamin Kramer 5e5fd6bb4f Move testcase, this is testing extraction not inserting.
llvm-svn: 176635
2013-03-07 18:51:02 +00:00
Benjamin Kramer 2c3d0df8ee X86: Fold EXTRACT_SUBVECTORs of a BUILD_VECTOR into a smaller BUILD_VECTOR.
That can usually be lowered efficiently and is common in sandybridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.

Fixes PR15462.

llvm-svn: 176634
2013-03-07 18:48:40 +00:00
Jim Grosbach 48a91abc10 SDAG: Handle scalarizing an extend of a <1 x iN> vector.
Just scalarize the element and rebuild a vector of the result type
from that.

rdar://13281568

llvm-svn: 176614
2013-03-07 05:47:54 +00:00
Michael Liao d5cac37dc5 Fix two remaining issue after fixing PR15355 when CMOV is not available
- Phi nodes should be replaced/updated after lowering CMOV into branch
  because 'mainMBB' updating operand in Phi node is changed.
- Add EFLAGS in livein before lowering the 2nd CMOV. It's necessary as
  we will reuse the EFLAGS generated before the 1st lowered CMOV, which
  won't clobber EFLAGS. However, we need explicitly specify that.
- '-attr=-cmov' test case are added.

llvm-svn: 176598
2013-03-07 01:01:29 +00:00
Akira Hatanaka 0f693a8a77 [mips] Custom-legalize BR_JT.
In N64-static, GOT address is needed to compute the branch address.

llvm-svn: 176580
2013-03-06 21:32:03 +00:00
Akira Hatanaka a9cf03fbd7 [mips] Add a line which checks function name. Rename file.
llvm-svn: 176543
2013-03-06 01:58:03 +00:00
Michael Liao da22b30be5 Fix PR15355
- Clear 'mayStore' flag when loading from the atomic variable before the
  spin loop
- Clear kill flag from one use to multiple use in registers forming the
  address to that atomic variable
- don't use a physical register as live-in register in BB (neither entry
  nor landing pad.) by copying it into virtual register

(patch by Cameron Zwarich)

llvm-svn: 176538
2013-03-06 00:17:04 +00:00
Akira Hatanaka 1454ed8ad3 [mips] Remove android calling convention.
This calling convention was added just to handle functions which return vector
of floats. The fix committed in r165585 solves the problem.

llvm-svn: 176530
2013-03-05 23:22:30 +00:00
Akira Hatanaka e092f72956 [mips] Fix MipsCC::analyzeReturn so that, in soft-float mode, fp128 gets
returned in registers $2 and $4.

llvm-svn: 176527
2013-03-05 22:54:59 +00:00
Akira Hatanaka 5f3ba9e595 [mips] Fix MipsTargetLowering::LowerCallResult and LowerReturn to correctly
handle fp128 returns.

llvm-svn: 176523
2013-03-05 22:41:55 +00:00
Akira Hatanaka 3b7391d140 [mips] Fix MipsTargetLowering::LowerCall to pass fp128 arguments in floating
point registers.

llvm-svn: 176521
2013-03-05 22:20:28 +00:00
Akira Hatanaka 4b634fa3b3 [mips] Correct handling of fp128 (long double) formals and read long double
parameters from floating point registers if target is mips64 hard float.

llvm-svn: 176520
2013-03-05 22:13:04 +00:00
Jyotsna Verma 457801f7ab reverting patch 176508.
llvm-svn: 176513
2013-03-05 20:29:23 +00:00
Jyotsna Verma 7179e712dd Hexagon: Add support for lowering block address.
llvm-svn: 176508
2013-03-05 19:37:46 +00:00
Jyotsna Verma 0eeea14e3e Hexagon: Expand addc, adde, subc and sube.
llvm-svn: 176505
2013-03-05 19:04:47 +00:00
Jyotsna Verma f4e324f4fb Hexagon: Add encoding bits to the TFR64 instructions.
Set imMoveImm, isAsCheapAsAMove flags for TFRI instructions.

llvm-svn: 176499
2013-03-05 18:42:28 +00:00
Vincent Lejeune 3b6f20e944 R600: Turn BUILD_VECTOR into Reg_Sequence
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 176487
2013-03-05 15:04:49 +00:00
Vincent Lejeune a199d01e4d R600: Use MUL_IEEE for trig/fdiv intrinsic
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 176485
2013-03-05 15:04:37 +00:00
NAKAMURA Takumi 3ae057c6ab llvm/test/CodeGen/Mips/mips64-f128.ll: Add explicit -mtriple=mips64el-unknown-unknown to appease win32.
FIXME: Is it expected for win32 to affect mips targets?
llvm-svn: 176471
2013-03-05 02:18:59 +00:00
NAKAMURA Takumi cc91d50289 llvm/test/CodeGen/Thumb/iabs.ll: Add explicit -mtriple=thumb-unknown-unknown to appease win32 hosts.
llvm-svn: 176470
2013-03-05 02:18:52 +00:00
Akira Hatanaka c7828356aa [mips] Print move instructions.
"move $4, $5" is printed instead of "or $4, $5, $zero".

llvm-svn: 176455
2013-03-04 22:25:01 +00:00
Jack Carter 0e149b04f6 Mips specific inline assembler constraint 'R'
'R' An address that can be sued in a non-macro load or store.
This patch includes a positive test case.

llvm-svn: 176452
2013-03-04 21:33:15 +00:00
Eli Bendersky 4e1db8d7f7 Reapply r176381, writing the CHECKs in a more forgiving manner to account for
running llvm-objdump on Darwin.

llvm-svn: 176443
2013-03-04 18:20:31 +00:00
Preston Gurd 485296d1e8 Bypass Slow Divides
* Only apply divide bypass optimization when not optimizing for size. 
* Fixed bug caused by constant for 0 value of type Int32,
  used dividend type to generate the constant instead.
* For atom x86-64 apply the divide bypass to use 16-bit divides instead of
  64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442
2013-03-04 18:13:57 +00:00
Jim Grosbach a3c5c769d6 ARM: Creating a vector from a lane of another.
The VDUP instruction source register doesn't allow a non-constant lane
index, so make sure we don't construct a ARM::VDUPLANE node asking it to
do so.

rdar://13328063
http://llvm.org/bugs/show_bug.cgi?id=13963

llvm-svn: 176413
2013-03-02 20:16:24 +00:00
Arnold Schwaighofer 99cba9697a ARM NEON: Fix v2f32 float intrinsics
Mark them as expand, they are not legal as our backend does not match them.

llvm-svn: 176410
2013-03-02 19:38:33 +00:00
Michael Gottesman ee45c03fec Revert "Rewrite a test to count emitted instructions without using -stats"
This reverts commit aac7922b8fe7ae733d3fe6697e6789fd730315dc. I am reverting the
commit since it broke the phase 1 public buildbot for a few hours.

http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-nobootstrap-RA/builds/2137

llvm-svn: 176394
2013-03-02 00:53:20 +00:00
Akira Hatanaka ece459bb66 [mips] Fix inefficient code generation.
This patch eliminates the need to emit a constant move instruction when this
pattern is matched:

(select (setgt a, Constant), T, F)

The pattern above effectively turns into this:

(conditional-move (setlt a, Constant + 1), F, T)

llvm-svn: 176384
2013-03-01 21:52:08 +00:00
Eli Bendersky 0091e2ff00 Rewrite a test to count emitted instructions without using -stats
Also removed the comments of "should produce..." because they completely
don't match the actually produced output.

llvm-svn: 176381
2013-03-01 21:34:37 +00:00
Akira Hatanaka 3d055580a9 Set properties for f128 type.
llvm-svn: 176378
2013-03-01 21:11:44 +00:00
Michael Liao d10584e38b Add regression tests (WORKSFORME)
- These tests wont't crash on trunk but would be better to add them so that
  they don't break again in the future.

llvm-svn: 176369
2013-03-01 19:23:37 +00:00
Chad Rosier b3864609cf Generate an error message instead of asserting or segfaulting when we can't
handle indirect register inputs.
rdar://13322011

llvm-svn: 176367
2013-03-01 19:12:05 +00:00
Michael Liao 6af16fc3b7 Fix PR10475
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands
  but TLI.getShiftAmountTy() so far only return scalar type. As a
  result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return target-specificed scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar
  type.

llvm-svn: 176364
2013-03-01 18:40:30 +00:00
Chad Rosier 9660343b42 Add support for using non-pic code for arm and thumb1 when emitting the sjlj
dispatch code.  As far as I can tell the thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both arm and thumb1.
rdar://13066352

llvm-svn: 176363
2013-03-01 18:30:38 +00:00
Christian Konig 3c54770365 R600/SI: fix sampler tests after fixing wait insertions
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176359
2013-03-01 17:39:05 +00:00
Jyotsna Verma 8425643728 Hexagon: Add constant extender support framework.
llvm-svn: 176358
2013-03-01 17:37:13 +00:00
Akira Hatanaka e9e588dd72 [mips] Remove unused option. Fix 80-column violations.
llvm-svn: 176330
2013-03-01 02:17:02 +00:00
Akira Hatanaka 8f7bfb39be [mips] Add the capability to search delay slot filling instructions in
successor basic blocks.

Currently this is off by default.

llvm-svn: 176329
2013-03-01 02:03:51 +00:00
Akira Hatanaka e01ff9dc60 [mips] Add capability to search in the forward direction for instructions that
can fill the delay slot.

Currently, this is off by default.

llvm-svn: 176320
2013-03-01 00:50:52 +00:00
Akira Hatanaka eb33ced08f [mips] Define class MemDefsUses.
This class tracks dependence between memory instructions using underlying
objects of memory operands. 

llvm-svn: 176313
2013-03-01 00:16:31 +00:00
Tim Northover c3c5c0971d AArch64: be more careful resorting to inefficient addressing for weak vars.
If an otherwise weak var is actually defined in this unit, it can't be
undefined at runtime so we can use normal global variable sequences (ADRP/ADD)
to access it.

llvm-svn: 176259
2013-02-28 14:36:31 +00:00
Tim Northover b9d4fd210b AArch64: don't drop GlobalAddress offset when handling extern_weak decls.
llvm-svn: 176258
2013-02-28 14:36:24 +00:00
Tim Northover 9fafdf6d5a AArch64: Use cbnz instead of cmp/b.ne pair for atomic operations.
llvm-svn: 176253
2013-02-28 13:52:07 +00:00
Jim Grosbach 5f21587648 ARM: FMA is legal only if VFP4 is available.
rdar://13306723

llvm-svn: 176212
2013-02-27 21:31:12 +00:00
Manman Ren 683f59b36c SelectionDAG: If llvm.donothing has a landingpad, we should clear
CurrentCallSite to avoid an assertion failure:
assert(MMI.getCurrentCallSite() == 0 && "Overlapping call sites!");

rdar://problem/13228754

llvm-svn: 176154
2013-02-27 02:11:57 +00:00
Bill Schmidt 8ea7af8e44 Fix PR15332 (patch by Florian Zeitz).
There's no need to generate a stack frame for PPC32 SVR4 when there are
no local variables assigned to the stack, i.e., when no red zone is needed.
(PPC64 supports a red zone, but PPC32 does not.)

llvm-svn: 176124
2013-02-26 21:28:57 +00:00
Chad Rosier 31a9bbcd4a Add a test case for r176066.
llvm-svn: 176119
2013-02-26 20:22:30 +00:00
Chad Rosier d2686ffa56 Remove a few unused arguments.
llvm-svn: 176109
2013-02-26 18:39:31 +00:00
Bill Schmidt 441907dc09 Fix PR15359.
The PowerPC TLS relocation types were not previously added to the
necessary list in MCELFStreamer::fixSymbolsInTLSFixups().  Now they are!

llvm-svn: 176094
2013-02-26 16:41:03 +00:00
Kostya Serebryany cf880b9443 Unify clang/llvm attributes for asan/tsan/msan (LLVM part)
These are two related changes (one in llvm, one in clang).
LLVM: 
- rename address_safety => sanitize_address (the enum value is the same, so we preserve binary compatibility with old bitcode)
- rename thread_safety => sanitize_thread
- rename no_uninitialized_checks -> sanitize_memory

CLANG: 
- add __attribute__((no_sanitize_address)) as a synonym for __attribute__((no_address_safety_analysis))
- add __attribute__((no_sanitize_thread))
- add __attribute__((no_sanitize_memory))

for S in address thread memory
If -fsanitize=S is present and __attribute__((no_sanitize_S)) is not
set llvm attribute sanitize_S

llvm-svn: 176075
2013-02-26 06:58:09 +00:00
Michael Liao ab97668061 Fix PR10499
- Check whether SSE is available before lowering all 1s vector building with
  PCMPEQD, which is only available from SSE2

llvm-svn: 176058
2013-02-25 23:01:03 +00:00
Chad Rosier 0adc042392 Remove extraneous attribute number.
llvm-svn: 176053
2013-02-25 22:06:05 +00:00
Chad Rosier a92ef4ba5b [fast-isel] Add X86FastIsel::FastLowerArguments to handle functions with 6 or
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.

Also, add the new llc -fast-isel-abort-args option, which is similar to
-fast-isel-abort option, but for formal argument lowering.

llvm-svn: 176052
2013-02-25 21:59:35 +00:00
Andrew Trick 7cf4361912 pre-RA-sched fix: only reevaluate physreg interferences when necessary.
Fixes rdar:13279013: scheduler was blowing up on select instructions.

llvm-svn: 176037
2013-02-25 19:11:48 +00:00
Bill Schmidt b454829981 Fix missing relocation for TLS addressing peephole optimization.
Report and fix due to Kai Nacke.  Testcase update by me.

llvm-svn: 176029
2013-02-25 16:44:35 +00:00
Chandler Carruth 05920b1847 Fix the root cause of PR15348 by correctly handling alignment 0 on
memory intrinsics in the SDAG builder.

When alignment is zero, the lang ref says that *no* alignment
assumptions can be made. This is the exact opposite of the internal API
contracts of the DAG where alignment 0 indicates that the alignment can
be made to be anything desired.

There is another, more explicit alignment that is better suited for the
role of "no alignment at all": an alignment of 1. Map the intrinsic
alignment to this early so that we don't end up generating aligned DAGs.

It is really terrifying that we've never seen this before, but we
suddenly started generating a large number of alignment 0 memcpys due to
the new code to do memcpy-based copying of POD class members. That patch
contains a bug that rounds bitfield alignments down when they are the
first field. This can in turn produce zero alignments.

This fixes weird crashes I've seen in library users of LLVM on 32-bit
hosts, etc.

llvm-svn: 176022
2013-02-25 14:20:21 +00:00
Nadav Rotem b532fca92c Revert r169638 because it broke Mesa llvmpipe tests.
Fix PR15239.

llvm-svn: 175985
2013-02-24 07:09:35 +00:00
Benjamin Kramer ee23dcb461 X86: Disable cmov-memory patterns on subtargets without cmov.
Fixes PR15115.

llvm-svn: 175962
2013-02-23 10:40:58 +00:00
Reed Kotler dacee2bb44 Expand pseudos/macros for Selt. This is the last of the complex
macros.The rest is some small misc. stuff.

llvm-svn: 175950
2013-02-23 03:09:56 +00:00
Akira Hatanaka 02b0e48f6a [mips] Emit call16 operator instead of got_disp. The former allows lazy binding.
llvm-svn: 175920
2013-02-22 21:10:03 +00:00
Peter Collingbourne e049fd2c31 Fix test by matching movaps instead of AVX-only vmovaps
llvm-svn: 175914
2013-02-22 19:53:30 +00:00
Peter Collingbourne 7b57621fb3 x86_64: designate most general purpose and SSE registers as callee save under coldcc
llvm-svn: 175911
2013-02-22 19:19:44 +00:00
Pete Cooper 23e8b6b8c9 Remove unused CHECK lines copied from another test
llvm-svn: 175905
2013-02-22 18:16:21 +00:00
Kristof Beyls 0ba797e8f7 Make ARMAsmPrinter generate the correct alignment specifier syntax in instructions.
The Printer will now print instructions with the correct alignment specifier syntax, like
    vld1.8  {d16}, [r0:64]

llvm-svn: 175884
2013-02-22 10:01:33 +00:00
Reed Kotler 4416cdadd5 Expand mips16 SelT form pseudso/macros.
llvm-svn: 175862
2013-02-22 05:10:51 +00:00
Pete Cooper 047f81a5df Fix isa<> check which could never be true.
It was incorrectly checking a Function* being an IntrinsicInst* which
isn't possible.  It should always have been checking the CallInst* instead.

Added test case for x86 which ensures we only get one constant load.
It was 2 before this change.

rdar://problem/13267920

llvm-svn: 175853
2013-02-22 01:50:38 +00:00
Anshuman Dasgupta d062c70444 Hexagon: Expand cttz, ctlz, and ctpop for now.
llvm-svn: 175783
2013-02-21 19:39:40 +00:00
Jakob Stoklund Olesen 2ff4dc0ff2 Make RAFast::UsedInInstr indexed by register units.
This fixes some problems with too conservative checking where we were
marking all aliases of a register as used, and then also checking all
aliases when allocating a register.

<rdar://problem/13249625>

llvm-svn: 175782
2013-02-21 19:35:21 +00:00
Bill Schmidt 27917785ae Large code model support for PowerPC.
Large code model is identical to medium code model except that the
addis/addi sequence for "local" accesses is never used.  All accesses
use the addis/ld sequence.

The coding changes are straightforward; most of the patch is taken up
with creating variants of the medium model tests for large model.

llvm-svn: 175767
2013-02-21 17:12:27 +00:00
Benjamin Kramer 3238dc0c61 DAGCombiner: Make the post-legalize vector op optimization more aggressive.
A legal BUILD_VECTOR goes in and gets constant folded into another legal
BUILD_VECTOR so we don't lose any legality here. The problematic PPC
optimization that made this check necessary was fixed recently.

llvm-svn: 175759
2013-02-21 15:24:35 +00:00
Tom Stellard 0d171c8877 R600: Fix for Unigine when MachineSched is enabled
Fixes for-loop.cl piglit test

Patch By: Vincent Lejeune

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175742
2013-02-21 15:06:59 +00:00
Michel Danzer 7f02a8c7a7 R600/SI: Make sure M0 is loaded for V_INTERP_MOV_F32
NOTE: This is a candidate for the Mesa stable branch.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175733
2013-02-21 08:57:10 +00:00
Reed Kotler 97ba5f2772 Expand the sel pseudo/macro. This generates basic blocks where previously
there were inline br .+4 instructions. Soon everything can enjoy the
full instruction scheduling experience.

llvm-svn: 175718
2013-02-21 04:22:38 +00:00
Bill Schmidt f5b474c6c6 PPCDAGToDAGISel::PostprocessISelDAG()
This patch implements the PPCDAGToDAGISel::PostprocessISelDAG virtual
method to perform post-selection peephole optimizations on the DAG
representation.

One optimization is implemented here:  folds to clean up complex
addressing expressions for thread-local storage and medium code
model.  It will also be useful for large code model sequences when
those are added later.  I originally thought about doing this on the
MI representation prior to register assignment, but it's difficult to
do effective global dead code elimination at that point.  DCE is
trivial on the DAG representation.

A typical example of a candidate code sequence in assembly:

   addis 3, 2, globalvar@toc@ha
   addi  3, 3, globalvar@toc@l
   lwz   5, 0(3)

When the final instruction is a load or store with an immediate offset
of zero, the offset from the add-immediate can replace the zero,
provided the relocation information is carried along:

   addis 3, 2, globalvar@toc@ha
   lwz   5, globalvar@toc@l(3)

Since the addi can in general have multiple uses, we need to only
delete the instruction when the last use is removed.

llvm-svn: 175697
2013-02-21 00:38:25 +00:00
Bill Schmidt 4437ad7d94 Stabilize vec_constants.ll
llvm-svn: 175683
2013-02-20 22:43:03 +00:00