Commit Graph

25901 Commits

Author SHA1 Message Date
Robert Khasanov 46409eae8e [x86] Added _addcarry_ and _subborrow_ intrinsics
llvm-svn: 216164
2014-08-21 09:43:43 +00:00
Robert Khasanov 7c5a843646 [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM.
llvm-svn: 216162
2014-08-21 09:27:00 +00:00
Zinovy Nis 0a36cba29d [INDVARS] Extend using of widening of induction variables for the cases of "sub nsw" and "mul nsw" instructions.
Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like:

int N = 100;
float **A;

void foo(int x0, int x1)
{
        float * A_cur = &A[0][0];
        float * A_next = &A[1][0];
        for(int x = x0; x < x1; ++x).
        {
          // Currently only [x+N] case is widened. Others 2 cases lead to sext.
          // This patch fixes it, so all 3 cases do not need sext.
          const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
          A_next[x] = div;
        }
}
...
> clang++ test.cpp -march=core-avx2 -Ofast  -fno-unroll-loops -fno-tree-vectorize -S -o -

Differential Revision: http://reviews.llvm.org/D4695

llvm-svn: 216160
2014-08-21 08:25:45 +00:00
David Majnemer 5d1aeba2ea InstCombine: Fold ((A | B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1
Adapted from a patch by Richard Smith, test-case written by me.

llvm-svn: 216157
2014-08-21 05:14:48 +00:00
Jiangning Liu 950844fadb Fix a bug around truncating vector in const prop.
In constant folding stage, "TRUNC" can't handle vector data type.

llvm-svn: 216149
2014-08-21 02:12:35 +00:00
Jiangning Liu deb4b5fc37 Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type".
llvm-svn: 216147
2014-08-21 01:59:30 +00:00
Quentin Colombet 689623009b [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
advanced copy optimization.

This is the final step patch toward transforming:
udiv    r0, r0, r2
udiv    r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16
bx      lr

into:
udiv    r0, r0, r2
udiv    r1, r1, r3
bx      lr

Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1

and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16

into simple generic GPR copies that the coalescer managed to remove.

<rdar://problem/12702965>

llvm-svn: 216144
2014-08-21 00:19:16 +00:00
Jonathan Roelofs 44937d98a3 Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch trades simplicity for implementation time at the expense
of performance... As they say: correctness first, then performance.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.

llvm-svn: 216138
2014-08-20 23:38:50 +00:00
Yi Jiang 1a4e73d7bf New InstCombine pattern: (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2), C3) to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3) under certain condition
llvm-svn: 216135
2014-08-20 22:55:40 +00:00
Sanjay Patel bba72c7c1e Don't prevent a vselect of constants from becoming a single load (PR20648).
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648

This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.

This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.

Differential Revision: http://reviews.llvm.org/D4934

llvm-svn: 216121
2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith 7bb10f8a85 X86: Add missing triples from r216119
llvm-svn: 216120
2014-08-20 19:58:59 +00:00
Duncan P. N. Exon Smith b18263531d X86: Align the stack on word boundaries in LowerFormalArguments()
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly.  The controlling sentence is, "The size of each argument gets
rounded up to eightbytes.  Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area."  For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.

Patch by Thomas Jablin!

llvm-svn: 216119
2014-08-20 19:40:59 +00:00
Keno Fischer d750723d29 Do not insert a tail call when returning multiple values on X86
Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if it's return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.

Test Plan: Test case from the original bug report is included.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D4968

llvm-svn: 216117
2014-08-20 19:00:37 +00:00
Sanjay Patel f3cfeef2e9 critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308)
In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill because they are really just nops.

This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.

The test case is a reduced version of the code from the bug report.

Differential Revision: http://reviews.llvm.org/D4977

llvm-svn: 216114
2014-08-20 18:03:00 +00:00
Quentin Colombet 03e43f8e68 [PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of
the isRegSequence property.

This is a follow-up of r215394 and r215404, which respectively introduces the
isRegSequence property and uses it for ARM.

Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov	d0, r2, r3
vmov	d1, r0, r1
vmov	r0, s0
vmov	r1, s2
udiv	r0, r1, r0
vmov	r1, s1
vmov	r2, s3
udiv	r1, r2, r1
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

into:
udiv	r0, r0, r2
udiv	r1, r1, r3
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.

With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: one rewrite the
operand of the instruction (#1), the other kills off the non-generic instruction
and replace it by a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100.

The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instruction. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This one change is to provide better consistency with
register related APIs and to ease the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.

This is related to <rdar://problem/12702965>.

Review: http://reviews.llvm.org/D4874
llvm-svn: 216088
2014-08-20 17:41:48 +00:00
Juergen Ributzka e1bb055ed3 [FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare.
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.

Related to <rdar://problem/17913111>.

llvm-svn: 216073
2014-08-20 16:34:15 +00:00
Jiangning Liu f841b3b79e Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type
legalization stage. With those two optimizations, fewer signed/zero extension
instructions can be inserted, and then we can expose more opportunities to
Machine CSE pass in back-end.

llvm-svn: 216066
2014-08-20 12:05:15 +00:00
Pavel Chupin 01a4e0a1ef [x32] Fix FrameIndex check in SelectLEA64_32Addr
Summary:
Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new
lea-5.ll case.
Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in
ESP/EBP case.

Test Plan: lea tests modified to include x32/nacl and new test added

Reviewers: nadav, dschuff, t.p.northover

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D4929

llvm-svn: 216065
2014-08-20 11:59:22 +00:00
Yi Kong c655f0c898 ARM: Fix codegen for rbit intrinsic
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.

The bug was originally introduced in r211057.

Differential Revision: http://reviews.llvm.org/D4980

llvm-svn: 216064
2014-08-20 10:40:20 +00:00
David Majnemer 42158f3eea InstCombine: Annotate sub with nuw when we prove it's safe
We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.

llvm-svn: 216045
2014-08-20 07:17:31 +00:00
Peter Collingbourne f39430bd4a [dfsan] Treat vararg custom functions like unimplemented functions.
Because declarations of these functions can appear in places like autoconf
checks, they have to be handled somehow, even though we do not support
vararg custom functions. We do so by printing a warning and calling the
uninstrumented function, as we do for unimplemented functions.

llvm-svn: 216042
2014-08-20 01:40:23 +00:00
Juergen Ributzka 0781b860e4 [FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0.
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.

This is related to <rdar://problem/18027157>.

llvm-svn: 216040
2014-08-20 01:10:36 +00:00
David Majnemer 57d5bc8849 InstCombine: Annotate sub with nsw when we prove it's safe
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands is the same.
- Both operands have more than 1 sign bit.

The subtraction cannot be a signed overflow in either case.

llvm-svn: 216037
2014-08-19 23:36:30 +00:00
Juergen Ributzka c0886dd5b0 [FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding.
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture support. This includes better immedediate support,
shift folding, and sign-/zero-extend folding.

This fixes <rdar://problem/17913111>.

llvm-svn: 216033
2014-08-19 22:29:55 +00:00
Duncan P. N. Exon Smith 0a448fbca3 IR: Implement uselistorder assembly directives
Implement `uselistorder` and `uselistorder_bb` assembly directives,
which allow the use-list order to be recovered when round-tripping to
assembly.

This is the bulk of PR20515.

llvm-svn: 216025
2014-08-19 21:30:15 +00:00
Lang Hames df7be5105a [MCJIT] Add an i386 RuntimeDyldMachO test case.
llvm-svn: 216024
2014-08-19 21:26:36 +00:00
Duncan P. N. Exon Smith c8eccd1147 verify-uselistorder: Force -preserve-bc-use-list-order
llvm-svn: 216022
2014-08-19 21:08:27 +00:00
Juergen Ributzka 6afeffb078 [FastISel][AArch64] Extend floating-point materialization test.
This adds the missing test that I promised for r215753 to test the
materialization of the floating-point value +0.0.

Related to <rdar://problem/18027157>.

llvm-svn: 216019
2014-08-19 20:35:07 +00:00
Juergen Ributzka b46ea081ad Reapply [FastISel][AArch64] Add support for more addressing modes (r215597).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 216013
2014-08-19 19:44:17 +00:00
Juergen Ributzka e3698ab6e3 Reapply [FastISel][X86] Add large code model support for materializing floating-point constants (r215595).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.

Fixes <rdar://problem/17674628>.

llvm-svn: 216012
2014-08-19 19:44:13 +00:00
Juergen Ributzka 89d187b387 Reapply [FastISel][X86] Use XOR to materialize the "0" value (r215594).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

llvm-svn: 216011
2014-08-19 19:44:10 +00:00
Juergen Ributzka 4952c35afd Reapply [FastISel][X86] Emit more efficient instructions for integer constant materialization (r215593).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.

This fixes <rdar://problem/17420988>.

llvm-svn: 216010
2014-08-19 19:44:06 +00:00
Juergen Ributzka 7e23f77d82 Reapply [FastISel][AArch64] Make use of the zero register when possible (r215591).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 216009
2014-08-19 19:44:02 +00:00
Juergen Ributzka 4bf6c01cdb Reapply [FastISel] Let the target decide first if it wants to materialize a constant (215588).
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.

I run it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regeressions.

Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

llvm-svn: 216006
2014-08-19 19:05:24 +00:00
Renato Golin 06d601fb3e Revert "Small refactor on VectorizerHint for deduplication"
This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness.

llvm-svn: 215999
2014-08-19 18:08:50 +00:00
Juergen Ributzka 5460cbfda4 [FastISel][AArch64] Fix a few BuildMI callsites where the result register was added as an operand register.
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.

Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).

llvm-svn: 215997
2014-08-19 17:41:53 +00:00
Renato Golin dd6394d833 Small refactor on VectorizerHint for deduplication
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.

This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.

llvm-svn: 215994
2014-08-19 17:30:43 +00:00
Toma Tabacu 85618b3194 [mips] Add assembler support for .set arch=x directive.
Summary:
This directive is similar to ".set mipsX".
It is used to change the CPU target of the assembler, enabling it to accept instructions for a specific CPU.

This patch only implements the r4000 CPU (which is treated internally as generic mips3) and the generic ISAs.

Contains work done by Matheus Almeida.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D4884

llvm-svn: 215978
2014-08-19 14:22:52 +00:00
Mayur Pandey 960507beb4 InstCombine: ((A & ~B) ^ (~A & B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A & ~B),(~A & B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Differential Revision: http://reviews.llvm.org/D4898

llvm-svn: 215974
2014-08-19 08:19:19 +00:00
Akira Hatanaka 452ea6698f [X86, X87 stackifier] Do not mark an operand of a debug instruction as kill.
<rdar://problem/16952634>

llvm-svn: 215962
2014-08-19 02:09:57 +00:00
Duncan P. N. Exon Smith 8e3669586b verify-uselistorder: Call verifyModule() and improve output
Call `verifyModule()` after parsing and after every transformation.
Also convert some `DEBUG(dbgs())` to `errs()` to increase visibility
into what's going on.

llvm-svn: 215951
2014-08-18 23:44:14 +00:00
Robin Morisset 9e98e7f7fc Answer to Philip Reames comments
- add check for volatile (probably unneeded, but I agree that we should be conservative about it).
- strengthen condition from isUnordered() to isSimple(), as I don't understand well enough Unordered semantics (and it also matches the comment better this way) to be confident in the previous behaviour (thanks for catching that one, I had missed the case Monotonic/Unordered).
- separate a condition in two.
- lengthen comment about aliasing and loads
- add tests in GVN/atomic.ll

llvm-svn: 215943
2014-08-18 22:18:14 +00:00
Robin Morisset 4ffe8aaa69 Weak relaxing of the constraints on atomics in MemoryDependencyAnalysis
Monotonic accesses do not have to kill the analysis, as long as the QueryInstr is not
itself atomic.

llvm-svn: 215942
2014-08-18 22:18:11 +00:00
Kevin Enderby ec5ca03674 Make llvm-objdump handle both arm and thumb disassembly from the same Mach-O
file with -macho, the Mach-O specific object file parser option.

After some discussion I chose to do this implementation contained in the logic
of llvm-objdump’s MachODump.cpp using a second disassembler for thumb when
needed and with updates mostly contained in the MachOObjectFile class.

llvm-svn: 215931
2014-08-18 20:21:02 +00:00
Oliver Stannard f5469bec97 Teach the AArch64 backend to handle f16
This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv
operations on f16 (half-precision) types by promoting to f32.

llvm-svn: 215891
2014-08-18 14:22:39 +00:00
Oliver Stannard 12993dd916 [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.

llvm-svn: 215890
2014-08-18 12:42:15 +00:00
Elena Demikhovsky 34d2d76d25 AVX-512: Fixed a bug in emitting compare for MVT:i1 type.
Added a test.

llvm-svn: 215889
2014-08-18 11:59:06 +00:00
Saleem Abdulrasool 017bd57fce ARM: improve RTABI 4.2 conformance on Linux
The set of functions defined in the RTABI was separated for no real reason.
This brings us closer to proper utilisation of the functions defined by the
RTABI.  It also sets the ground for correctly emitting function calls to AEABI
functions on all AEABI conforming platforms.

The previously existing lie on the behaviour of __ldivmod and __uldivmod is
propagated as it is beyond the scope of the change.

The changes to the test are due to the fact that we now use the divmod functions
which return both the quotient and remainder and thus we no longer need to
invoke two functions on Linux (making it closer to EABI's behaviour).

llvm-svn: 215862
2014-08-17 22:51:02 +00:00
Daniel Sanders f28bf76d88 Revert: r215698 - Current implementation of c.cond.fmt instructions only accept default cc0 register...
It causes a number of regressions when -fintegrated-as is enabled. This happens
because there are codegen-only instructions that incorrectly uses the first
operand as the encoding for the $fcc register. The regressions do not occur when
-via-file-asm is also given.

llvm-svn: 215847
2014-08-17 19:47:47 +00:00
Saleem Abdulrasool 78c44725f8 ARM: correct toggling behaviour
This was a thinko.  The intent was to flip the explicit bits that need toggling
rather than all bits.  This would result in incorrect behaviour (which now is
tested).

Thanks to Nico Weber for pointing this out!

llvm-svn: 215846
2014-08-17 19:20:38 +00:00
Rafael Espindola c66d761b97 llvm-objdump: don't print relocations in non-relocatable files.
This matches the behavior of GNU objdump.

llvm-svn: 215844
2014-08-17 19:09:37 +00:00
Rafael Espindola e45c740370 Fix an off-by-one bug in the target independent llvm-objdump.
It would prevent the display of a single byte instruction before a label.

Patch by Steve King!

llvm-svn: 215837
2014-08-17 16:31:39 +00:00
Owen Anderson a4428aa484 Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set.
While this might seem like an obvious canonicalization, there is one subtle problem with it.  The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN.  This means that the
new expression is strictly more defined than the original one.  One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it.  Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it.  Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.

llvm-svn: 215825
2014-08-17 03:51:29 +00:00
NAKAMURA Takumi 067d4c7c27 llvm/test/CodeGen/X86/fmul-combines.ll: Appease Windows x64. <4 x float> is passed by stack.
llvm-svn: 215821
2014-08-16 22:28:37 +00:00
Matt Arsenault 6cc00429ff Fix fmul combines with constant splat vectors
Fixes things like fmul x, 2 -> fadd x, x

llvm-svn: 215820
2014-08-16 10:14:19 +00:00
Chandler Carruth e020f117ce [x86] Teach lots of the new vector shuffle lowering to use UNPCK
instructions for blend operations at 128 bits. This was a serious hole
in our prior blend lowering.

llvm-svn: 215819
2014-08-16 09:42:15 +00:00
David Majnemer f9a095d606 InstCombine: Combine mul with div.
We can combne a mul with a div if one of the operands is a multiple of
the other:

%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
  =>
%ret = mul nsw %a, (C1 / C2)

This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.

Consider this small example:

define i32 @f(i32 %a) {
  %mul = mul nuw i32 %a, 14
  %div = udiv exact i32 %mul, 7
  ret i32 %div
}

which gets CodeGen'd to:

    imull       $14, %edi, %eax
    imulq       $613566757, %rax, %rcx
    shrq        $32, %rcx
    subl        %ecx, %eax
    shrl        %eax
    addl        %ecx, %eax
    shrl        $2, %eax
    retq

We can now transform this into:
define i32 @f(i32 %a) {
  %shl = shl nuw i32 %a, 1
  ret i32 %shl
}

which gets CodeGen'd to:

    leal        (%rdi,%rdi), %eax
    retq

This fixes PR20681.

llvm-svn: 215815
2014-08-16 08:55:06 +00:00
Nico Weber ae050bb057 arm asm: Let .fpu enable instructions, PR20447.
I'm not very happy with duplicating the fpu->feature mapping in ARMAsmParser.cpp
and in clang's driver. See the bug for a patch that doesn't do that, and the
review thread [1] for why this duplication exists.

1: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140811/231052.html
llvm-svn: 215811
2014-08-16 05:37:51 +00:00
Duncan P. N. Exon Smith 5a5fd7b1b3 BitcodeReader: Only create one basic block for each blockaddress
Block address forward-references are implemented by creating a
`BasicBlock` ahead of time that gets inserted in the `Function` when
it's eventually encountered.

However, if the same blockaddress was used in two separate functions
that were parsed *before* the referenced function (and the blockaddress
was never used at global scope), two separate basic blocks would get
created, one of which would be forgotten creating invalid IR.

This commit changes the forward-reference logic to create only one basic
block (and always return the same blockaddress).

llvm-svn: 215805
2014-08-16 01:54:37 +00:00
Duncan P. N. Exon Smith 0c39d40c8b IR: Don't add inbounds to GEPs of extern_weak variables
Global variables that have `extern_weak` linkage may be null, so it's
incorrect to add `inbounds` when constant folding.

This also fixes a bug when parsing global aliases, whose forward
reference placeholders are global variables with `extern_weak` linkage.
If GEPs to these aliases are encountered before the alias itself, the
GEPs would incorrectly gain the `inbounds` keyword as well.

llvm-svn: 215803
2014-08-16 01:54:32 +00:00
Andrea Di Biagio b23bad11e7 [DAGCombiner] Improve the folding of target independet shuffles to Undef.
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. In case, immediately fold that pair of shuffles to Undef.

The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.

Example:
  %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>

Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
  shufps $-123, %xmm1, $xmm0
  pshufd $-46, %xmm0, %xmm0

With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.

Added new test cases to test 'combine-vec-shuffle-5.ll'.

llvm-svn: 215797
2014-08-16 00:29:44 +00:00
Hal Finkel 41a55ad0a5 [PowerPC] Mark fixed-offset byvals as pointed-to by IR values
A byval object, even if allocated at a fixed offset (prescribed by the ABI) is
pointed to by IR values. Most fixed-offset stack objects are not pointed-to by
IR values, so the default is to assume this is not possible. However, we need
to override the default in this case (instruction scheduling can cause
miscompiles otherwise).

Fixes PR20280.

llvm-svn: 215795
2014-08-16 00:17:05 +00:00
Chad Rosier b1bbf6f8ce [AArch32] Add support for FP rounding operations for ARMv8/AArch32.
Phabricator Revision: http://reviews.llvm.org/D4935

llvm-svn: 215772
2014-08-15 21:38:16 +00:00
Rafael Espindola 3931c28b03 Set comdats when lazily linking functions.
We were setting the comdat when functions were copied in the initial pass, but
not when they were linked only when we found out that they are needed.

llvm-svn: 215765
2014-08-15 20:17:08 +00:00
Matt Arsenault fabf545299 R600/SI: Move all fabs / fneg handling to patterns
llvm-svn: 215749
2014-08-15 18:42:22 +00:00
Matt Arsenault 13623d0e28 R600/SI: Use source modifiers for f64 fneg
llvm-svn: 215748
2014-08-15 18:42:18 +00:00
Matt Arsenault a147438e37 R600/SI: Use source modifier for f64 fabs
llvm-svn: 215747
2014-08-15 18:42:15 +00:00
Matt Arsenault b2baffaffd R600/SI: Fix offset folding in some cases with shifted pointers.
Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
is only done if the add has one use. If the resulting constant
add can be folded into an addressing mode, force this to happen
for the pointer operand.

This ends up happening a lot because of how LDS objects are allocated.
Since the globals are allocated next to each other, acessing the first
element of the second object is directly indexed by a shifted pointer.

llvm-svn: 215739
2014-08-15 17:49:05 +00:00
Chandler Carruth 03f456abbe [x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions
where applicable for blending.

llvm-svn: 215737
2014-08-15 17:42:00 +00:00
Matt Arsenault 2e7cc48baa R600/SI: Add intrinsic for ldexp
llvm-svn: 215734
2014-08-15 17:30:25 +00:00
Juergen Ributzka a6faae3c11 [FastISel][ARM] Fix unit test from r215682.
Thanks Jim for finding this.

llvm-svn: 215733
2014-08-15 17:23:20 +00:00
Matt Arsenault 5015a89aa5 R600/SI: Implement isLegalAddressingMode
The default assumes that a 16-bit signed offset is used.
LDS instruction use a 16-bit unsigned offset, so it wasn't
being used in some cases where it was assumed a negative offset
could be used.

More should be done here, but first isLegalAddressingMode needs
to gain an addressing mode argument. For now, copy most of the rest
of the default implementation with the immediate offset change.

llvm-svn: 215732
2014-08-15 17:17:07 +00:00
Moritz Roth 8f3765625e ARM: Fix and re-enable load/store optimizer for Thumb1.
In a previous iteration of the pass, we would try to compensate for
writeback by updating later instructions and/or inserting a SUBS to
reset the base register if necessary.
Since such a SUBS sets the condition flags it's not generally safe to do
this. For now, only merge LDR/STRs if there is no writeback to the base
register (LDM that loads into the base register) or the base register is
killed by one of the merged instructions. These cases are clear wins
both in terms of instruction count and performance.

Also add three new test cases, and update the existing ones accordingly.

llvm-svn: 215729
2014-08-15 17:00:30 +00:00
Amara Emerson 82da7d07a1 [AArch64] Narrow arguments passed in wrong position on the stack in
big-endian mode.

Patch by Asiri Rathnayake.

Differential Revision: http://reviews.llvm.org/D4922

llvm-svn: 215716
2014-08-15 14:29:57 +00:00
Rafael Espindola d610ba99cb Remove HasLEB128.
We already require CFI, so it should be safe to require .leb128 and .uleb128.

llvm-svn: 215712
2014-08-15 14:01:07 +00:00
Bill Schmidt 9bac9ca796 [PPC64] Add test case for r215685.
I had deferred adding this test case until I could get it down to a
reasonable size.  That's done now.

Thanks,
Bill

llvm-svn: 215711
2014-08-15 13:51:57 +00:00
Chandler Carruth f88612581e [x86] Add the initial skeleton of type-based dispatch for AVX vectors in
the new shuffle lowering and an implementation for v4 shuffles.

This allows us to handle non-half-crossing shuffles directly for v4
shuffles, both integer and floating point. This currently misses places
where we could perform the blend via UNPCK instructions, but otherwise
generates equally good or better code for the test cases included to the
existing vector shuffle lowering. There are a few cases that are
entertainingly better. ;]

llvm-svn: 215702
2014-08-15 11:01:40 +00:00
Tim Northover ee843ef0fa ARM: implement MRS/MSR (banked reg) system instructions.
These are system-only instructions for CPUs with virtualization
extensions, allowing a hypervisor easy access to all of the various
different AArch32 registers.

rdar://problem/17861345

llvm-svn: 215700
2014-08-15 10:47:12 +00:00
Vladimir Medic 8d380fa37a Current implementation of c.cond.fmt instructions only accept default cc0 register. This patch enables the instruction to accept other fcc registers. The aliases with default fcc0 registers are also defined.
llvm-svn: 215698
2014-08-15 09:29:30 +00:00
Chandler Carruth 17fd848bfa [x86] Fix the very broken formation of vpunpck instructions in the
target-specific shuffl DAG combines.

We were recognizing the paired shuffles backwards. This code needs to be
replaced anyways as we have the same functionality elsewhere, but I'll
do the refactoring in a follow-up, this is the minimal fix to the
behavior.

In addition to fixing miscompiles with the new vector shuffle lowering,
it also causes the canonicalization to kick in much better, selecting
the smaller encoding variants in lots of places in the new AVX path.
This still isn't quite ideal as we don't need both the shufpd and the
punpck instructions, but that'll get fixed in a follow-up patch.

llvm-svn: 215690
2014-08-15 03:54:49 +00:00
Chandler Carruth 372c143c2f [x86] Fix PR20540 where the x86 shuffle DAG combiner had completely
broken logic for merging shuffle masks in the face of SM_SentinelZero
mask operands.

While these are '-1' they don't mean 'undef' the way '-1' means in the
pre-legalized shuffle masks. Instead, they mean that the shuffle
operation is forcibly zeroing that lane. Reflect this and explicitly
handle it in a bunch of places. In one place the effect is equivalent
but much more clear. In the rest it was really weirdly broken.

Also, rewrite the entire merging thing to be a more directy operation
with a single loop and just doing math to map the indices through the
various masks.

Also add a bunch of asserts to try to make in extremely clear what the
different masks can possibly look like.

Finally, add some comments to clarify that we're merging shuffle masks
*up* here rather than *down* as we do everywhere else, and thus the
logic is quite confusing.

Thanks to several different people for sending test cases, and for
Robert Khasanov for an initial attempt at fixing.

llvm-svn: 215687
2014-08-15 02:43:18 +00:00
Juergen Ributzka 81db58e177 [FastISel][ARM] Fall-back to constant pool loads when materializing an i32 constant.
FastEmit_i won't always succeed to materialize an i32 constant and just fail.
This would trigger a fall-back to SelectionDAG, which is really not necessary.

This fix will first fall-back to a constant pool load to materialize the constant
before giving up for good.

This fixes <rdar://problem/18022633>.

llvm-svn: 215682
2014-08-14 23:29:49 +00:00
Hal Finkel 61c386126b Copy noalias metadata from call sites to inlined instructions
When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only those that might access
memory because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.

This should complete the enhancements requested in PR20500.

llvm-svn: 215676
2014-08-14 21:09:37 +00:00
Juergen Ributzka 790bacf232 Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

llvm-svn: 215673
2014-08-14 19:56:28 +00:00
Adam Nemet 241c3252c3 [AVX512] Add test for FMA masking instrinsics
llvm-svn: 215665
2014-08-14 17:13:33 +00:00
Adam Nemet 6fa675754a [AVX512] Switch FMA intrinsics to the masking version
This does the renaming and updates the lowering logic.

Part of <rdar://problem/17688758>

llvm-svn: 215664
2014-08-14 17:13:30 +00:00
Juergen Ributzka 34ed422c42 Revert "[FastISel][AArch64] Add support for more addressing modes."
This reverts commits r215597, because it might have broken the build bots.

llvm-svn: 215659
2014-08-14 17:10:54 +00:00
Hal Finkel d2dee16c27 Add noalias metadata for general calls (not just memory intrinsics) during inlining
When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.

This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.

llvm-svn: 215657
2014-08-14 16:44:03 +00:00
Chad Rosier 11ab941644 [Reassociation] Add support for reassociation with unsafe algebra.
Vector instructions are (still) not supported for either integer or floating
point.  Hopefully, that work will be landed shortly.

llvm-svn: 215647
2014-08-14 15:23:01 +00:00
Sanjay Patel 35d3133650 optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852

llvm-svn: 215646
2014-08-14 15:15:28 +00:00
Rafael Espindola 36d3ee7c32 Delete support for AuroraUX.
auroraux.org is not resolving.

I will add this to the release notes as soon as I figure out where to put the
3.6 release notes :-)

llvm-svn: 215645
2014-08-14 15:15:09 +00:00
Toma Tabacu 726f1ea2c5 [mips] Improve robustness of some tests.
Summary:
This is done by removing some hardcoded registers like $at or expecting a single digit register to be selected.

Contains work done by Matheus Almeida.

Reviewers: matheusalmeida, dsanders

Reviewed By: dsanders

Subscribers: tomatabacu

Differential Revision: http://reviews.llvm.org/D4227

llvm-svn: 215640
2014-08-14 13:10:48 +00:00
Chandler Carruth a8311b3681 [x86] Begin stubbing out the AVX support in the new vector shuffle
lowering scheme.

Currently, this just directly bails to the fallback path of splitting
the 256-bit vector into two 128-bit vectors, operating there, and then
joining the results back together. While the results are far from
perfect, they are *shockingly* good for what we're doing here. I'll be
layering the rest of the functionality on top of this piece by piece and
updating tests as I go.

Note that 256-bit vectors in this mode are still somewhat WIP. While
I think the code paths that I'm adding here are clean and good-to-go,
there are still a lot of 128-bit assumptions that I'll need to stomp out
as I march through the functional spread here.

llvm-svn: 215637
2014-08-14 12:13:59 +00:00
Toma Tabacu 0d64b20c03 [mips] Add assembler support for the "la $reg,symbol" pseudo-instruction.
Summary:
This pseudo-instruction allows the programmer to load an address from a symbolic expression into a register.

Patch by David Chisnall.
His work was sponsored by: DARPA, AFRL

I've made some minor changes to the original, such as improving the formatting and adding some comments, and I've also added a test case.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D4808

llvm-svn: 215630
2014-08-14 10:29:17 +00:00
Chandler Carruth 7cd15be784 [SDAG] Fix a bug in the DAG combiner where we would fail to return the
input node after manually adding it to the worklist and using CombineTo.

Once we use CombineTo the input node may have been deleted. Despite this
being *completely confusing* and somewhat broken, the only way to
"correctly" return from a DAG combine after potentially deleting the
input node is to return *that exact node*....

But really, this code should just never have used CombineTo. It won't do
what it wants (returning the node as mentioned above just causes the
combine to infloop). The correct way to combine away a casted load to
a load of the correct type is to RAUW the chain directly and then return
the loaded value to replace the actual value node.

I managed to find this with the vector shuffle fuzzer even though it
clearly has nothing at all to do with vector shuffles and rather those
happen to trigger a load of a constant pool that hits this combine *just
right*. I've included the test as it is small and a nice stress test
that the infrastructure isn't asserting.

llvm-svn: 215622
2014-08-14 08:18:34 +00:00
David Majnemer 698dca0b95 InstCombine: ((A | ~B) ^ (~A | B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A | ~B),(~A |B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Patch by Mayur Pandey!

Differential Revision: http://reviews.llvm.org/D4883

llvm-svn: 215621
2014-08-14 06:46:25 +00:00
David Majnemer f1eda23514 Added InstCombine Transform for ((B | C) & A) | B -> B | (A & C)
Transform ((B | C) & A) | B --> B | (A & C)

Z3 Link: http://rise4fun.com/Z3/hP6p

Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D4865

llvm-svn: 215619
2014-08-14 06:41:38 +00:00
Saleem Abdulrasool bb67af44e1 MC: AsmLexer: handle multi-character CommentStrings correctly
As X86MCAsmInfoDarwin uses '##' as CommentString although a single '#' starts a
comment a workaround for this special case is added.

Fixes divisions in constant expressions for the AArch64 assembler and other
targets which use '//' as CommentString.

Patch by Janne Grunau!

llvm-svn: 215615
2014-08-14 02:51:43 +00:00
Chandler Carruth 8039b16de7 [SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.

In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

llvm-svn: 215611
2014-08-14 01:07:37 +00:00
Akira Hatanaka b74db09c97 [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.

<rdar://problem/17991614>

llvm-svn: 215600
2014-08-13 23:23:58 +00:00
Juergen Ributzka 98347d902e [FastISel][AArch64] Add support for more addressing modes.
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 215597
2014-08-13 22:53:29 +00:00
Juergen Ributzka 0f8bc043c5 [FastISel][X86] Add large code model support for materializing floating-point constants.
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.

Fixes <rdar://problem/17674628>.

llvm-svn: 215595
2014-08-13 22:25:35 +00:00
Juergen Ributzka ba8b79e932 [FastISel][X86] Use XOR to materialize the "0" value.
llvm-svn: 215594
2014-08-13 22:22:17 +00:00
Juergen Ributzka 230494b399 [FastISel][X86] Emit more efficient instructions for integer constant materialization.
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.

This fixes <rdar://problem/17420988>.

llvm-svn: 215593
2014-08-13 22:18:11 +00:00
Juergen Ributzka 24080d60fa [FastISel][AArch64] Make use of the zero register when possible.
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 215591
2014-08-13 22:13:14 +00:00
Juergen Ributzka 7cee768e55 [FastISel] Let the target decide first if it wants to materialize a constant.
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

llvm-svn: 215588
2014-08-13 22:08:02 +00:00
Juergen Ributzka a5b083853c [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it.
This change is also in preparation for a future change to make sure that
the constant materialization uses MOVT/MOVW when available and not a load
from the constant pool.

llvm-svn: 215584
2014-08-13 21:42:19 +00:00
Benjamin Kramer 46d1544d70 Fix (re-)creation of unittest lit.site.cfg for clang-tools-extra.
This has been hiding really well. Hopefully brings the builders suffering from
outdated lit.site.cfg files back to life.

llvm-svn: 215575
2014-08-13 20:41:26 +00:00
Jan Vesely 0cd3ec6cfa utils: Fix segfault in flattencfg
v2: continue iterating through the rest of the bb
    use for loop

v3: initialize FlattenCFG pass in ScalarOps
    add test

v4: split off initializing flattencfg to a separate patch
    add comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215574
2014-08-13 20:31:53 +00:00
Matt Arsenault 74ef277774 R600: Correctly set the src value offset for scalarized kernel args
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.

llvm-svn: 215564
2014-08-13 18:14:11 +00:00
Andrea Di Biagio ace8e1e3d4 [DAGCombiner] Improved target independent vector shuffle combine rule.
This patch improves the existing algorithm in DAGCombiner that
attempts to fold shuffles according to rule:
  shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

Before this change, there were cases where the DAGCombiner conservatively
avoided folding shuffles even if the resulting mask would have been legal.
That is because the algorithm wrongly assumed that commuting
an illegal shuffle mask would always produce an illegal mask.

With this change, we now correctly compute the commuted shuffle mask before
calling method 'isShuffleMaskLegal' on it.
On X86, this improves for example the codegen for the following function:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
  ret <4 x i32> %2
}

Before this change the X86 backend (-mcpu=corei7) generated
the following assembly code for function @test:
  shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
  movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
  movaps %xmm1, %xmm0

Now we produce:
  movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly
fold according to the above-mentioned rule.

llvm-svn: 215555
2014-08-13 16:09:40 +00:00
Chandler Carruth 0fb998110a [optnone] Make the optnone attribute effective at suppressing function
attribute and function argument attribute synthesizing and propagating.

As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.

llvm-svn: 215538
2014-08-13 10:49:33 +00:00
Robert Khasanov ed8829703f [SKX] Extended non-temporal load/store instructions for AVX512VL subsets.
Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction.
Added encoding and lowering tests.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 215536
2014-08-13 10:46:00 +00:00
Daniel Sanders d97a634f12 Re-commit: [mips] Implement .ent, .end, .frame, .mask and .fmask.
Patch by Matheus Almeida and Toma Tabacu

The lld test failure on the previous attempt to commit was caused by the
addition of the .pdr section causing the offsets it was checking to change.
This has been fixed by removing the .ent/.end directives from that test since
they weren't really needed.

llvm-svn: 215535
2014-08-13 10:07:34 +00:00
Chandler Carruth 3f92ecc2a0 Revert r215415 which causse MSan to crash on a great deal of C++ code.
I've followed up on the original commit as well.

llvm-svn: 215532
2014-08-13 09:19:39 +00:00
Elena Demikhovsky 51bbd011c3 AVX-512: Fixed a bug in shufflevector lowering.
PALIGNR instruction does not exist in AVX-512F set.
Added a test.

llvm-svn: 215526
2014-08-13 07:58:43 +00:00
Karthik Bhat a4a4db91be InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (add %a, %b)
Correctness proof of the transform using CVC3-

$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR(A | B, BVXOR(A,B) ) = A & B;

$ cvc3 t.cvc
Valid.

llvm-svn: 215524
2014-08-13 05:13:14 +00:00
Chandler Carruth b7eda21bb0 [x86] Rewrite a core part of the new vector shuffle lowering to handle
one pesky test case correctly.

This test case caused the old code to infloop occilating between solving
the low-half and the high-half. The 'side balancing' part of
single-input v8 shuffle lowering didn't handle the one pattern which can
cause it to occilate. Fortunately the fuzz testing found this case.
Unfortuately it was *terrible* to handle. I'm really sorry for the
amount and density of the code here, I'd love suggestions on how to
simplify it. I feel like there *must* be a simpler form here, but after
a lot of days I've not found it. This is the only one I've found that
even works. I've added the one pesky test case along with some nice
comments explaining the core problem that we have to solve here.

So far this has survived approximately 32k test cases. More strenuous
fuzzing commencing.

llvm-svn: 215519
2014-08-13 01:25:45 +00:00
Hal Finkel 46ef7ce283 [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).

This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).

llvm-svn: 215512
2014-08-13 01:15:40 +00:00
Hal Finkel 415e344f29 Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID
Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and
INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are
used for memory intrinsics (those with MachineMemOperands), and sometimes not.
When not, the nodes are not created as instances of MemIntrinsicSDNode, but
rather created as some other subclass of SDNode using DAG::getNode. When they
are memory intrinsics, they are created using DAG::getMemIntrinsicNode as
instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of
MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby
MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would
return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID
would return false. In r214452, MemSDNode::classof was changed to return true
for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The
problem is that neither the pre-r214452 logic and the post-r214452 logic are
really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID
nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and
the return value from classof needs to reflect that. This was broken before
r214452 (because MemIntrinsicSDNode::classof always returned true), and was
broken afterward (because MemSDNode::classof also always returned true), and
will now be correct.

The minimal solution is to grab one of the SubclassData bits (there is one left
for MemIntrinsicSDNode nodes) and use it to store whether or not a particular
INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of
MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof
and MemSDNode::classof to return the correct answer for the underlying object
for both the memory-intrinsic and non-memory-intrinsic cases.

This fixes the problem that r214452 created in the SelectionDAGDumper (thanks
to Matt Arsenault for pointing it out).

Because PowerPC does not implement getTgtMemIntrinsic, this change breaks
test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will
fix it in a follow-up commit.

llvm-svn: 215511
2014-08-13 01:15:37 +00:00
Adam Nemet 18355283ce [AVX512] Verify the code generated for the intrinsic _mm512_broadcastsd_pd
llvm-svn: 215487
2014-08-13 00:30:05 +00:00
Adam Nemet cee9d0a460 [AVX512] Handle valign masking intrinsic via C++ lowering
I think that this will scale better in most cases than adding a Pat<> for each
mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz).  We
can just lower to the SDNode and have the resulting DAG be matches by the DAG
patterns.

Alternatively (long term), we could keep the Pat<>s but generate them via the
new AVX512_masking multiclass.  The difficulty is that in order to formulate
that we would have to concatenate DAGs.  Currently this is only supported if
the operators of the input DAGs are identical.

llvm-svn: 215473
2014-08-12 21:13:12 +00:00
Matt Arsenault 4815f09bbe Allwo bitcast + struct GEP transform to work with addrspacecast
llvm-svn: 215467
2014-08-12 19:46:13 +00:00
Jan Vesely e5ca27d716 R600: Use optimized 24bit path in udivrem
v2: drop enum keyword
    use correct extension mode
    don't bother computing the sign in unsinged case

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215462
2014-08-12 17:31:20 +00:00
Jan Vesely 4a33bc6206 R600: Use i24 optimized path for SREM
v2: add tests
    rename LowerSDIV24 to LowerSDIVREM24
    handle the rem part in this function

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215460
2014-08-12 17:31:17 +00:00
Duncan P. N. Exon Smith 09d84addb7 Don't upgrade global constructors when reading bitcode
An optional third field was added to `llvm.global_ctors` (and
`llvm.global_dtors`) in r209015.  Most of the code has been changed to
deal with both versions of the variables.  Users of the C API might
create either version, the helper functions in LLVM create the two-field
version, and clang now creates the three-field version.

However, the BitcodeReader was changed to always upgrade to the
three-field version.  This created an unnecessary inconsistency in the
IR before/after serializing to bitcode.

This commit resolves the inconsistency by making the third field truly
optional (and not upgrading in the bitcode reader).  Since `llvm-link`
was relying on this upgrade code, rather than deleting it I've moved it
into `ModuleLinker`, where it upgrades these arrays as necessary to
resolve inconsistencies between modules.

The ideal resolution would be to remove the 2-field version and make the
third field required.  I filed PR20506 to track that.

I changed `test/Bitcode/upgrade-global-ctors.ll` to a negative test and
duplicated the `llvm-link` check in `test/Linker/global_ctors.ll` to
check both upgrade directions.

Since I came across this as part of PR5680 (serializing use-list order),
I've also added the missing `verify-uselistorder` RUN line to
`test/Bitcode/metadata-2.ll`.

llvm-svn: 215457
2014-08-12 16:46:37 +00:00
Rafael Espindola daa117be4c Make the test a bit more strict.
Before it would pass even if @b or @c ended up pointing to a variable named
@a123.

llvm-svn: 215450
2014-08-12 15:55:27 +00:00
Rafael Espindola cb9cd7f913 Add a plugin testcase for merging weak variables.
I initially thought I could implement COMDATs with aliases by just
internalizing GVs instead of dropping them. This is a counter
example: Internalizing one of the @a would make @b and @c point
to different variables.

llvm-svn: 215447
2014-08-12 15:39:14 +00:00
NAKAMURA Takumi 01d9e24f70 llvm/test/TableGen/*Foreach*.td: Remove XFAIL:vg_leak. They have not been failing since r215176.
llvm-svn: 215445
2014-08-12 14:06:21 +00:00
Tim Northover 39c70bbf56 llvm-objdump: print contents of MachO __unwind_info sections
llvm-svn: 215437
2014-08-12 11:52:59 +00:00
Gerolf Hoflehner eb90500d06 [MachineCombiner] Fix for ICE bug 20598
The combiner ignored DBG nodes when checking
the uses of a virtual register.

It combined a sequence like
   %vreg1 = madd %vreg2, %vreg3,...
   DBG_VALUE (%vreg1 ...)
   %vreg4 = add %vreg1,...
to
  %vreg4 = madd %vreg2, %vreg3

leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.

llvm-svn: 215431
2014-08-12 07:54:12 +00:00
Adrian Prantl 9724b5c9a4 DebugLocEntry: Restore the comparison predicate from before the
refactoring in 215384. This way it can unique multiple entries describing
the same piece even if they don't have the exact same location.
(The same piece may get merged in and be added from OpenRanges).
There ought to be a more elegant solution for this, though.

llvm-svn: 215418
2014-08-12 01:07:53 +00:00
Reid Kleckner 3ae6e1528a msan: Handle musttail calls
First, avoid calling setTailCall(false) on musttail calls.  The funciton
prototypes should be "congruent", so the shadow layout should be exactly
the same.

Second, avoid inserting instrumentation after a musttail call to
propagate the return value shadow.  We don't need to propagate the
result of a tail call, it should already be in the right place.

Reviewed By: eugenis

Differential Revision: http://reviews.llvm.org/D4331

llvm-svn: 215415
2014-08-12 00:12:43 +00:00
Michael J. Spencer 6b2f5b47d2 [x86] Fold extract_vector_elt of a load into the Load's address computation.
llvm-svn: 215409
2014-08-11 23:49:33 +00:00
David Majnemer ab07f00c64 InstCombine: Combine (add (and %a, %b) (or %a, %b)) to (add %a, %b)
What follows bellow is a correctness proof of the transform using CVC3.

$ < t.cvc
A, B : BITVECTOR(32);

QUERY BVPLUS(32, A & B, A | B) = BVPLUS(32, A, B);

$ cvc3 < t.cvc
Valid.

llvm-svn: 215400
2014-08-11 22:32:02 +00:00
Tom Stellard 155bbb7713 R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant
This saves us from having to copy a 64-bit 0 value into VGPRs for
BUFFER_* instruction which only have a 12-bit immediate offset.

llvm-svn: 215399
2014-08-11 22:18:17 +00:00
Tom Stellard 48194cdd81 R600/SI: Add check for low 32 bits of encoding to mubuf tests
There are no variable values like registers encoded in the low 32 bits of MUBUF
instructions, so it is relatively easy to check these bits, and it will
help prevent us from introducing encoding bugs.

llvm-svn: 215397
2014-08-11 22:18:11 +00:00
Tom Stellard 93ba12f163 R600/SI: Clear lds bit on MUBUF instructions used for private stores
This bit was left uninitialized, which was causing some random failures
of piglit tests.

NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 215396
2014-08-11 22:18:09 +00:00
Tom Stellard e04fd9d430 R600/SI: Fix broken test
llvm-svn: 215395
2014-08-11 22:18:05 +00:00
Quentin Colombet c64c175173 [AArch64] Fix registerAllocator assigns same register for base and wback in
pre/post-index load and store.

Patch by Steven Wu <stevenwu@apple.com>

llvm-svn: 215390
2014-08-11 21:39:53 +00:00
Saleem Abdulrasool 27c78bf131 ARM: try harder to detect non-IT eligible instructions
For many Thumb-1 register register instructions, setting the CPSR is not
permitted inside an IT block.  We would not correctly flag those instructions.
The previous change to identify this scenario was insufficient as it did not
actually catch all the instances.  The current list is formed by manual
inspection of the ARMv6M ARM.

The change to the Thumb2 IT block test is due to the fact that the new more
stringent checking of the MIs results in the If Conversion pass being prevented
from executing (since not all the instructions in the BB are predicable).  This
results in code gen changes.

Thanks to Tim Northover for pointing out that the previous patch was
insufficient and hinting that the use of the v6M ARM would be much easier to use
than the v7 or v8!

llvm-svn: 215382
2014-08-11 20:13:25 +00:00
Rafael Espindola 55b3254ea2 Fix using -plugin-opt=apiflie when also using -plugin-opt=emit-llvm.
llvm-svn: 215378
2014-08-11 19:06:54 +00:00
Sanjay Patel 1f80cde804 Correct a missing RUN line in the ARM codegen test for fneg ops. We should also explicitly specify +/-neonfp.
The bug was introduced at r99570 when use of "-arm-use-neon-fp" was removed.

Differential Revision: http://reviews.llvm.org/D4846

llvm-svn: 215377
2014-08-11 19:04:28 +00:00
Reid Kleckner e7a552b5f2 Add missing test for r215031
llvm-svn: 215374
2014-08-11 18:34:54 +00:00
Reid Kleckner 21aedc4b83 MC: Diagnose an unexpected token in COFF .section instead of asserting
This can easily arise when trying to assemble and ELF style .section
directive for a COFF object file.

llvm-svn: 215373
2014-08-11 18:34:43 +00:00
Rafael Espindola b16196a3e0 Fix use of uninitialized variable.
Fixes linking bitcode files that use the new style comdats for constructors
with ones that don't.

llvm-svn: 215364
2014-08-11 17:07:34 +00:00
Daniel Sanders b9bc75b625 Revert r215359 - [mips] Implement .ent, .end, .frame, .mask and .fmask assembler directives
It seems to cause an lld test (elf/Mips/hilo16-3.test) to fail. Reverted while we investigate.

llvm-svn: 215361
2014-08-11 16:10:19 +00:00
Daniel Sanders 21cf026893 [mips] Implement .ent, .end, .frame, .mask and .fmask assembler directives
Patch by Matheus Almeida and Toma Tabacu

Differential Revision: http://reviews.llvm.org/D4179

llvm-svn: 215359
2014-08-11 15:28:56 +00:00
Tim Northover c532bbd647 AArch64: add support for dynamic-loader relocations
LLD needs them, and it's good to be able to print them properly when
our object dumpers encounter them.

Patch by Daniel Stewart.

llvm-svn: 215352
2014-08-11 10:10:27 +00:00
Tim Northover bb9c88fa73 llvm-readobj: zero out timestamp in COFF auto-generated test files.
The timestamp meant these files changed with each invocation of
relocs.py, confusing matters when we add relocations and need to
update the tests.

llvm-svn: 215350
2014-08-11 09:53:07 +00:00
Oliver Stannard 11790b2dac ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention
By default, LLVM uses the "C" calling convention for all runtime
library functions. The half-precision FP conversion functions use the
soft-float calling convention, and are needed for some targets which
use the hard-float convention by default, so must have their calling
convention explicitly set.

llvm-svn: 215348
2014-08-11 09:12:32 +00:00
Jiangning Liu dd6e12d71c In Machine CSE pass, the source register of a COPY machine instruction can
be propagated to all its users, and this propagation could increase the 
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.

llvm-svn: 215344
2014-08-11 05:17:19 +00:00
Jiangning Liu 40b04fd994 In LVI(Lazy Value Info), originally value on a BB can only be caculated once,
and the lattice will be updated to be a state other than "undefined". This
limiation could miss some opportunities of lowering "overdefined" to be an
even accurate value. So this patch ask the algorithm to try to lower the
lattice value again even if the value has been lowered to be "overdefined".

llvm-svn: 215343
2014-08-11 05:02:04 +00:00
Petar Jovanovic 3a908a0bfc Add support for scalarizing cttz_zero_undef
Follow up to r214266. Add missing case in ScalarizeVectorResult() for
cttz_zero_undef.

Differential Revision: http://reviews.llvm.org/D4813

llvm-svn: 215330
2014-08-10 22:49:54 +00:00
Saleem Abdulrasool ed8885b402 ARM: correct isPredicable for MULS in ThHUMB mode
The ARM ARM states that CPSR may not be updated by a MUL in thumb mode.  Due to
an ordering of Thumb 2 Size Reduction and If Conversion, we would end up
generating a THUMB MULS inside an IT block.

The If Conversion pass uses the TTI isPredicable method to ensure that it can
transform a Basic Block.  However, because we only check for IT handling on
Thumb2 functions, we may miss some cases.  Even then, it only validates that the
CPSR is not *live* rather than it is not accessed.  This corrects the handling
for that particular case since the same restriction does not hold on the vast
majority of the instructions.

This does prevent the IfConversion optimization from kicking in in certain
cases, but generating correct code is more valuable.  Addresses PR20555.

llvm-svn: 215328
2014-08-10 22:20:37 +00:00
Joerg Sonnenberger bfef1dd694 @l and friends adjust their value depending the context used in.
For ori, they are unsigned, for addi, signed. Create a new target
expression type to handle this and evaluate Fixups accordingly.

llvm-svn: 215315
2014-08-10 12:41:50 +00:00
Joerg Sonnenberger 5aab5afcd2 Allow the third argument for the subi family to be an expression.
llvm-svn: 215286
2014-08-09 17:10:26 +00:00
Joerg Sonnenberger 5f6b6cec70 Update disassembler test to check the full dccci/iccci form.
llvm-svn: 215283
2014-08-09 14:01:10 +00:00
Joerg Sonnenberger 0d5e068fd5 Use the full form of dccci and iccci from the early PPC 405 documents,
since the operands are actually used on those cores. Provide aliases for
the only documented case in the newer Power ISA speec.

llvm-svn: 215282
2014-08-09 13:58:31 +00:00
Tom Stellard c0503db9e2 R600/SI: Custom lower CONCAT_VECTORS
This will lower them using register copies rather than loads and stores
to the stack.

llvm-svn: 215270
2014-08-09 01:06:56 +00:00
Tom Stellard 4f575f7aaf R600/SI: Update concat_vectors.ll to check for scratch usage
These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we are using
scratch buffers for the stack now instead of registers, so we need
to add an additional SI-NOT check for scratch buffers.

With this change I was able to uncover one broken test which will
be fixed in a future commit.

llvm-svn: 215269
2014-08-09 01:06:53 +00:00
Joerg Sonnenberger eb9d13fcd1 Allow large immediates for branch instructions in 32bit mode.
llvm-svn: 215240
2014-08-08 20:57:58 +00:00
Joerg Sonnenberger 7ee0f31a8b Provide an implementation of getNoopForMachoTarget for PPC, otherwise
empty functions will assert in the MC object writer.

llvm-svn: 215238
2014-08-08 19:13:23 +00:00
Juergen Ributzka 793f28d274 [FastISel][X86] Fix INC/DEC optimization (r215230)
I accidentally also used INC/DEC for unsigned arithmetic which doesn't work,
because INC/DEC don't set the required flag which is used for the overflow
check.

llvm-svn: 215237
2014-08-08 18:47:04 +00:00
Juergen Ributzka 4022614899 [FastISel][X86] Use INC/DEC when possible for {sadd|ssub}.with.overflow intrinsics.
This is a small peephole optimization to emit INC/DEC when possible.

Fixes <rdar://problem/17952308>.

llvm-svn: 215230
2014-08-08 17:21:37 +00:00
Joerg Sonnenberger 0013b9292d Add support for SPE load/store from memory.
llvm-svn: 215220
2014-08-08 16:43:49 +00:00
Rafael Espindola 40f5446d84 pr20589: Fix duplicated arch flag.
llvm-svn: 215216
2014-08-08 16:18:29 +00:00
Daniel Sanders feb613028b [mips] Invert the abicalls feature bit to be noabicalls so that it's possible for -mno-abicalls to take effect.
Also added the testcase that should have been in r215194.

This behaviour has surprised me a few times now. The problem is that the
generated MipsSubtarget::ParseSubtargetFeatures() contains code like this:

   if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true;

so '-abicalls' means 'leave it at the default' and '+abicalls' means 'set it to
true'. In this case, (and the similar -modd-spreg case) I'd like the code to be

  IsABICalls = (Bits & Mips::FeatureABICalls) != 0;

or possibly:

   if ((Bits & Mips::FeatureABICalls) != 0)
     IsABICalls = true;
   else
     IsABICalls = false;

and preferably arrange for 'Bits & Mips::FeatureABICalls' to be true by default
(on some triples).

llvm-svn: 215211
2014-08-08 15:47:17 +00:00
Josh Klontz ac0d28dfe6 Add missing Interpreter intrinsic lowering for sin, cos and ceil
llvm-svn: 215209
2014-08-08 15:00:12 +00:00
Jiangning Liu dcc651f99f [AArch64] Fix a type conversion bug for anlyzing compare.
The bug can cause spec2006/483.xalancbmk failure.

Patched by David Xu.

llvm-svn: 215206
2014-08-08 14:19:29 +00:00
Daniel Sanders c30f30fe8a [mips] Remove reason for XFAIL from a test that isn't actually XFAILed.
llvm-svn: 215201
2014-08-08 12:58:17 +00:00
James Molloy 65b08f5e46 [LoopVectorizer] Enable support for floating-point subtraction reductions
llvm-svn: 215200
2014-08-08 12:41:08 +00:00
James Molloy 3feea9c11a [AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.

This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.

Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).

This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.

llvm-svn: 215199
2014-08-08 12:33:21 +00:00
Tim Northover 0f18ff9817 AArch64: stop trying to take control of all UnknownArch triples.
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).

Should fix PR20567.

llvm-svn: 215191
2014-08-08 08:27:44 +00:00
Patrik Hagglund b0e86ec814 [pr19635] Revert most of r170537, and add new testcase.
Patch provided by Andrey Kuharev.

Sorry, r170537 was obviously wrong.

llvm-svn: 215190
2014-08-08 08:21:19 +00:00
David Majnemer fe8c7540b0 GlobalOpt: Optimize in the face of insertvalue/extractvalue
GlobalOpt didn't know how to simulate InsertValueInst or
ExtractValueInst.  Optimizing these is pretty straightforward.

N.B. This came up when looking at clang's IRGen for MS ABI member
pointers; they are represented as aggregates.

llvm-svn: 215184
2014-08-08 05:50:43 +00:00
NAKAMURA Takumi 15ac9af4f4 Fix llvm/test/DebugInfo/X86/recursive_inlining.ll to use %llc_dwarf.
llvm-svn: 215181
2014-08-08 02:24:05 +00:00
Adam Nemet 7d498629f1 [AVX512] Add zero-masking variant to AVX512_masking multiclass
This completes one item from the todo-list of r215125 "Generate masking
instruction variants with tablegen".

The AddedComplexity is needed just like for the k variant.

Added a codegen test based on valignq.

llvm-svn: 215173
2014-08-07 23:53:38 +00:00
Adam Nemet fa1f7201fc [AVX512] Add codegen test for the masking variant of valign
The AddedComplexity is needed just like in avx512_perm_3src.  There may be a
bug in the complexity computation...

llvm-svn: 215168
2014-08-07 23:18:18 +00:00
Akira Hatanaka 5acc58fcfb [stack protector] Look through bitcasts to get global variable
__stack_chk_guard.

Handle the case where the pointer operand of the load instruction that loads the
stack guard is not a global variable but instead a bitcast.

%StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)

Original test case provided by Ana Pazos.

This fixes PR20558.

llvm-svn: 215167
2014-08-07 23:08:24 +00:00
Adrian Prantl 80c8b2742f Make these regexes stricter by disallowing any additional characters in the output.
Thanks to dblaikie for pointing this out!

llvm-svn: 215166
2014-08-07 23:04:07 +00:00
Arnold Schwaighofer 4fb3c47456 SLPVectorizer: Use the type of the value loaded/stored to get the ABI alignment
We were using the pointer type which is incorrect.

llvm-svn: 215162
2014-08-07 22:47:27 +00:00
Adrian Prantl 03c67849d1 Add a separate testcase for a DWARF expression describing a value in a
subregister.

llvm-svn: 215161
2014-08-07 22:44:34 +00:00
Adrian Prantl 26e66b155f Reflow this comment.
llvm-svn: 215160
2014-08-07 22:44:24 +00:00
David Blaikie 09fdfabdda DebugInfo: Fix overwriting/loss of inlined arguments to recursively inlined functions.
Due to an unnecessary special case, inlined arguments that happened to
be from the same function as they were inlined into were misclassified
as non-inline arguments and would overwrite the non-inlined arguments.

Assert that we never overwrite a function's arguments, and stop
misclassifying inlined arguments as non-inline arguments to fix this
issue.

Excuse the rather crappy test case - handcrafted IR might do better, or
someone who understands better how to tickle the inliner to create a
recursive inlining situation like this (though it may also be necessary
to tickle the variable in a particular way to cause it to be recorded in
the MMI side table and go down this particular path for location
information).

llvm-svn: 215157
2014-08-07 22:22:49 +00:00
Reed Kotler 87048a4c9e fix materialization of one bit constants and global values which are accessed through
a base GOT entry.

Summary:
get tip of tree mips fast-isel to pass test-suite

Two bugs were fixed:

1) one bit booleans were treated as 1 bit signed integers and so the literal '1' could become sign extended.
2) mips uses got for pic but in certain cases, as with string constants for example, many items can be referenced from the same got entry and this case was not handled properly.

Test Plan: test-suite

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D4801

llvm-svn: 215155
2014-08-07 22:09:01 +00:00
Eric Christopher b9fd9ed37e Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.

Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reverts commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 215154
2014-08-07 22:02:54 +00:00
Gerolf Hoflehner 97c383bc36 MachineCombiner Pass for selecting faster instruction sequence on AArch64
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers

The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp

llvm-svn: 215151
2014-08-07 21:40:58 +00:00
Owen Anderson 6c19ab1b5d Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.

llvm-svn: 215148
2014-08-07 21:07:35 +00:00
Rafael Espindola cc847b6376 Fix test failure on ARM.
llvm-svn: 215140
2014-08-07 20:33:06 +00:00
Rafael Espindola e20470d399 Remove a few XFAILs.
These tests now pass with MCJIT.

llvm-svn: 215136
2014-08-07 19:35:22 +00:00
Akira Hatanaka bbd33f6766 [Branch probability] Recompute branch weights of tail-merged basic blocks.
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomutes the weights of
tail-merged blocks using the following formula:

branch_weight(merged block to successor j) =
sum(block_frequency(bb) * branch_probability(bb -> j))

bb is a block that is in the set of merged blocks.

<rdar://problem/16256423>

llvm-svn: 215135
2014-08-07 19:30:13 +00:00
Joerg Sonnenberger 54c340b76a Add the majority of the remaining SPE instructions.
llvm-svn: 215131
2014-08-07 18:52:39 +00:00
Justin Bogner 1b9f936f91 FileCheck: Add a flag to allow checking empty input
Currently FileCheck errors out on empty input. This is usually the
right thing to do, but makes testing things like "this command does
not emit some error message" hard to test. This usually leads to
people using "command 2>&1 | count 0" instead, and then the bots that
use guard malloc fail a few hours later.

By adding a flag to FileCheck that allows empty inputs, we can make
tests that consist entirely of "CHECK-NOT" lines feasible.

llvm-svn: 215127
2014-08-07 18:40:37 +00:00
Joerg Sonnenberger f74c74693c Indent
llvm-svn: 215126
2014-08-07 18:05:32 +00:00
NAKAMURA Takumi 17ae2fca4c llvm/test/tools/llvm-objdump: Reorganize target-dependent some tests.
llvm-svn: 215122
2014-08-07 17:17:19 +00:00
Rafael Espindola f8b27c41e8 Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.

Thanks to Lang Hames for making MCJIT a good replacement!

llvm-svn: 215111
2014-08-07 14:21:18 +00:00
Joerg Sonnenberger 84d35dfe96 Add mfasr and mtasr
llvm-svn: 215110
2014-08-07 13:35:34 +00:00
Joerg Sonnenberger 853feaa808 Add mfrtcu and mfrtcl instructions
llvm-svn: 215109
2014-08-07 13:16:58 +00:00
Joerg Sonnenberger 1837a7b4fa Support mttbl and mttbu mnemonic
llvm-svn: 215108
2014-08-07 13:06:23 +00:00