Commit Graph

12912 Commits

Author SHA1 Message Date
Artyom Skrobov a70dfe18d3 Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases
No longer breaks SPEC2000/2006

llvm-svn: 237361
2015-05-14 12:59:46 +00:00
Elena Demikhovsky d5b3e376d2 AVX-512: Added i1 type handling for calling conventions.
i1 type is a legal type on AVX-512 and can be passed as parameter or return value.
i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here).
The result code is similar to the previous X86 targets, where i1 is allways promoted to i8.

llvm-svn: 237350
2015-05-14 09:04:45 +00:00
Ahmed Bougacha 6402ad27c0 [CodeGen] Use standard -not gnueabi- naming for f16 libcalls on Darwin.
Other targets probably should as well.  Since r237161, compiler-rt has
both, but I don't see why anything other than gnueabi would use a
gnueabi naming scheme.

llvm-svn: 237324
2015-05-14 01:00:51 +00:00
Brendon Cahoon d11c92a41c [Hexagon] Generate loop1 instruction for nested loops
loop1 is for the outer loop and loop0 is for the inner loop.

Differential Revision: http://reviews.llvm.org/D9680

llvm-svn: 237266
2015-05-13 17:56:03 +00:00
Brendon Cahoon 254e656862 [Hexagon] Generate hardware loop when loop has a critical edge
The hardware loop pass should try to generate a hardware loop
instruction when the original loop has a critical edge.

Differential Revision: http://reviews.llvm.org/D9678

llvm-svn: 237258
2015-05-13 14:54:24 +00:00
Silviu Baranga 780a3b3be7 Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006
llvm-svn: 237256
2015-05-13 14:03:18 +00:00
Artyom Skrobov b526681e08 [AArch64] Codegen VMAX/VMIN for safe math cases
llvm-svn: 237247
2015-05-13 12:01:09 +00:00
Sanjoy Das a1d39ba940 [Statepoints] Support for "patchable" statepoints.
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`.  `id` gets propagated to the ID field
in the generated StackMap section.  If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.

This change brings statepoints one step closer to patchpoints.  With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.

PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`.  This can be made more sophisticated
later.

Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9546

llvm-svn: 237214
2015-05-12 23:52:24 +00:00
Saleem Abdulrasool ee13fbe848 CodeGen: ignore DEBUG_VALUE nodes in KILL tagging
DEBUG_VALUE nodes do not take part in code generation.  Ignore them when
performing KILL updates.  Addresses PR23486.

llvm-svn: 237211
2015-05-12 23:36:18 +00:00
Chandler Carruth 942fba95e2 Revert r237175: [X86] Always return the sret parameter in eax/rax ...
This commit broke an x86 test and the bots have been broken for well
over an hour now so I'm just reverting.

llvm-svn: 237210
2015-05-12 23:34:27 +00:00
Reid Kleckner b465563b46 [X86] Always return the sret parameter in eax/rax, even on 32-bit
Summary:
This rule was always in the old SysV i386 ABI docs and the new ones that
H.J. Lu has put together, but we never noticed:

  EAX   scratch register; also used to return integer and pointer values
        from functions; also stores the address of a returned struct or union

Fixes PR23491.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9715

llvm-svn: 237175
2015-05-12 20:56:32 +00:00
Sundeep Kushwaha 83a4039c17 [PATCH] [HEXAGON] Add a test program to verify calling convention
for large struct return by value.

Differential Revision: http://reviews.llvm.org/D9709

llvm-svn: 237170
2015-05-12 20:13:10 +00:00
Pat Gavlin c7dc6d6ee7 [Statepoints] Split the calling convention and statepoint flags operand to STATEPOINT into two separate operands.
Differential Revision: http://reviews.llvm.org/D9623

llvm-svn: 237166
2015-05-12 19:50:19 +00:00
Petar Jovanovic e0de8f4efb [Mips] Return false for isFPCloseToIncomingSP()
On Mips, frame pointer points to the same side of the frame as the stack
pointer. This function is used to decide where to put register scavenging
spill slot. So far, it was put on the wrong side of the frame, and thus it
was too far away from $fp when frame was larger than 2^15 bytes.

Patch by Vladimir Radosavljevic.

http://reviews.llvm.org/D8895

llvm-svn: 237153
2015-05-12 17:14:05 +00:00
Tom Stellard 28d13a4b12 R600/SI: add pass to mark CF live ranges as non-spillable
Spilling can insert instructions almost anywhere, and this can mess
up control flow lowering in a multitude of ways, due to instruction
reordering. Let's sort this out the easy way: never spill registers
involved with control flow, i.e. saved EXEC masks.

Unfortunately, this does not work at all with optimizations disabled,
as the register allocator ignores spill weights. This should be
addressed in a future commit.

The test was reduced from the "stacks" shader of [1]. Some issues
trigger the machine verifier while another one is checked manually.

[1] http://madebyevan.com/webgl-path-tracing/

v2: only insert pass with optimizations enabled, merge test runs.

Patch by: Grigori Goronzy

llvm-svn: 237152
2015-05-12 17:13:02 +00:00
Sunil Srivastava d79dfcbc37 Changed renaming of local symbols by inserting a dot vefore the numeric suffix.
One code change and several test changes to match that
details in http://reviews.llvm.org/D9481

llvm-svn: 237150
2015-05-12 16:47:30 +00:00
Tom Stellard 8f96dfc9ea R600/SI: Remove M0Reg register class
It is no longer used.

llvm-svn: 237142
2015-05-12 15:00:52 +00:00
Tom Stellard 381a94a764 R600/SI: Remove explicit m0 operand from DS instructions
Instead add m0 as an implicit operand.  This helps avoid spills
of the m0 register in some cases.

llvm-svn: 237141
2015-05-12 15:00:49 +00:00
Tom Stellard 7b222e110f R600/SI: Make sendmsg test more strict
We want to make sure that the m0 copies are being cse'd.

llvm-svn: 237134
2015-05-12 14:18:16 +00:00
Elena Demikhovsky fae20d3565 AVX-512, X86: Added lowering for shift operations for SKX.
The other changes in the LowerShift() are not functional,
just to make the code more convenient.
So, the functional changes for SKX only.

llvm-svn: 237129
2015-05-12 13:25:46 +00:00
John Brawn 70605f7d22 [ARM] Use AEABI aligned function variants
AEABI defines aligned variants of memcpy etc. that can be faster than
the default version due to not having to do alignment checks. When
emitting target code for these functions make use of these aligned
variants if possible. Also convert memset to memclr if possible.

Differential Revision: http://reviews.llvm.org/D8060

llvm-svn: 237127
2015-05-12 13:13:38 +00:00
Igor Laevsky 87ef5eaf46 Reverse ordering of base and derived pointer during safepoint lowering.
According to the documentation in StackMap section for the safepoint we should have:
"The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated."
But before this change we emitted them in reverse order - derived pointer first, base pointer second.

llvm-svn: 237126
2015-05-12 13:12:14 +00:00
Vasileios Kalintiris b48c905613 [mips][FastISel] Handle calls with non legal types i8 and i16.
Summary: Allow calls with non legal integer types based on i8 and i16 to be processed by mips fast-isel.

Based on a patch by Reed Kotler.

Test Plan:
"Make check" test forthcoming.
Test-suite passes at O0/O2 and with mips32 r1/r2

Reviewers: rkotler, dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D6770

llvm-svn: 237121
2015-05-12 12:29:17 +00:00
Vasileios Kalintiris c07f848785 [mips][FastISel] Simplify callabi.ll by using multiple check prefixes.
Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9635

llvm-svn: 237119
2015-05-12 12:17:11 +00:00
Vasileios Kalintiris 32cd69a2eb [mips][FastISel] Allow computation of addresses from constant expressions.
Summary:
Try to compute addresses when the offset from a memory location is a constant
expression.

Based on a patch by Reed Kotler.

Test Plan:
Passes test-suite for -O0/O2 and mips 32 r1/r2

Reviewers: rkotler, dsanders

Subscribers: llvm-commits, aemerson, rfuhler

Differential Revision: http://reviews.llvm.org/D6767

llvm-svn: 237117
2015-05-12 12:08:31 +00:00
Elena Demikhovsky c1ac5d7bd5 AVX-512: select operation for i1 vectors
like: select i1 %cond, <16 x i1> %a, <16 x i1> %b.
I added pseudo-CMOV patterns to resolve the "select".
Added tests for KNL and SKX.

llvm-svn: 237106
2015-05-12 09:36:52 +00:00
Michael Kuperstein 6f5ff6905c [X86] DAGCombine should not assume arbitrary vector types are simple
The X86-specific DAGCombine for stores should not assume vector types are always simple.
This fixes PR23476.

Differential Revision: http://reviews.llvm.org/D9659

llvm-svn: 237097
2015-05-12 07:33:07 +00:00
Eric Christopher 824f42f209 Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.

We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.

For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.

Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.

llvm-svn: 237079
2015-05-12 01:26:05 +00:00
Andrew Kaylor cc14f387e8 [WinEH] Handle nested landing pads that return directly to the parent function.
Differential Revision: http://reviews.llvm.org/D9684

llvm-svn: 237063
2015-05-11 23:06:02 +00:00
Andrew Kaylor 762a6bea1f [WinEH] Update exception numbering to give handlers their own base state.
Differential Revision: http://reviews.llvm.org/D9512

llvm-svn: 237014
2015-05-11 19:41:19 +00:00
Pirama Arumuga Nainar af171e7720 [X86] Updates to X86 backend for f16 promotion
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.

This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set loadextaction for f16 vectors to expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.

Tests included.

Reviewers: ab, srhines, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9092

llvm-svn: 237004
2015-05-11 17:14:39 +00:00
Elena Demikhovsky 3822af0715 AVX-512: Changed CC parameter in "cmp" intrinsic
from i8 to i32 according to the Intel Spec

by Igor Breger (igor.breger@intel.com)

llvm-svn: 236979
2015-05-11 09:03:14 +00:00
Elena Demikhovsky 0d7e9364d1 AVX-512: Added SKX instructions and intrinsics:
{add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae

By Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236971
2015-05-11 06:05:05 +00:00
Elena Demikhovsky f40342d6a2 AVX-512: fixed UINT_TO_FP operation for 512-bit types.
llvm-svn: 236955
2015-05-10 14:23:52 +00:00
Elena Demikhovsky 75d1489326 AVX-512: fixed a bug in i1 vectors lowering
llvm-svn: 236947
2015-05-10 10:33:32 +00:00
NAKAMURA Takumi 7f52d6d82c llvm/test/CodeGen/AArch64/tailcall_misched_graph.ll: s/REQUIRE/REQUIRES/
llvm-svn: 236928
2015-05-09 05:59:00 +00:00
James Y Knight fca02be3c1 Fix MergeConsecutiveStore for non-byte-sized memory accesses.
The bug showed up as a compile-time assertion failure:
  Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed
when building msan tests on x86-64.

Prior to r236850, this bug was masked due to a bogus alignment check,
which also accidentally rejected non-byte-sized accesses. Afterwards,
an invalid ElementSizeBytes == 0 got further into the function, and
triggered the assertion failure.

It would probably be a good idea to allow it to handle merging stores
of unusual widths as well, but for now, to un-break it, I'm just
making the minimal fix.

Differential Revision: http://reviews.llvm.org/D9626

llvm-svn: 236927
2015-05-09 03:13:37 +00:00
Pete Cooper d54fb89901 [Fast-ISel] Don't mark the first use of a remat constant as killed.
When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to
mark the vreg containing 1000 as killed.  Given that we go bottom up in fast-isel, a later
use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill.

However, rematerialised constant expressions aren't generated bottom up.  The local value save area
grows downwards.  This means that if you remat 2 constant expressions which both use 1000 then the
first will kill it, then the second, which is *lower* in the BB will read a killed register.

This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset.

Note that this commit only makes kill flag generation conservative.  There's nothing else obviously wrong with
the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions.

However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely.

llvm-svn: 236922
2015-05-09 00:51:03 +00:00
Arnold Schwaighofer f54b73d681 ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.

Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.

Fix for PR23459.

rdar://20740035

llvm-svn: 236916
2015-05-08 23:52:00 +00:00
Hans Wennborg ae0254dabc Switch lowering: cluster adjacent fall-through cases even at -O0
It's cheap to do, and codegen is much faster if cases can be merged
into clusters.

llvm-svn: 236905
2015-05-08 21:23:39 +00:00
Pete Cooper e4bb07ecff [Fast-ISel] Clear kill flags on registers replaced by updateValueMap.
When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register.  This is done via updateValueMap,.

However, its possible that the source register we are rewriting *to* to also have uses.  If those uses are after a kill of the value we are rewriting *from* then we have uses after a kill and the verifier fails.

This code checks for the case where the to register is also used, and if so it clears all kill on the from register.  This is conservative, but better that always clearing kills on the from register.

llvm-svn: 236897
2015-05-08 20:46:54 +00:00
Brendon Cahoon bece8edcdd [Hexagon] Generate more hardware loops
Refactored parts of the hardware loop pass to generate
more. Also, added more tests.

Differential Revision: http://reviews.llvm.org/D9568

llvm-svn: 236896
2015-05-08 20:18:21 +00:00
Pete Cooper 7f7c9f1dad [X86] Fast-ISel was incorrectly always killing the source of a truncate.
A trunc from i32 to i1 on x86_64 generates an instruction such as

%vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9

However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one.
Otherwise, we are killing the source of the truncate which could be used later in the program.

llvm-svn: 236890
2015-05-08 18:29:42 +00:00
Pat Gavlin cc0431d1c0 Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware.
This changes the shape of the statepoint intrinsic from:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)

to:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)

This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.

In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.

Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.

Differential Revision: http://reviews.llvm.org/D9501

llvm-svn: 236888
2015-05-08 18:07:42 +00:00
Pete Cooper 85b1c48b20 Clear kill flags on all used registers when sinking instructions.
The test here was sinking the AND here to a lower BB:

	%vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8
	TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8

which meant that vreg8 was read after it was killed.

This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND.

llvm-svn: 236886
2015-05-08 17:54:32 +00:00
Brendon Cahoon df43e68629 [Hexagon] Update AnalyzeBranch, etc target hooks
Improved the AnalyzeBranch, InsertBranch, and RemoveBranch
functions in order to handle more of our branch instructions.
This requires changes to analyzeCompare and PredicateInstructions.
Specifically, we've added support for new value compare jumps,
improved handling of endloop, added more compare instructions,
and improved support for predicate instructions.

Differential Revision: http://reviews.llvm.org/D9559

llvm-svn: 236876
2015-05-08 16:16:29 +00:00
Andrea Di Biagio 84e22b9096 [X86] Teach 'getTargetShuffleMask' how to look through ISD::WrapperRIP when decoding a PSHUFB mask.
The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes
where the mask node is a load from constant pool, and the constant pool node
is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching
it how to also look through X86ISD::WrapperRIP.

This helps function combineX86ShufflesRecusively to combine more shuffle
sequences containing PSHUFB nodes if we are in RIPRel PIC mode.

Before this change, llc (with -relocation-model=pic -march=x86-64) was unable
to decode a pshufb where the mask was loaded from a constant pool. For example,
the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its
operand, so instead of generating a single 'movaps' the backend always
generated a sub-optimal 'movdqa + pshufb' sequence.

Added test x86-fold-pshufb.ll.

llvm-svn: 236863
2015-05-08 15:11:07 +00:00
James Y Knight 38514d2177 Fix test added in r236850 for OSX builders.
Need to specify triple so that llvm emits the asm syntax that the
test expected.

llvm-svn: 236855
2015-05-08 14:04:54 +00:00
James Y Knight 284e7b3d6c Fix alignment checks in MergeConsecutiveStores.
1) check whether the alignment of the memory is sufficient for the
*merged* store or load to be efficient.

Not doing so can result in some ridiculously poor code generation, if
merging creates a vector operation which must be aligned but isn't.

2) DON'T check that the alignment of each load/store is equal. If
you're merging 2 4-byte stores, the first *might* have 8-byte
alignment, but the second certainly will have 4-byte alignment. We do
want to allow those to be merged.

llvm-svn: 236850
2015-05-08 13:47:01 +00:00
Vasileios Kalintiris 42544d6472 [mips] Emit the .insn directive for empty basic blocks.
Summary:
In microMIPS, labels need to know whether they are on code or data. This is
indicated with STO_MIPS_MICROMIPS and can be inferred by being followed
by instructions. For empty basic blocks, we can ensure this by emitting the
.insn directive after the label.

Also, this fixes some failures in our out-of-tree microMIPS buildbots, for the
exception handling regression tests under: SingleSource/Regression/C++/EH

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9530

llvm-svn: 236815
2015-05-08 09:10:15 +00:00
Eric Christopher 81fa35ca24 Now that we have a soft-float attribute, use it instead of the
hard coded command line option for the Mips soft float tests.

llvm-svn: 236801
2015-05-08 00:57:22 +00:00
Pete Cooper ba593ad3f3 Clear kill flags in tail duplication.
If we duplicate an instruction then we must also clear kill flags on any uses we rewrite.
Otherwise we might be killing a register which was used in other BBs.

For example, here the entry BB ended up with these instructions, the ADD having been tail duplicated.

	%vreg24<def> = t2ADDri %vreg10<kill>, 1, pred:14, pred:%noreg, opt:%noreg; GPRnopc:%vreg24 rGPR:%vreg10
	%vreg22<def> = COPY %vreg10; GPR:%vreg22 rGPR:%vreg10

	The copy here is inserted after the add and so needs vreg10 to be live.

llvm-svn: 236782
2015-05-07 21:48:26 +00:00
Pete Cooper f52123b454 [AArch64] Fix sext/zext folding in address arithmetic.
We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there.

Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use.

llvm-svn: 236764
2015-05-07 19:21:36 +00:00
Nemanja Ivanovic f3c94b1e3c Add VSX Scalar loads and stores to the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D9440

It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.

llvm-svn: 236755
2015-05-07 18:24:05 +00:00
Diego Novillo de5b8016ab Fix information loss in branch probability computation.
Summary:
This addresses PR 22718. When branch weights are too large, they were
being clamped to the range [1, MaxWeightForBB]. But this clamping is
only applied to edges that go outside the range, so it distorts the
relative branch probabilities.

This patch changes the weight calculation to scale every branch so the
relative probabilities are preserved. The scaling is done differently
now. First, all the branch weights are added up, and if the sum exceeds
32 bits, it computes an integer scale to bring all the weights within
the range.

The patch fixes an existing test that had slightly wrong branch
probabilities due to the previous clamping. It now gets branch weights
scaled accordingly.

Reviewers: dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9442

llvm-svn: 236750
2015-05-07 17:22:06 +00:00
Sanjay Patel a9f6d3505d [x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507)
Finish the job that was abandoned in D6958 following the refactoring in
http://reviews.llvm.org/rL230221:

1. Uncomment the intrinsic def for the AVX r_Int instruction.
2. Add missing r_Int entries to the load folding tables; there are already
   tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I
   haven't added any more in this patch.
3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ).

So instead of this:

  movaps	%xmm0, %xmm1
  rcpss	%xmm1, %xmm1
  movss	%xmm1, %xmm0

We should now get:

  rcpss	%xmm0, %xmm0

And instead of this:

  vsqrtss	%xmm0, %xmm0, %xmm1
  vblendps	$1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3]

We should now get:

  vsqrtss	%xmm0, %xmm0, %xmm0


Differential Revision: http://reviews.llvm.org/D9504

llvm-svn: 236740
2015-05-07 15:48:53 +00:00
Elena Demikhovsky 29792e9a80 AVX-512: Added all forms of FP compare instructions for KNL and SKX.
Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 236714
2015-05-07 11:24:42 +00:00
NAKAMURA Takumi 386c22c0ff llvm/test/CodeGen/X86/llc-override-mcpu-mattr.ll: Tweak not to be affected by x64 Calling Convention.
llvm-svn: 236710
2015-05-07 10:18:28 +00:00
Akira Hatanaka 3058d0f080 Let llc and opt override "-target-cpu" and "-target-features" via command line
options.

This commit fixes a bug in llc and opt where "-mcpu" and "-mattr" wouldn't
override function attributes "-target-cpu" and "-target-features" in the IR.

Differential Revision: http://reviews.llvm.org/D9537

llvm-svn: 236677
2015-05-06 23:54:14 +00:00
Pete Cooper 27483915e8 Handle dead defs in the if converter.
We had code such as this:
  r2 = ...
  t2Bcc

label1:
  ldr ... r2

label2;
  return r2<dead, def>

The if converter was transforming this to
   r2<def> = ...
   return [pred] r2<dead,def>
   ldr <r2, kill>
   return

which fails the machine verifier because the ldr now reads from a dead def.

The fix here detects dead defs in stepForward and passes them back to the caller in the clobbers list.  The caller then clears the dead flag from the def is the value is live.

llvm-svn: 236660
2015-05-06 22:51:04 +00:00
Pete Cooper 54085cdc7b Fix incorrect kill flags in fastisel.
If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use.

Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere.

llvm-svn: 236650
2015-05-06 22:09:29 +00:00
Pete Cooper d31583ddfb [x86] Fix register class of folded load index reg.
When folding a load in to another instruction, we need to fix the class of the index register
Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier.

llvm-svn: 236644
2015-05-06 21:37:19 +00:00
Tim Northover e4310fe946 CodeGen: move over-zealous assert into actual if statement.
It's quite possible to encounter an insertvalue instruction that's more deeply
nested than the value we're looking for, but when that happens we really
mustn't compare beyond the end of the index array.

Since I couldn't see any guarantees about what comparisons std::equal makes, we
probably need to directly check the size beforehand. In practice, I suspect
most std::equal implementations would probably bail early, which would be OK.
But just in case...

rdar://20834485

llvm-svn: 236635
2015-05-06 20:07:38 +00:00
Duncan P. N. Exon Smith 653c1099b4 DwarfDebug: Emit number of bytes in .debug_loc entry directly
Emit the number of bytes in a `.debug_loc` entry directly.  The old code
created temp labels (expensive), emitted the difference between them,
and then emitted one on each side of the relevant bytes.

(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`
(the optimized version of ld64's `-save-temps` when linking the
`verify-uselistorder` executable in an LTO bootstrap).  I've hacked
`MCContext::Allocate()` to just call `malloc()` instead of using the
`BumpPtrAllocator` so that the heap profile is easier to read.  As far
as peak memory is concerned, `MCContext::Allocate()` is equivalent to a
leak, since it only gets freed at process teardown.

In my heap profile, this patch drops memory usage of
`DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB
(2.7%) at peak memory.  Some of that must be noise from `SmallVector`
(or other) allocations -- peak memory only dropped from 1160 MB down to
1100 MB -- but this nevertheless shaves 5% off the top.)

llvm-svn: 236629
2015-05-06 19:11:20 +00:00
Pawel Bylica 3b0adaf6b0 Readd the regression test from r236584. Calling convention fixed to linux.
llvm-svn: 236610
2015-05-06 16:43:21 +00:00
Pete Cooper d927c6eaf8 [ARM] Fast-Isel was incorrectly selecting <2 x double> adds.
With neon enabled, we reach SelectBinaryFPOp and are able to get registers for a <2 x double> add.

However, we shouldn't actually attempt arithmetic on it as ARMIselLowering says "v2f64 is legal so that QR subregs can be extracted as f64 elements, but neither Neon nor VFP support any arithmetic operations on it."

This commit disables SelectBinaryFPOp for any vector types.  There's already a FIXME to try handle neon.  Doing so would require fixing this conditional which isn't safe for vectors 'VT == MVT::f64 || VT == MVT::i64'

llvm-svn: 236609
2015-05-06 16:39:17 +00:00
Bill Schmidt 5fe2e25f7c [PPC64LE] Adjust vector splats during VSX swap optimization
The initial code drop for VSX swap optimization permitted the
optimization only when all operations in a web of related computation
are lane-insensitive.  For some lane-sensitive operations, we can
still permit the optimization provided that we make adjustments to
those operations.  This patch adds special handling for vector splats
so that their presence doesn't kill the optimization.

Vector splats are lane-sensitive since they identify by number a
vector element to be used as the source of a splat.  When swap
optimizations take place, the desired vector element will move to the
opposite doubleword of the quadword vector.  We thus replace the index
I by (I + N/2) % N, where N is the number of elements in the vector.

A new test case is added to test that swap optimization succeeds when
vector splats are present, and that the proper input element is used
as the source of the splat.

An ancillary change removes SH_BUILDVEC as one of the kinds of special
handling that may be required by VSX swap optimization.  From
experience with GCC, I had expected to need some modifications for
vector build operations, but I did not find that to be the case.

llvm-svn: 236606
2015-05-06 15:40:46 +00:00
Artyom Skrobov 3f8eae92a4 [ARM] generate VMAXNM/VMINNM for a compare followed by a select, in safe math mode too
llvm-svn: 236590
2015-05-06 11:44:10 +00:00
Pawel Bylica b25491faf4 Revert regression test from r236584.
Temporary remove a regression test added in r236584. It fails on Windows.

llvm-svn: 236586
2015-05-06 10:41:46 +00:00
Pawel Bylica 9f1fb9d1ef SelectionDAG: Handle out-of-bounds index in extract vector element
Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size.

Test Plan:
CodeGen for X86 test included.
Also one incorrect regression test fixed.

Reviewers: qcolombet, chandlerc, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D9250

llvm-svn: 236584
2015-05-06 10:19:14 +00:00
Ahmed Bougacha e8d0c4ccea [ARM][FastISel] Use TST #1 instead of CMP #0 for select.
Since r234249, i1 are sext instead of zext; because of that, doing
"CMP rN, #0; IT EQ/NE" isn't correct anymore.

"TST #1" is the conservatively correct alternative - the tradeoff being
that it doesn't have a 16-bit encoding -, so use that instead.

llvm-svn: 236569
2015-05-06 04:14:02 +00:00
Sanjoy Das 77b16b78e9 [Statepoints] Remove broken test case.
statepoint-indirect-return.ll breaks on linux systems.  Delete the test
case to make the bots green while I figure out what the right fix is.

llvm-svn: 236568
2015-05-06 02:51:46 +00:00
Sanjoy Das c6bf3e9f12 [StatepointLowering] Don't create temporary instructions. NFCI.
Summary:
Instead of creating a temporary call instruction and lowering that, use
SelectionDAGBuilder::lowerCallOperands.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9480

llvm-svn: 236563
2015-05-06 02:36:20 +00:00
Pete Cooper d0dae3e577 [X86 fast-isel] Constrain the index reg class to not include SP.
The index reg on instructions with complex address modes is a GPR64_NOSP.  Constrain it to appease the machine verifier.

llvm-svn: 236557
2015-05-05 23:41:53 +00:00
Pete Cooper ce9ad757c7 Fix IfConverter to handle regmask machine operands.
Note, this is a recommit of r236515 after fixing an error in r236514.  The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error.  r236515 itself ran 'make check' without errors.

Original commit message follows:

A regmask (typically seen on a call) clobbers the set of registers it lists.  The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.

These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier.  Otherwise, uses after the if converted call could think they are reading an undefined register.

Reviewed by Matthias Braun and Quentin Colombet.

llvm-svn: 236550
2015-05-05 22:09:41 +00:00
Peter Collingbourne 85a0e23bc8 Thumb2SizeReduction: Check the correct set of registers for LDMIA.
The register set for LDMIA begins at offset 3, not 4. We were previously
missing the short encoding of this instruction in the case where the base
register was the first register in the register set.

Also clean up some dead code:

- The isARMLowRegister check is redundant with what VerifyLowRegs does;
  replace with an assert.
- Remove handling of LDMDB instruction, which has no short encoding (and
  does not appear in ReduceTable).

Differential Revision: http://reviews.llvm.org/D9485

llvm-svn: 236535
2015-05-05 20:07:10 +00:00
Ulrich Weigand 9958c489bb [DAGCombiner] Account for getVectorIdxTy() when narrowing vector load
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it.  This is needed on z, where element numbers are i32
but pointers are i64.

Original patch by Richard Sandiford.

llvm-svn: 236530
2015-05-05 19:34:10 +00:00
Ulrich Weigand af2c618e2b [DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt).  For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt).  The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)

Original patch by Richard Sandiford.

llvm-svn: 236529
2015-05-05 19:33:37 +00:00
Ulrich Weigand 2693c0a491 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal.
E.g. it would load a v4i8 as an i32 if i32 was legal.

This patch extends that behavior to promoted integers as well as legal ones.
If the integer type for the full vector width is TypePromoteInteger,
the element type is going to be TypePromoteInteger too, and it's still
better to use a single promoting load or truncating store rather than N
individual promoting loads or truncating stores.  E.g. if you have a v2i8
on a target where i16 is promoted to i32, it's better to load the v2i8 as
an i16 rather than load both i8s individually.

Original patch by Richard Sandiford.

llvm-svn: 236528
2015-05-05 19:32:57 +00:00
Ulrich Weigand c1708b2618 [SystemZ] Add vector intrinsics
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.

For each instructions whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.

For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result.  Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.

Based on a patch by Richard Sandiford.

llvm-svn: 236527
2015-05-05 19:31:09 +00:00
Ulrich Weigand 5211f9ff4d [SystemZ] Mark v1i128 and v1f128 as unsupported
The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
passed in vector registers.  We do not yet support those types, and
some infrastructure is missing before we can do so.

In order to prevent accidentally generating code violating the ABI,
this patch adds checks to detect those types and error out if user
code attempts to use them.

llvm-svn: 236526
2015-05-05 19:30:05 +00:00
Ulrich Weigand cd2a1b5341 [SystemZ] Handle sub-128 vectors
The ABI allows sub-128 vectors to be passed and returned in registers,
with the vector occupying the upper part of a register.  We therefore
want to legalize those types by widening the vector rather than promoting
the elements.

The patch includes some simple tests for sub-128 vectors and also tests
that we can recognize various pack sequences, some of which use sub-128
vectors as temporary results.  One of these forms is based on the pack
sequences generated by llvmpipe when no intrinsics are used.

Signed unpacks are recognized as BUILD_VECTORs whose elements are
individually sign-extended.  Unsigned unpacks can have the equivalent
form with zero extension, but they also occur as shuffles in which some
elements are zero.

Based on a patch by Richard Sandiford.

llvm-svn: 236525
2015-05-05 19:29:21 +00:00
Ulrich Weigand 49506d78e7 [SystemZ] Add CodeGen support for scalar f64 ops in vector registers
The z13 vector facility includes some instructions that operate only on the
high f64 in a v2f64, effectively extending the FP register set from 16
to 32 registers.  It's still better to use the old instructions if the
operands happen to fit though, since the older instructions have a shorter
encoding.

Based on a patch by Richard Sandiford.

llvm-svn: 236524
2015-05-05 19:28:34 +00:00
Ulrich Weigand 80b3af7ab3 [SystemZ] Add CodeGen support for v4f32
The architecture doesn't really have any native v4f32 operations except
v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
elements being used.  Even so, using vector registers for <4 x float>
and scalarising individual operations is much better than generating
completely scalar code, since there's much less register pressure.
It's also more efficient to do v4f32 comparisons by extending to 2
v2f64s, comparing those, then packing the result.

This particularly helps with llvmpipe.

Based on a patch by Richard Sandiford.

llvm-svn: 236523
2015-05-05 19:27:45 +00:00
Ulrich Weigand cd808237b2 [SystemZ] Add CodeGen support for v2f64
This adds ABI and CodeGen support for the v2f64 type, which is natively
supported by z13 instructions.

Based on a patch by Richard Sandiford.

llvm-svn: 236522
2015-05-05 19:26:48 +00:00
Ulrich Weigand ce4c109585 [SystemZ] Add CodeGen support for integer vector types
This the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility.  This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).

When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
  (except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.

The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.

However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.

These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level.  This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.

Based on a patch by Richard Sandiford.

llvm-svn: 236521
2015-05-05 19:25:42 +00:00
Pete Cooper 05b84d4168 Revert "Fix IfConverter to handle regmask machine operands."
This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).

This is to get the bots green while i investigate the failures.

llvm-svn: 236517
2015-05-05 18:49:05 +00:00
Pete Cooper 6ebc207703 Fix IfConverter to handle regmask machine operands.
A regmask (typically seen on a call) clobbers the set of registers it lists.  The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.

These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier.  Otherwise, uses after the if converted call could think they are reading an undefined register.

Reviewed by Matthias Braun and Quentin Colombet.

llvm-svn: 236515
2015-05-05 18:31:36 +00:00
Reid Kleckner 0738a9c02e Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236360.

This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.

llvm-svn: 236508
2015-05-05 17:44:16 +00:00
Quentin Colombet 61b305edfd [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Kit Barton d4eb73c00e This patch adds ABI support for v1i128 data type.
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.

This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.

Phabricator review: http://reviews.llvm.org/D9475

llvm-svn: 236503
2015-05-05 16:10:44 +00:00
Daniel Sanders eda60d217b [mips] Generate code for insert/extract operations when using the N64 ABI and MSA.
Summary:
When using the N64 ABI, element-indices use the i64 type instead of i32.
In many cases, we can use iPTR to account for this but additional patterns
and pseudo's are also required.

This fixes most (but not quite all) failures in the test-suite when using
N64 and MSA together.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9342

llvm-svn: 236494
2015-05-05 10:32:24 +00:00
Daniel Sanders 4160c802d9 [mips][msa] Test basic operations for the N32 ABI too.
Summary:
This required adding instruction aliases for dneg.

N64 will be enabled shortly but requires additional bugfixes.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9341

llvm-svn: 236489
2015-05-05 08:48:35 +00:00
Reid Kleckner 9dad227b85 [X86] Fix assertion while DAG combining offsets and ExternalSymbols
ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes.

llvm-svn: 236471
2015-05-04 23:22:36 +00:00
Sanjay Patel ec2d7358b9 zap windows line endings; NFC
llvm-svn: 236460
2015-05-04 21:27:27 +00:00
Tim Northover 851ff69b42 CodeGen: match up correct insertvalue indices when assessing tail calls.
When deciding whether a value comes from the aggregate or inserted value of an
insertvalue instruction, we compare the indices against those of the location
we're interested in. One of the lists needs reversing because the input data is
backwards (so that modifications take place at the end of the SmallVector), but
we were reversing both before leading to incorrect results.

Should fix PR23408

llvm-svn: 236457
2015-05-04 20:41:51 +00:00
Elena Demikhovsky d41e506342 AVX-512: added a test for encoding
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236421
2015-05-04 12:59:15 +00:00
Elena Demikhovsky 60eb9db7bb AVX-512: added calling convention for i1 vectors in 32-bit mode.
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.

llvm-svn: 236420
2015-05-04 12:40:50 +00:00
Elena Demikhovsky 52266388f8 AVX-512: added integer "add" and "sub" instructions with saturation for SKX
with intrinsics and tests

by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236418
2015-05-04 12:35:55 +00:00
Elena Demikhovsky 2557a22be7 AVX-512: Added VPACK* instructions forms for KNL and SKX
and their intrinsics
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236414
2015-05-04 09:14:02 +00:00
Elena Demikhovsky 1b60ed7069 Masked gather and scatter intrinsics - enabled codegen for KNL.
llvm-svn: 236394
2015-05-03 07:12:25 +00:00
Simon Pilgrim 017ca19384 [DAGCombiner] Enabled vector float/double -> int constant folding
llvm-svn: 236387
2015-05-02 13:04:07 +00:00
Simon Pilgrim e170a4f5fa Line ending fix
llvm-svn: 236386
2015-05-02 11:50:47 +00:00
Simon Pilgrim 7d6df82dd1 [SSE] Added vector int (i32 and i64) -> float/double conversion tests
llvm-svn: 236385
2015-05-02 11:42:47 +00:00
Simon Pilgrim 6e3b7bad11 [SSE] Added vector float/double -> i32 and i64 conversion tests
llvm-svn: 236384
2015-05-02 11:18:47 +00:00
Eric Christopher a2d44dee73 Rework test to use FileCheck by making sure we have no xmm registers
with numbers.

llvm-svn: 236373
2015-05-02 01:06:17 +00:00
Reid Kleckner 83d89fa546 Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236359. Things are still broken despite testing. :(

llvm-svn: 236360
2015-05-01 22:50:14 +00:00
Reid Kleckner 51476acd77 Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236340.

llvm-svn: 236359
2015-05-01 22:40:25 +00:00
Colin LeMahieu bb0d7cbee1 [Hexagon] r236351 fix does not work on builder configurations yet.
llvm-svn: 236358
2015-05-01 22:39:20 +00:00
Quentin Colombet 0de2346859 [AArch64][FastISel] Variant of the logical instructions that use two input
registers cannot write on SP.

rdar://problem/20748715

llvm-svn: 236352
2015-05-01 21:34:57 +00:00
Colin LeMahieu b662565475 [Hexagon] Adding expression MC emission and removing XFAIL from test that hits this code path.
llvm-svn: 236348
2015-05-01 21:14:21 +00:00
Quentin Colombet 9df2fa261b [AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences.
rdar://problem/20748715

llvm-svn: 236346
2015-05-01 20:57:11 +00:00
Reid Kleckner 2747d3d55a Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236339, it breaks the win32 clang-cl self-host.

llvm-svn: 236340
2015-05-01 20:14:04 +00:00
Reid Kleckner 4856fc61b4 [WinEH] Add an EH registration and state insertion pass for 32-bit x86
This pass is responsible for constructing the EH registration object
that gets linked into fs:00, which is all it does in this change. In the
future, it will also insert stores to update the EH state number.

I considered keeping this functionality in WinEHPrepare, but it's pretty
separable and X86 specific. It has conceptually very little to do with
the task of WinEHPrepare, which is currently outlining.  WinEHPrepare is
also in theory useful on ARM, but this logic is pretty x86 specific.

Reviewers: andrew.w.kaylor, majnemer

Differential Revision: http://reviews.llvm.org/D9422

llvm-svn: 236339
2015-05-01 20:04:54 +00:00
Peter Collingbourne d27d3a151f ARM: Align functions containing Thumb-2 jump tables to 4 bytes.
Functions with jump tables need an alignment of 4 because they use the ADR
instruction, which aligns the PC to 4 bytes before adding an offset.

Differential Revision: http://reviews.llvm.org/D9424

llvm-svn: 236327
2015-05-01 18:05:59 +00:00
Simon Pilgrim 9fb06bca67 [SelectionDAG] Unary vector constant folding integer legality fixes
This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund.

The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector.

I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch.

Differential Revision: http://reviews.llvm.org/D9282

llvm-svn: 236308
2015-05-01 08:20:04 +00:00
Tom Stellard aa798340c3 R600/SI: Add VCC as an implict def of SI_KILL
When SI_KILL has a register operand, its lowered form writes to vcc.

llvm-svn: 236307
2015-05-01 03:44:09 +00:00
Tom Stellard 0b7feb1cb7 R600/SI: Fix verifier errors from the SIAnnotateControlFlow pass
This pass was generating 'Instruction does not dominate all uses!'
errors for programs which had loops with a condition variable that
depended on the result of a phi instruction from outside of the loop.

The pass was inserting new phi nodes outside of the loop which used values
defined inside the loop.

http://bugs.freedesktop.org/show_bug.cgi?id=90056

llvm-svn: 236306
2015-05-01 03:44:08 +00:00
Quentin Colombet 65b5b01d56 [ARM][TEST] Strengthen test against smarter reg alloc.
Follow-up of r236247.

rdar://problem/20770899

llvm-svn: 236296
2015-05-01 00:45:55 +00:00
Pete Cooper 2127b00cd5 [ARM] optimizeSelect should clear kill flags.
If we move an instruction from one block down to a MOVC and predicate it,
then the original instruction could be moved in to a loop.  In this case,
its invalid for any kill flags to remain on there.

Fails with -verfy-machineinstrs.

rdar://problem/20752113

llvm-svn: 236290
2015-04-30 23:57:47 +00:00
Pete Cooper 451755d370 Commute the internal flag on MachineOperands.
When commuting a thumb instruction in the size reduction pass, thumb
instructions are represented as a bundle and so some operands may be marked
as internal.  The internal flag has to move with the operand when commuting.

This test is sensitive to register allocation so can't specifically check that
this error was happening, but so long as it continues to pass with -verify then
hopefully its still ok.

rdar://problem/20752113

llvm-svn: 236282
2015-04-30 23:14:14 +00:00
Quentin Colombet 329fa890ba [AArch64] Fix bad register class constraint in fast-isel for TST instruction.
rdar://problem/20748715

llvm-svn: 236273
2015-04-30 22:27:20 +00:00
Pete Cooper 5111881cfc Don't always apply kill flag in thumb2 ABS pseudo expansion.
The expansion for t2ABS was always setting the kill flag on the rsb instruction.
It should instead only be set on rsb if it was set on the original ABS instruction.

rdar://problem/20752113

llvm-svn: 236272
2015-04-30 22:15:59 +00:00
Reid Kleckner 60d5232be2 [X86] Use 4 byte preferred aggregate alignment on Win32
This helps reduce the frequency of stack realignment prologues in 32-bit
X86 Windows code. Before this change and the corresponding clang change,
we would take the max of the type preferred alignment and the explicit
alignment on the alloca.

If you don't override aggregate alignment in datalayout, you get a
default of 8. This dates back to 2007 / r34356, and changing it seems
prohibitively difficult at this point.

llvm-svn: 236270
2015-04-30 22:11:59 +00:00
Andrea Di Biagio 737a361006 Fix comment in test. NFC.
llvm-svn: 236262
2015-04-30 21:22:28 +00:00
Andrea Di Biagio c84b5bdd69 Fix for PR23103. Correctly propagate the 'IsUndef' flag to the register operands of a commuted instruction.
Revision 220239 exposed a latent bug in method
'TargetInstrInfo::commuteInstruction'. When commuting the operands of a machine
instruction, method 'commuteInstruction' didn't correctly propagate the
'IsUndef' flag to the register operands of the new (commuted) instruction.

Before this patch, the following instruction:
  %vreg4<def> = VADDSDrr  %vreg14, %vreg5<undef>; FR64:%vreg4,%vreg14,%vreg5

was wrongly converted by method 'commuteInstruction' into:
  %vreg4<def> = VADDSDrr  %vreg5, %vreg14<undef>; FR64:%vreg4,%vreg5,%vreg14

The correct instruction should have been:
  %vreg4<def> = VADDSDrr  %vreg5<undef>, %vreg14; FR64:%vreg4,%vreg5,%vreg14

This patch fixes the problem in method 'TargetInstrInfo::commuteInstruction'.
When swapping the operands of a machine instruction, we now make sure that
'IsUndef' flags are correctly set.
Added test case 'pr23103.ll'.

Differential Revision: http://reviews.llvm.org/D9406

llvm-svn: 236258
2015-04-30 21:03:29 +00:00
Pete Cooper 4d8d2ec3eb Don't rewrite jumps to empty BBs to landing pads.
In the test case here, the 'unreachable' BB was removed by BranchFolding because its empty.

It then rewrote the jump from 'entry' to jump to its fallthrough, which was a landing pad.

This results in 'entry' jumping to 2 different landing pads, which fails the machine verifier.

rdar://problem/20750162

llvm-svn: 236248
2015-04-30 18:58:23 +00:00
Quentin Colombet 0a905042cd [ARM] Do not generate invalid encoding for stack adjust, even if this is just
temporary.

Because of that:
1. The machine verifier was complaining on such code.
2. The generate code worked just because the thumb reduction size pass fixed the
opcode.

rdar://problem/20749824

llvm-svn: 236247
2015-04-30 18:52:49 +00:00
Jan Vesely 808fff585b Reinstate revisions r234755, r234759, r234760
changes:
  Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
  Add location to getConstant
  Drop comment about the ops being turned into expand

llvm-svn: 236240
2015-04-30 17:15:56 +00:00
Daniel Sanders 59f89aa8ed [mips][msa] Rename main check prefix to 'ALL' in basic operations tests. NFC
Summary:
The majority of the checks are subtarget independent. The few that aren't
will be corrected shortly.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9340

llvm-svn: 236220
2015-04-30 09:57:37 +00:00
Daniel Sanders fa159165be [mips][msa] Use CHECK-LABEL where missing, and remove checks matching the .size directive. NFC.
Summary: 

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9339

llvm-svn: 236219
2015-04-30 09:56:30 +00:00
Daniel Sanders 90b059d555 [mips] Add missing signext attributes to MSA basic operations tests. NFC.
Summary:
This doesn't make much difference to MIPS32, but it will simplify a
MIPS64r6 bugfix which will follow shortly by removing unnecessary
sign-extension of parameters.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9338

llvm-svn: 236216
2015-04-30 09:24:09 +00:00
Simon Pilgrim ecf5875bd5 [SSE] Fix for MUL v16i8 on pre-SSE41 targets (PR23369).
Sign extension of i8 to i16 was placing the unpacked bytes in the lower byte instead of the upper byte.

llvm-svn: 236209
2015-04-30 08:23:16 +00:00
Owen Anderson d8a029c81b Semantically revert r236031, which is not a good idea for in-order targets.
At the least it should be guarded by some kind of target hook.
It also introduced catastrophic compile time and code quality
regressions on some out of tree targets (test case still being
reduced/sanitized).

Sanjay agreed with reverting this patch until these issues can be
resolved.

llvm-svn: 236199
2015-04-30 04:06:32 +00:00
Hans Wennborg 680a60f0d0 XFAIL test/CodeGen/Generic/MachineBranchProb.ll on Hexagon (PR23377)
llvm-svn: 236196
2015-04-30 01:59:04 +00:00
Hans Wennborg 4b828d35fd Switch lowering: use profile info to build weight-balanced binary search trees
This will cause hot nodes to appear closer to the root.

The literature says building the tree like this makes it a near-optimal (in
terms of search time given key frequencies) binary search tree. In LLVM's case,
we can do up to 3 comparisons in each leaf node, so it might be better to opt
for lower tree height in some cases; that's something to look into in the
future.

Differential Revision: http://reviews.llvm.org/D9318

llvm-svn: 236192
2015-04-30 00:57:37 +00:00
Ahmed Bougacha ace0593b42 Flip r236172 testcase RUN option ordering for BSD sed(1). NFC.
llvm-svn: 236186
2015-04-30 00:07:34 +00:00
Pete Cooper 46361a1ea1 Change x86 CMOVE_F to read it source, not write it.
This was breaking sqlite with the machine verifier because operand 0 was a def according to tablegen, but didn't have the 'isDef' flag set.

Looking at the ISA, its clear that this operand is a source as writing to st(0) is implicit.  So move the operand to the correct place in the td file.

rdar://problem/20751584

llvm-svn: 236183
2015-04-29 23:51:33 +00:00
Reid Kleckner bcda1cd45a [WinEH] Start EH preparation for 32-bit x86, it uses no arguments
32-bit x86 MSVC-style exceptions are functionaly similar to 64-bit, but
they take no arguments. Instead, they implicitly use the value of EBP
passed in by the caller as a pointer to the parent's frame. In LLVM, we
can represent this as llvm.frameaddress(1), and feed that into all of
our calls to llvm.framerecover.

The next steps are:
- Add an alloca to the fs:00 linked list of handlers
- Add something like llvm.sjlj.lsda or generalize it to store in the
  alloca
- Move state number calculation to WinEHPrepare, arrange for
  FunctionLoweringInfo to call it
- Use the state numbers to insert explicit loads and stores in the IR

llvm-svn: 236172
2015-04-29 22:49:54 +00:00
Reid Kleckner c695471365 [X86] Avoid mangling frameescape labels
x86 Windows uses the '_' prefix for all global symbols, and this was
mistakenly being applied to frameescape labels, which are not externally
visible global symbols. They use the private global prefix 'L'.

The *right* way to fix this is probably to stop masquerading this label
as an ExternalSymbol and create a new SDNode type. These labels are not
"external", and we know they will be resolved by assembly time. Having a
custom SDNode type would allow us to do better X86 address mode
matching, so it's probably worth doing eventually.

llvm-svn: 236123
2015-04-29 16:46:01 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Vasileios Kalintiris 1249e74648 Mips fast-isel - handle functions which return i8 or i6 .
Summary: Allow Mips fast-isel to handle functions which return i8/i16 signed/unsigned.

Test Plan:
Make check tests are forthcoming.
Already passes test-suite at O0/O2 for Mips 32 r1/r2

Reviewers: dsanders, rkotler

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D6765

llvm-svn: 236103
2015-04-29 14:17:14 +00:00
Daniel Sanders 301f937765 [mips] Correct 128-bit shifts on 64-bit targets.
Summary:
The existing code was correct for 32-bit GPR's but not 64-bit GPR's. It now
accounts for both cases.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits, mohit.bhakkad, sagar

Differential Revision: http://reviews.llvm.org/D9337

llvm-svn: 236099
2015-04-29 12:28:58 +00:00
Tim Northover e18d662201 ARM: fix peephole optimisation of TST
We were trying to look through COPY instructions, but only to the next
instruction in a BB and incorrectly anyway. The cases where that would actually
be a good idea are rare enough (and not even tested!) that it's not worth
trying to get right.

rdar://20721342

llvm-svn: 236050
2015-04-28 22:03:55 +00:00
Andrew Kaylor 046f7b42f2 [WinEH] Split blocks at calls to llvm.eh.begincatch
Differential Revision: http://reviews.llvm.org/D9311

llvm-svn: 236046
2015-04-28 21:54:14 +00:00
Sanjay Patel 2fbc4e5c49 transform fadd chains to increase parallelism
This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.

In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case N-1 dependent operations to N/2 dependent operations. 
The optimal balanced binary tree would reduce the chain to log2(N).

The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
and this patch.

We can extend this patch to cover other associative operations such as fmul, fmax, fmin, 
integer add, integer mul.

This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305

and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

The issue also came up in:
http://reviews.llvm.org/D8941

Differential Revision: http://reviews.llvm.org/D9232

llvm-svn: 236031
2015-04-28 21:03:22 +00:00
Tom Stellard 96301d2455 R600: Fix up for AsmPrinter's OutStreamer being a unique_ptr
Fixes a crash with basically any OpenGL application using the radeonsi
driver.

Patch by: Michel Dänzer

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90176
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 236004
2015-04-28 17:37:03 +00:00
Justin Holewinski 3d2a976197 [NVPTX] Handle addrspacecast constant expressions in aggregate initializers
We need to track if an AddrSpaceCast expression was seen when
generating an MCExpr for a ConstantExpr.  This change introduces a
custom lowerConstant method to the NVPTX asm printer that will create
NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode
the information that a given symbol needs to be casted to a generic
address.

llvm-svn: 236000
2015-04-28 17:18:30 +00:00
Elena Demikhovsky 1f7b3644d3 Fixed crash of variable shift inst on AVX2
https://llvm.org/bugs/show_bug.cgi?id=22955

llvm-svn: 235993
2015-04-28 14:46:35 +00:00
Elena Demikhovsky ae51853924 AVX-512: Added "pandn" intrinsics set
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 235971
2015-04-28 08:12:42 +00:00
Hans Wennborg 67c03759e4 Switch lowering: Take branch weight into account when ordering for fall-through
Previously, the code would try to put a fall-through case last,
even if that meant moving a case with much higher branch weight
further down the chain.

Ordering by branch weight is most important, putting a fall-through
block last is secondary.

llvm-svn: 235942
2015-04-27 23:35:22 +00:00
Ahmed Bougacha c004c60c0a [AArch64] Also combine vector selects fed by non-i1 SETCCs.
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative, replace it with an assert.

This also means that we now can run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all types created all have the same sizes as the (legal) inputs.

Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).

llvm-svn: 235922
2015-04-27 21:43:12 +00:00
Ahmed Bougacha 89bba61c84 [AArch64] Don't assert when combining (v3f32 select (setcc f64)).
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.

llvm-svn: 235917
2015-04-27 21:01:20 +00:00
Hans Wennborg ba6d2568f9 Switch lowering: order bit tests by branch weight.
llvm-svn: 235912
2015-04-27 20:21:17 +00:00
Bill Schmidt fe723b9a6d [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.

However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.

Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.

This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.

Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.

llvm-svn: 235910
2015-04-27 19:57:34 +00:00
Elena Demikhovsky a480ef5494 AVX-512: added calling conventions for i1 vectors.
Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724

llvm-svn: 235889
2015-04-27 15:11:19 +00:00
Brendon Cahoon 55bdeb7bc7 [Hexagon] Use constant extenders to fix up hardware loops
Use a loop instruction with a constant extender for a hardware
loop instruction that is too far away from the start of the loop.
This is cheaper than changing the SA register value.

Differential Revision: http://reviews.llvm.org/D9262

llvm-svn: 235882
2015-04-27 14:16:43 +00:00
Vasileios Kalintiris 7a6b18783f Reapply "[mips][FastISel] Implement shift ops for Mips fast-isel.""
This reapplies r235194, which was reverted in r235495 because it was causing a
failure in our out-of-tree buildbots for MIPS. With the sign-extension patch
in r235718, this patch doesn't cause any problem any more.

llvm-svn: 235878
2015-04-27 13:28:05 +00:00
Elena Demikhovsky d1084c5b3f AVX-512: Extend/Truncate operations for SKX,
SETCC for bit-vectors

llvm-svn: 235875
2015-04-27 12:57:59 +00:00
Simon Pilgrim 4f683c264a [X86][SSE] Add v16i8/v32i8 multiplication support
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.

The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.

Differential Revision: http://reviews.llvm.org/D9115

llvm-svn: 235837
2015-04-27 07:55:46 +00:00
Matt Arsenault 957bfc7458 R600: Remove / merge redundant testcases
llvm-svn: 235813
2015-04-26 00:53:33 +00:00
Sanjay Patel 3eb5146b3c add SSE run to check non-AVX codegen
llvm-svn: 235809
2015-04-25 20:41:51 +00:00
Simon Pilgrim aedd3c5160 line endings fix
llvm-svn: 235800
2015-04-25 12:12:43 +00:00
Reid Kleckner cfbfe6f29c [SEH] Implement GetExceptionCode in __except blocks
This introduces an intrinsic called llvm.eh.exceptioncode. It is lowered
by copying the EAX value live into whatever basic block it is called
from. Obviously, this only works if you insert it late during codegen,
because otherwise mid-level passes might reschedule it.

llvm-svn: 235768
2015-04-24 20:25:05 +00:00
David Blaikie 445e3fbc54 [opaque pointer type] Add textual IR support for explicit type parameter to the invoke instruction
Same as r235145 for the call instruction - the justification, tradeoffs,
etc are all the same. The conversion script worked the same without any
false negatives (after replacing 'call' with 'invoke').

llvm-svn: 235755
2015-04-24 19:32:54 +00:00
Sundeep Kushwaha 5d41a6992d [PATCH] [Hexagon] Adding a test case for calling convention.
http://reviews.llvm.org/D9241

llvm-svn: 235754
2015-04-24 19:22:02 +00:00
Yaron Keren de2e2b0214 Teach AArch64\lit.local.cfg the new triple names windows-gnu and windows-msvc.
Tests were failing when built with -DLLVM_DEFAULT_TARGET_TRIPLE=i686-pc-windows-gnu.

llvm-svn: 235733
2015-04-24 17:14:16 +00:00
Hans Wennborg ec679a8b3b Switch lowering: fix APInt overflow causing infinite loop / OOM
llvm-svn: 235729
2015-04-24 16:53:55 +00:00
Reid Kleckner 2c3ccaacb7 [WinEH] Split the landingpad BB instead of cloning it
This means we don't have to RAUW the landingpad instruction and
landingpad BB, which is a nice win.

llvm-svn: 235725
2015-04-24 16:22:19 +00:00
Jingyue Wu 312fd0242d [NVPTX] Emits "generic()" depending on the original address space
Summary:
Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()"
on aggregates of addrspacecasts.

Test Plan: addrspacecast-gvar.ll

Reviewers: eliben, jholewinski

Reviewed By: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9130

llvm-svn: 235689
2015-04-24 02:57:30 +00:00
Matt Arsenault 5e10016f03 R600/SI: Fix verifier error when producing v_madmk_f32
Copy the kill flags when swapping the operands.

llvm-svn: 235687
2015-04-24 01:57:58 +00:00
Matthias Braun e1a67412cf R600/RegisterCoalescer: Enable more rematerialization/add missing testcase
This enables the rematerialization of some R600 MOV instructions in the
RegisterCoalescer and adds a testcase for r235668.

llvm-svn: 235675
2015-04-24 00:25:50 +00:00
Reid Kleckner 5c5facc2ce Re-commit "[SEH] Remove the old __C_specific_handler code now that WinEHPrepare works"
This reverts commit r235617.

r235649 should have addressed the problems.

llvm-svn: 235667
2015-04-23 23:22:33 +00:00
Hal Finkel d86e90abdd [PowerPC] Use sync inst alias when printing
So long as the choice between printing msync and sync is not ambiguous, we can
print 'sync 0' and just 'sync'.

llvm-svn: 235663
2015-04-23 23:05:08 +00:00
Tom Stellard ff5cf0e1fd R600: Correctly lower CONCAT_VECTOR nodes with more than 2 operands
llvm-svn: 235662
2015-04-23 22:59:24 +00:00
Andrew Kaylor 20ae2a311f [WinEH] Ignore filter clauses while mapping landing pad blocks.
llvm-svn: 235656
2015-04-23 22:38:36 +00:00
Reid Kleckner e3af86e9d9 [WinEH] Replace more lpad value uses with undef
We were asserting on code like this:
  extern "C" unsigned long _exception_code();
  void might_crash(unsigned long);
  void foo() {
    __try {
      might_crash(0);
    } __except(1) {
      might_crash(_exception_code());
    }
  }

Gtest and many other libraries get the exception code from the __except
block. What's supposed to happen here is that EAX is live into the
__except block, and it contains the exception code. Eventually we'll
represent that as a use of the landingpad ehptr value, but for now we
can replace it with undef.

llvm-svn: 235649
2015-04-23 21:22:30 +00:00
Quentin Colombet 796d906e06 [MachineCopyPropagation] Handle undef flags conservatively so that we do not
remove copies that are useful after breaking some hardware dependencies.
In other words, handle this kind of situations conservatively by assuming reg2
is redefined by the undef flag.
reg1 = copy reg2
= inst reg2<undef>
reg2 = copy reg1
Copy propagation used to remove the last copy.
This is incorrect because the undef flag on reg2 in inst, allows next
passes to put whatever trashed value in reg2 that may help.
In practice we end up with this code:
reg1 = copy reg2
reg2 = 0
= inst reg2<undef>
reg2 = copy reg1

This fixes PR21743.

llvm-svn: 235647
2015-04-23 21:17:39 +00:00
Tom Stellard 8b0182af2f R600/SI: Fix indirect addressing with a negative constant offset
When the base register index of the vector plus the constant offset
was less than zero, we were passing the wrong base register to the indirect
addressing instruction.

In this case, we need to set the base register to v0 and then add
the computed (negative) index to m0.

llvm-svn: 235641
2015-04-23 20:32:01 +00:00
Peter Collingbourne 167668f8c8 Thumb2: When applying branch optimizations, visit branches in reverse order.
The order in which branches appear in ImmBranches is approximately their
order within the function body. By visiting later branches first, we reduce
the distance between earlier forward branches and their targets, making it
more likely that the cbn?z optimization, which can only apply to forward
branches, will succeed for those earlier branches.

Differential Revision: http://reviews.llvm.org/D9185

llvm-svn: 235640
2015-04-23 20:31:35 +00:00
Peter Collingbourne cfee5b04bc ARM: When re-creating a branch via InsertBranch, preserve CPSR flags.
In particular, this preserves the kill flag, which allows the Thumb2 cbn?z
optimization to be applied in cases where a branch has been re-created after
the live variables analysis pass, e.g. by the machine block placement pass.

This appears to be low risk; a number of other targets seem to already be
doing something similar, e.g. AArch64, PowerPC.

Differential Revision: http://reviews.llvm.org/D9184

llvm-svn: 235639
2015-04-23 20:31:32 +00:00
Peter Collingbourne 6529523151 Thumb2: When optimizing for size, do not if-convert branches involving comparisons with zero.
This allows the constant island pass to lower these branches to cbn?z
instructions, resulting in a shorter instruction sequence.

Differential Revision: http://reviews.llvm.org/D9183

llvm-svn: 235638
2015-04-23 20:31:30 +00:00
Peter Collingbourne 78f1ecc59c ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets.
This makes it more likely that we can use the 16-bit push and pop instructions
on Thumb-2, saving around 4 bytes per function.

Differential Revision: http://reviews.llvm.org/D9165

llvm-svn: 235637
2015-04-23 20:31:26 +00:00
Peter Collingbourne 1213918bf4 ARM: Only enforce 4-byte alignment on Thumb-2 functions with constant pools.
This appears to have been introduced back in r76698 as part of an unrelated
change. I can find no official ARM documentation stating that Thumb-2 functions
require 4-byte alignment; in fact, ARM documentation appears to contradict
this (see, e.g., ARM Architecture Reference Manual Thumb-2 Supplement,
section 2.6.1: "Thumb-2 enforces 16-bit alignment on all instructions.").

Also remove code that sets alignment for ARM functions, which is redundant
with code in the MachineFunction constructor, and remove the hidden
-arm-align-constant-islands flag, which has been enabled by default since
r146739 (Dec 2011) and has probably received sufficient testing by now.

Differential Revision: http://reviews.llvm.org/D9138

llvm-svn: 235636
2015-04-23 20:31:22 +00:00
Reid Kleckner 909ea7e6b8 Revert "[SEH] Remove the old __C_specific_handler code now that WinEHPrepare works"
We still have some "uses remain after removal" issues in -O0 builds.

This reverts commit r235557.

llvm-svn: 235617
2015-04-23 18:34:01 +00:00
Hal Finkel 7c5cb066d0 [PowerPC] Enable printing instructions using aliases
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.

Thus, after some hours of updating test cases...

llvm-svn: 235616
2015-04-23 18:30:38 +00:00
Pirama Arumuga Nainar 745615ca00 [AArch64] Add nvcast patterns for v4f16 and v8f16
Summary:
Constant stores of f16 vectors can create NvCast nodes from various
operand types to v4f16 or v8f16 depending on patterns in the stored
constants.  This patch adds nvcast rules with v4f16 and v8f16 values.

AArchISelLowering::LowerBUILD_VECTOR has the details on which constant
patterns generate the nvcast nodes.

Reviewers: jmolloy, srhines, ab

Subscribers: rengolin, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D9201

llvm-svn: 235610
2015-04-23 17:32:25 +00:00
Pirama Arumuga Nainar b18815354d [AArch64] Handle vec4, vec8, vec16 *itofp for half
Summary:
Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32,
v8i8, v8i16 inputs to allow promotion of v4f16 results.

Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32,
and i64 vectors.  Only missing tests are for v16i8 and v16i16 as the
shift operations are too complicated to write a proper check sequence.

The conversions from v4i64 to v4f16 do not depend on this patch - v4i64
is split and the conversion gets handled while lowering v2i64.  I am
adding a test here for completeness.

Reviewers: aemerson, rengolin, ab, jmolloy, srhines

Subscribers: rengolin, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D9166

llvm-svn: 235609
2015-04-23 17:16:27 +00:00
Hans Wennborg 0867b151c9 Re-commit r235560: Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.

llvm-svn: 235608
2015-04-23 16:45:24 +00:00
Sanjay Patel f4b0f07430 use update_llc_test_checks.py to tighten checking; remove unnecessary CPU param
llvm-svn: 235604
2015-04-23 16:07:50 +00:00
Krzysztof Parzyszek 876a19d855 [Hexagon] Shrink-wrap stack frame (Hexagon-specific)
llvm-svn: 235603
2015-04-23 16:05:39 +00:00
Krzysztof Parzyszek a17cebd219 [Hexagon] Add testcases for stack alignment and variable-sized objects
llvm-svn: 235602
2015-04-23 15:12:49 +00:00
Aaron Ballman 0be238cebd Revert r235560; this commit was causing several failed assertions in Debug builds using MSVC's STL. The iterator is being used outside of its valid range.
llvm-svn: 235597
2015-04-23 13:41:59 +00:00
Simon Pilgrim 86b034bae9 [DAGCombiner] Remove extra bitcasts surrounding vector shuffles
Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1)

Differential Revision: http://reviews.llvm.org/D9097

llvm-svn: 235578
2015-04-23 08:43:13 +00:00
Andrew Kaylor 86e67f7ebc [WinEH] Removing seh-filter.ll until I can determine its validity
llvm-svn: 235566
2015-04-23 00:38:22 +00:00
Andrew Kaylor 43e1d76278 [WinEH] Don't skip landing pads that end with an unreachable instruction.
llvm-svn: 235563
2015-04-23 00:20:44 +00:00
Hans Wennborg 15823d49b6 Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a re-commit of r235101, which also fixes the problems with the previous patch:

- Switches with only a default case and non-fallthrough were handled incorrectly

- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.

> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference.  If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649

llvm-svn: 235560
2015-04-22 23:14:56 +00:00
Reid Kleckner 64a2a6a473 [SEH] Remove the old __C_specific_handler code now that WinEHPrepare works
This removes the -sehprepare flag and makes __C_specific_handler
functions always to use WinEHPrepare.

This was tested by building all of chromium_builder_tests and running a
few tests that use SEH, but if something breaks, we can revert this.

llvm-svn: 235557
2015-04-22 22:13:09 +00:00
Krzysztof Parzyszek 952d951418 [Hexagon] Some cleanup of instruction selection code
llvm-svn: 235552
2015-04-22 21:17:00 +00:00
Reid Kleckner fd7df284b8 [WinEH] Demote values and phis live across exception handlers up front
In particular, this handles SSA values that are live *out* of a handler.
The existing code only handles values that are live *in* to a handler.

It also handles phi nodes in the block where normal control should
resume after the end of a catch handler.  When EH return points have phi
nodes, we need to split the return edge. It is impossible for phi
elimination to emit copies in the previous block if that block gets
outlined. The indirectbr that we leave in the function is only notional,
and is eliminated from the MachineFunction CFG early on.

Reviewers: majnemer, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D9158

llvm-svn: 235545
2015-04-22 21:05:21 +00:00
Krzysztof Parzyszek cd97c985c7 [Hexagon] Use A2_tfrsi for constant pool and jump table addresses
llvm-svn: 235535
2015-04-22 18:25:53 +00:00
Pirama Arumuga Nainar 67e82482c0 Fix correctness check for test_vec_fpextend_double
Summary:
Remove the CHECK-DAG calls introduced in r235341, and add a comment that
this test may break due to scheduling variations.

This patch completes the fix discussed in http://reviews.llvm.org/D8804

Reviewers: dsanders, srhines

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9178

llvm-svn: 235530
2015-04-22 18:04:12 +00:00
Matt Arsenault deaef8e24b R600: Fix always inline pass breaking noinline functions
No test since calls are not actually supported yet.

llvm-svn: 235524
2015-04-22 17:10:44 +00:00
Sanjay Patel cab567873f [x86] Add store-folded memop patterns for vcvtps2ph
Differential Revision: http://reviews.llvm.org/D7296

llvm-svn: 235517
2015-04-22 16:11:19 +00:00
Andrea Di Biagio 6cd2f42fac [X86][AVX] Fix failure due to a missing ISel pattern to select VBROADCAST nodes (PR23259).
This fixes a regression introduced at revision 218263.

On AVX, if we optimize for size, a splat build_vector of a load
is lowered into a VBROADCAST node. This is done even if the value type of the
splat build_vector node is v2i64.

Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two
extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast
where the scalar element comes from a loadi64/loadf64.

However, revision 218263 forgot to add an extra fallback pattern for the case
where we have a X86VBroadcast of a loadi64 with multiple uses.

This patch adds the missing tablegen pattern in X86InstrSSE.td.
This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel
doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern
to select v2i64 X86ISD::BROADCAST nodes.

llvm-svn: 235509
2015-04-22 14:53:39 +00:00
Hal Finkel 0d49cf2645 [DAGCombine] Disable select(c, load,load) for indexed loads
This turned up after r235333, but was a pre-existing bug. The optimization
which transforms select(c, load, load) into a load of a select of the addresses
does not handle indexed loads (pre/post inc/dec). However, it did not check for
them either, leading to a crash if it tried to transform one of them.

llvm-svn: 235497
2015-04-22 11:32:25 +00:00
Vasileios Kalintiris e7508c9fc7 Revert "[mips][FastISel] Implement shift ops for Mips fast-isel."
This reverts commit r235194. It was causing a failure in FastISel buildbots
due to sign-extension issues.

llvm-svn: 235495
2015-04-22 10:08:46 +00:00
James Molloy cd2334e86e [AArch64] Disable complex GEP optimization by default.
Enough concerns were raised that this optimization is pessimising some code patterns.

The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.

llvm-svn: 235491
2015-04-22 09:11:38 +00:00
Lang Hames 65613a634a [patchpoint] Add support for symbolic patchpoint targets to SelectionDAG and the
X86 backend.

The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.

llvm-svn: 235483
2015-04-22 06:02:31 +00:00
Sanjay Patel fe1365ac50 [x86] allow 64-bit extracted vector element integer stores on a 32-bit system
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.

So instead of producing this:

  pshufd	$229, %xmm0, %xmm1      ## xmm1 = xmm0[1,1,2,3]
  movd	%xmm0, (%eax)
  movd	%xmm1, 4(%eax)

We can do:

  movq %xmm0, (%eax)

This is a fix for the problem noted in D7296.

Differential Revision: http://reviews.llvm.org/D9134

llvm-svn: 235460
2015-04-22 00:24:30 +00:00
Reid Kleckner f14787dad8 [WinEH] Correctly handle inlined __finally blocks with captures
We should also teach the inliner to collapse framerecover of
frameaddress of the current frame down to an alloca, but that can happen
later.

llvm-svn: 235459
2015-04-22 00:07:52 +00:00
Krzysztof Parzyszek 499bc5faa1 [Hexagon] Patterns for frame index with offset for isel
llvm-svn: 235418
2015-04-21 21:28:03 +00:00
Reid Kleckner d2a1a51996 Re-land r235154-r235156 under the existing -sehprepare flag
Keep the old SEH fan-in lowering on by default for now, since projects
rely on it.  This will make it easy to test this change with a simple
flag flip.

llvm-svn: 235399
2015-04-21 18:23:57 +00:00
Matthias Braun 9e9e8b3230 X86: Match for X86ISD nodes in LowerBUILD_VECTOR instead of BUILD_VECTORCombine
There doesn't seem to be a reason to perform this target ISD node matching
in an DAGCombine, moving it to lowering fixes PR23296.

Differential Revision: http://reviews.llvm.org/D9137

llvm-svn: 235394
2015-04-21 17:21:36 +00:00
Vasileios Kalintiris 32177d6bec [mips] Optimize code generation for 64-bit variable shift instructions.
Summary:
The 64-bit version of the variable shift instructions uses the
shift_rotate_reg class which uses a GPR32Opnd to specify the variable
shift amount. With this patch we avoid the generation of a redundant
SLL instruction for the variable shift instructions in 64-bit targets.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7413

llvm-svn: 235376
2015-04-21 10:49:03 +00:00
Elena Demikhovsky 50b88ddb87 AVX-512: Added logical and arithmetic instructions for SKX
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 235375
2015-04-21 10:27:40 +00:00
Simon Pilgrim 398ce22b86 [X86][SSE] Provide execution domains for scalar floating point operations
This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since.

I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.

Differential Revision: http://reviews.llvm.org/D9095

llvm-svn: 235372
2015-04-21 08:40:22 +00:00
Simon Pilgrim 860f08779c CONCAT_VECTOR of BUILD_VECTOR - minor fix
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.

Test case spotted by Patrik Hagglund

llvm-svn: 235371
2015-04-21 08:05:43 +00:00
Pawel Bylica 57c2f7c756 Fix generic shift expansion when shift amount is 0
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439. 

This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type. 

Patch by Keno Fischer!

Test Plan: regression test from http://reviews.llvm.org/D7752

Reviewers: chfast, resistor

Reviewed By: chfast, resistor

Subscribers: sanjoy, resistor, chfast, llvm-commits

Differential Revision: http://reviews.llvm.org/D4978

llvm-svn: 235370
2015-04-21 06:28:36 +00:00
Matthias Braun b6b5aaad98 X86: Do not select X86 custom vector nodes if operand types don't match
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.

Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296

Differential Revision: http://reviews.llvm.org/D9120

llvm-svn: 235367
2015-04-21 01:13:41 +00:00
Pirama Arumuga Nainar 80f958dbf4 Fix flakiness in fp16-promote.ll
Summary:
In the f16-promote test, make the checks for native conversion instructions
similar to the libcall checks:
- Remove hard coded register names
- Do not check exact instruction sequences.

This fixes test flakiness due to non-determinism in instruction
scheduling and register allocation.  I also fixed a few minor things in
the CHECK-LIBCALL checks.

I'll try to find a way to check that unnecessary loads, stores, or
conversions don't happen.

Reviewers: mzolotukhin, srhines, ab

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9112

llvm-svn: 235363
2015-04-20 23:54:41 +00:00
Sanjay Patel 362f89cd46 use update_llc_test_checks.py to tighten checking
Also, replace win and linux runs with a generic run because that
makes no difference in what this test is checking.

llvm-svn: 235361
2015-04-20 23:31:53 +00:00
Andrew Kaylor 41758517bf [WinEH] Fix problem with mapping shared empty handler blocks.
Differential Revision: http://reviews.llvm.org/D9125

llvm-svn: 235354
2015-04-20 22:04:09 +00:00
Olivier Sallenave b99c2eb0f0 Refactoring and enhancement to FMA combine.
llvm-svn: 235344
2015-04-20 20:29:40 +00:00
Andrew Kaylor 3ae6251ceb Fixing line endings
llvm-svn: 235342
2015-04-20 20:27:28 +00:00
Pirama Arumuga Nainar 34056dea1b [MIPS] OperationAction for FP_TO_FP16, FP16_TO_FP
Summary:
Set operation action for FP16 conversion opcodes, so the Op legalizer
can choose the gnu_* libcalls for Mips.

Set LoadExtAction and TruncStoreAction for f16 scalars and vectors to
prevent (fpext (load )) and (store (fptrunc)) from getting combined into
unsupported operations.

Added test cases to test that these operations are handled correctly
for f16 scalars and vectors.  This patch depends on
http://reviews.llvm.org/D8755.

Reviewers: srhines

Subscribers: llvm-commits, ab

Differential Revision: http://reviews.llvm.org/D8804

llvm-svn: 235341
2015-04-20 20:15:36 +00:00
Tom Stellard 69a7b91e95 DAGCombine: Remove redundant NaN checks around ISD::FSQRT
This folds:

(select (setcc x, -0.0, *lt), NaN, (fsqrt x)) -> ( fsqrt x)

llvm-svn: 235333
2015-04-20 19:38:27 +00:00
Andrea Di Biagio 98c367093d [X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273).
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.

Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.

So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue replacing the invalid assertion with an early exit.

Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.

llvm-svn: 235295
2015-04-20 11:56:59 +00:00
Hal Finkel 1e5733bbed [InlineAsm] Remove EarlyClobber on registers that are also inputs
When an inline asm call has an output register marked as early-clobber, but
that same register is also an input operand, what should we do? GCC accepts
this, and is documented to accept this for read/write operands saying,
"Furthermore, if the earlyclobber operand is also a read/write operand, then
that operand is written only after it's used." For write-only operands, the
situation seems less clear, but I have at least one existing codebase that
assumes this will work, in part because it has syscall macros like this:

({                                                                         \
  register uint64_t r0 __asm__ ("r0") = (__NR_ ## name);                   \
  register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0));               \
  register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1));               \
  register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2));               \
  __asm__ __volatile__                                                     \
  ("sc"                                                                    \
   : "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5)                               \
   :   "0"(r0),  "1"(r3),  "2"(r4),  "3"(r5)                               \
   : "r6","r7","r8","r9","r10","r11","r12","cr0","memory");                \
  r3;                                                                      \
})

Furthermore, with register aliases and subregister relationships that only the
backend knows about, rejecting this in the frontend seems like a difficult
proposition (if we wanted to do so). However, keeping the early-clobber flag on
the INLINEASM MI does not work for us, because it will cause the register's
live interval to end to soon (so it will not appear defined to be used as an
input).

Fortunately, fixing this does not seem hard: When forming the INLINEASM MI,
check to see if any of the early-clobber outputs are also inputs, and if so,
remove the early-clobber flag.

llvm-svn: 235283
2015-04-20 00:01:30 +00:00
Simon Pilgrim 749953eebb [X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation.
The fix ensures that scalar sources inserted into a vector are the correct bit size.

Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.

llvm-svn: 235281
2015-04-19 22:16:49 +00:00
Simon Pilgrim 4c107b5258 [X86][SSE] Extended copysign tests to include llvm intrinsic implementation and constant folding.
llvm-svn: 235279
2015-04-19 21:34:57 +00:00
Simon Pilgrim 6292d50eda [X86][AVX2] Force execution domain on broadcast folding tests.
llvm-svn: 235260
2015-04-18 21:24:16 +00:00
Simon Pilgrim e68c8606ed [X86][SSE] Force execution domain on float/double unpack shuffle tests.
llvm-svn: 235259
2015-04-18 18:50:55 +00:00
Ahmed Bougacha 279e3ee954 [GlobalMerge] Look at uses to create smaller global sets.
Instead of merging everything together, look at the users of
GlobalVariables, and try to group them by function, to create
sets of globals used "together".

Using that information, a less-aggressive alternative is to keep merging
everything together *except* globals that are only ever used alone, that
is, those for which it's clearly non-profitable to merge with others.

In my testing, grouping by Function is too aggressive, but grouping by
BasicBlock is too conservative.  Anything in-between isn't trivially
available, so stick with Function grouping for now.

cl::opts are added for testing; both enabled by default.

A few of the testcases aren't testing the merging proper, but just
various edge cases when merging does occur.  Update them to use the
previous grouping behavior. Also, one of the tests is unrelated to
GlobalMerge; change it accordingly.
While there, switch to r234666' flags rather than the brutal -O3.

Differential Revision: http://reviews.llvm.org/D8070

llvm-svn: 235249
2015-04-18 01:21:58 +00:00
Ahmed Bougacha e14a4d487e [AArch64] Don't force MVT::Untyped when selecting LD1LANEpost.
The result is either an Untyped reg sequence, on ldN with N > 1, or
just the type of the input vector, on ld1.  Don't force Untyped.
Instead, just use the type of the reg sequence.

This mirrors the behavior of createTuple, which feeds the LD1*_POST.

The narrow code path wasn't actually covered by tests, because V64
insert_vector_elt are widened to V128 before the LD1LANEpost combine
has the chance to run, usually.

The only case where it does run on V64 vectors is if the vector ops
legalizer ran.  So, tickle the code with a ctpop.

Fixes PR23265.

llvm-svn: 235243
2015-04-17 23:43:33 +00:00
Ahmed Bougacha 28fc7ca53f Fix another typo in r235224 testcase. NFC.
Third time's the charm!

llvm-svn: 235242
2015-04-17 23:38:46 +00:00
Andrew Kaylor ea8df61d4d [WinEH] Fixes for a few cppeh failures.
Differential Review: http://reviews.llvm.org/D9065

llvm-svn: 235239
2015-04-17 23:05:43 +00:00
Pete Cooper 2bbbd8b543 AArch64: Add test for returning [2 x i64] in registers. NFC.
llvm-svn: 235228
2015-04-17 21:31:25 +00:00
Ahmed Bougacha 84b801f6b4 Fix typo in r235224 testcase. NFC.
llvm-svn: 235226
2015-04-17 21:11:58 +00:00
Ahmed Bougacha 2448ef5f33 [AArch64] Avoid vector->load dependency cycles when creating LD1*post.
They would break the SelectionDAG.
Note that the opposite load->vector dependency is already obvious in:
  (LD1*post vec, ..)

llvm-svn: 235224
2015-04-17 21:02:30 +00:00
Pirama Arumuga Nainar db7c07e2bf Add support to promote f16 to f32
Summary:
This patch adds legalization support to operate on FP16 as a load/store type
and do operations on it as floats.

Tests for ARM are added to test/CodeGen/ARM/fp16-promote.ll

Reviewers: srhines, t.p.northover

Differential Revision: http://reviews.llvm.org/D8755

llvm-svn: 235215
2015-04-17 18:36:25 +00:00
Vasileios Kalintiris 816ea84e7a [mips][FastISel] Implement FastMaterializeAlloca in Mips fast-isel.
Summary: Implement the method FastMaterializeAlloca in Mips fast-isel

Based on a patch by Reed Kotler.

Test Plan:
Passes test-suite at O0/O2 for mips32 r1/r2
fastalloca.ll

Reviewers: dsanders, rkotler

Subscribers: rfuhler, llvm-commits

Differential Revision: http://reviews.llvm.org/D6742

llvm-svn: 235213
2015-04-17 17:29:58 +00:00
Sanjay Patel 2161c49a4e [X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd
This is the AVX extension of r235014:
http://llvm.org/viewvc/llvm-project?view=revision&revision=235014

Review:
http://reviews.llvm.org/D8691

llvm-svn: 235210
2015-04-17 17:02:37 +00:00
Vasileios Kalintiris a4035e6284 [mips][FastISel] Implement shift ops for Mips fast-isel.
Summary:
Add shift operators implementation to fast-isel for Mips.  These are shift ops
for non legal forms, i.e. i8 and i16.

Based on a patch by Reed Kotler.

Test Plan:

Reviewers: dsanders

Subscribers: echristo, rfuhler, llvm-commits

Differential Revision: http://reviews.llvm.org/D6726

llvm-svn: 235194
2015-04-17 14:29:21 +00:00
James Molloy a4ff7b2713 Fix TRUNCATE splitting helper logic.
This is a followon to r233681 - I'd misunderstood the semantics of FTRUNC,
and had confused it with (FP_ROUND ..., 0).

Thanks for Ahmed Bougacha for his post-commit review!

llvm-svn: 235191
2015-04-17 13:51:40 +00:00
Vasileios Kalintiris bb60cfb5c4 [mips] Teach the delay slot filler to remove needless KILL instructions.
Summary:
Previously, the presence of KILL instructions would block valid candidates
from filling a specific delay slot. With the elimination of the KILL
instructions, in the appropriate range, we are able to fill more slots and
keep the information from future def/use analysis consistent.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D7724

llvm-svn: 235183
2015-04-17 12:01:02 +00:00
Nico Weber a762fa6c98 Revert r235154-r235156, they cause asserts when building win64 code (http://crbug.com/477988)
llvm-svn: 235170
2015-04-17 09:10:43 +00:00
Reid Kleckner 56cc879ac1 Fix test failure due to racing commits
It looks like r235145 changed the .ll syntax for variadic calls. Update
tests to use the new syntax.

llvm-svn: 235156
2015-04-17 01:09:53 +00:00
Reid Kleckner d4523e3c51 [SEH] Reimplement x64 SEH using WinEHPrepare
This now emits simple, unoptimized xdata tables for __C_specific_handler
based on the handlers listed in @llvm.eh.actions calls produced by
WinEHPrepare.

This adds support for running __finally blocks when exceptions are
thrown, and removes the old landingpad fan-in codepath.

I ran some manual execution tests on small basic test cases with and
without optimization, as well as on Chrome base_unittests, which uses a
small amount of SEH.  I'm sure there are bugs, and we may need to
revert.

llvm-svn: 235154
2015-04-17 01:01:27 +00:00
Ahmed Bougacha 941420d9ea [AArch64] Don't assert on f16 in DUP PerfectShuffle generator.
Found by code inspection, but breaking i16 at least breaks other tests.
They aren't checking this in particular though, so also add some
explicit tests for the already working types.

llvm-svn: 235148
2015-04-16 23:57:07 +00:00