Commit Graph

15568 Commits

Author SHA1 Message Date
Andrey Turetskiy 958eb46443 [X86] Introduce Lakemont CPU.
Add a new Intel MCU CPU Lakemont, which doesn't support X87.

Differential Revision: http://reviews.llvm.org/D18650

llvm-svn: 265128
2016-04-01 10:16:15 +00:00
Sean Silva 3de5ef96c9 Improve CHECK-NOT robustness of dllexport tests
This changes some dllexport tests, to verify that some symbols that
should not be exported are not, in a way that improves the robustness
of CHECK-SAME interaction with CHECK-NOT.

We plan to enable dllimport/dllexport support for the PS4, and these
changes are for points we noticed in our internal testing.

Patch by Warren Ristow!

llvm-svn: 265106
2016-04-01 03:54:03 +00:00
Sanjoy Das 9d41a8f269 Don't use an i64 return type with webkit_jscc
Re-enable an assertion enabled by Justin Lebar in rL265092.  rL265092
was breaking test/CodeGen/X86/deopt-intrinsic.ll because webkit_jscc
does not like non-i64 return types.  Change the test case to not do
that.

llvm-svn: 265099
2016-04-01 02:51:21 +00:00
Chuang-Yu Cheng 35c6181982 Fix Sub-register Rewriting in Aggressive Anti-Dependence Breaker
Previously, HandleLastUse would delete RegRef information for sub-registers
if they were dead even if their corresponding super-register were still live.

If the super-register were later renamed, then the definitions of the
sub-register would not be updated appropriately. This patch alters the
behavior so that RegInfo information for sub-registers is only deleted when
the sub-register and super-register are both dead.

This resolves PR26775. This is the mirror image of Hal's r227311 commit.

Author: Tom Jablin (tjablin)
Reviewers: kbarton uweigand nemanjai hfinkel

http://reviews.llvm.org/D18448

llvm-svn: 265097
2016-04-01 02:05:29 +00:00
Justin Lebar efcc81cbb4 [NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.
Summary:
Previously the NVVMReflect pass would read its configuration from
command-line flags or a static configuration given to the pass at
instantiation time.

This doesn't quite work for clang's use-case.  It needs to pass a value
for __CUDA_FTZ down on a per-module basis.  We use a module flag for
this, so the NVVMReflect pass needs to be updated to read said flag.

Reviewers: tra, rnk

Subscribers: cfe-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18672

llvm-svn: 265090
2016-04-01 01:09:07 +00:00
Adrian Prantl b8089516a5 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
llvm-svn: 265081
2016-04-01 00:16:49 +00:00
Adrian Prantl b939a25707 Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.

http://reviews.llvm.org/D18612
<rdar://problem/25427165>

llvm-svn: 265077
2016-03-31 23:56:58 +00:00
Tim Shen 2e24d0c0c1 Move asm-printer-topological-order.ll to PowerPC backend
llvm-svn: 265067
2016-03-31 22:32:10 +00:00
Tim Shen 800ed436e5 [AsmPrinter] Print aliases in topological order
Print aliases in topological order, that is, for any alias a = b,
b must be printed before a. This is because on some targets (e.g. PowerPC)
linker expects aliases in such an order to generate correct TOC information.

GCC also prints aliases in topological order.

llvm-svn: 265064
2016-03-31 22:08:19 +00:00
Jun Bum Lim 760afcb338 [AArch64] Allow loads with imp-def to be handled in getMemOpBaseRegImmOfsWidth()
Summary:
This change will allow loads with imp-def to be clustered in machine-scheduler pass.
areMemAccessesTriviallyDisjoint() can also handle loads with imp-def.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18665

llvm-svn: 265051
2016-03-31 20:53:47 +00:00
Hal Finkel 0f02ccd955 [PowerPC] Cleanup test/CodeGen/PowerPC/qpx-load-splat.ll
Removing unnecessary attributes and metadata...

llvm-svn: 265049
2016-03-31 20:45:00 +00:00
Sanjay Patel 61e13249d8 [x86] add memset tests to show another potential improvement
llvm-svn: 265048
2016-03-31 20:40:32 +00:00
Hal Finkel fc35391f2b [PowerPC] Add a late MI-level pass for QPX load/splat simplification
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).

This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).

llvm-svn: 265047
2016-03-31 20:39:41 +00:00
Hans Wennborg 132cd62121 Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
I think it might have caused these build breakages:
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio

llvm-svn: 265046
2016-03-31 20:27:30 +00:00
Simon Pilgrim aab59b7a28 [X86][SSE] Some basic tests for variable shuffles
We don't really support non-constant shuffle masks, but these tests are for cases where BUILD_VECTOR is made up from vector extracts (as well as undef/zero scalars).

llvm-svn: 265045
2016-03-31 20:26:30 +00:00
Benjamin Kramer 569efd2cfd [ARM] Expand v1i64 and v2i64 ctpop.
The default is legal, which results in 'Cannot select' errors. This is
triggered during selfhost due to a recent cost model change.

llvm-svn: 265040
2016-03-31 19:42:04 +00:00
Hans Wennborg e97fb414e8 [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)
For code such as:

  void f(int, int);
  void g() {
      f(1, 2);
  }

compiled for 32-bit X86 Linux, Clang would previously generate:

  subl    $12, %esp
  subl    $8, %esp
  pushl   $2
  pushl   $1
  calll   f
  addl    $16, %esp
  addl    $12, %esp
  retl

This patch fixes that by merging adjacent stack adjustments in
eliminateCallFramePseudoInstr().

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265039
2016-03-31 19:26:24 +00:00
Sanjay Patel 92d5ea5e07 [x86] use SSE/AVX ops for non-zero memsets (PR27100)
Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast
targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping
into a codegen sinkhole while trying to splat a byte into an XMM reg.

Follow-on bugs exposed by the current codegen are:
https://llvm.org/bugs/show_bug.cgi?id=27141
https://llvm.org/bugs/show_bug.cgi?id=27143

Differential Revision: http://reviews.llvm.org/D18566

llvm-svn: 265029
2016-03-31 17:30:06 +00:00
Hans Wennborg a7543ba10c More checks in win32-seh-nested-finally.ll after comment on r264966
llvm-svn: 265027
2016-03-31 16:42:10 +00:00
Ulrich Weigand 6ad762d36f [PowerPC] Attempt to fix fast-isel-i64offset.ll failure
The test case added in r265023 is failing on ninja-x64-msvc-RA-centos6.
Update the test to make less specific assumptions on code generation.

llvm-svn: 265026
2016-03-31 16:38:57 +00:00
Ulrich Weigand 3707ba8030 [PowerPC] Correctly compute 64-bit offsets in fast isel
PPCSimplifyAddress contains this code:

  IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
                                            : Type::getInt64Ty(*Context));

to determine the type to be used for an index register, if one needs
to be created.  However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address.  This means that if
a data element of type i32 is accessed using an index that does not
not fit into 32 bits, a wrong address is computed here.

Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64.  Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.

llvm-svn: 265023
2016-03-31 15:37:06 +00:00
Jun Bum Lim cf9744367b [AArch64] Handle missing store pair opportunity
Summary:
This change will handle missing store pair opportunity where the first store
instruction stores zero followed by the non-zero store. For example, this change
will convert :

  str wzr, [x8]
  str w1, [x8, #4]
into:
  stp wzr, w1, [x8]

Reviewers: jmolloy, t.p.northover, mcrosier

Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18570

llvm-svn: 265021
2016-03-31 14:47:24 +00:00
Ulrich Weigand 1931b01a64 [PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.

This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.

However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.

This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.

Differential Revision: http://reviews.llvm.org/D18605

llvm-svn: 265020
2016-03-31 14:44:50 +00:00
Davide Italiano 936a2b09f3 [DebugInfo] Subprograms should belong to a CU.
Start fixing tests accordingly. There are still
about 35 failures before we can enable this check
in the IR verifier.

llvm-svn: 264990
2016-03-31 03:40:07 +00:00
Hal Finkel 851b33a0b1 [PowerPC] Load two floats directly instead of using one 64-bit integer load
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:

  t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
      t16: i64 = srl t13, Constant:i32<32>
    t17: i32 = truncate t16
  t18: f32 = bitcast t17
    t19: i32 = truncate t13
  t20: f32 = bitcast t19

The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:

	ld 3, 0(5)
	stw 3, -8(1)
	rldicl 3, 3, 32, 32
	stw 3, -4(1)
	lfs 3, -4(1)
	lfs 0, -8(1)

into:

        lfs 3, 4(5)
        lfs 0, 0(5)

llvm-svn: 264988
2016-03-31 02:56:05 +00:00
Sean Silva 24d7e2e869 Fix case confusion.
The test case was defining and using a function 'notExported()', but
the FileCheck checks were checking for the name 'not_exported'. This
changes the test to use 'notExported' across the board. Also, the test
defined a function 'not_defined()', but doesn't have any checks related
to it. For consistency, this name is changed to 'notDefined'. A later
commit will add checks for 'notDefined'.

Patch by Warren Ristow!

llvm-svn: 264984
2016-03-31 01:47:33 +00:00
Hans Wennborg be0df2b102 Add some more triples after r264966
llvm-svn: 264972
2016-03-30 23:55:22 +00:00
Hans Wennborg 6596977130 [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.

Differential Revision: http://reviews.llvm.org/D18573

llvm-svn: 264966
2016-03-30 23:38:01 +00:00
Matt Arsenault 2fe4fbc184 AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
2016-03-30 22:28:52 +00:00
Simon Pilgrim c49bd2ede0 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.

llvm-svn: 264922
2016-03-30 20:52:24 +00:00
Tom Stellard 1d5e6d4bdc AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

llvm-svn: 264877
2016-03-30 16:35:13 +00:00
Tom Stellard 0bc954e3bc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

llvm-svn: 264876
2016-03-30 16:35:09 +00:00
Simon Pilgrim b87ffe8519 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Simon Pilgrim 9490b56a89 [X86][SSE] Test the legalization of vector comparison results
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.

llvm-svn: 264867
2016-03-30 13:55:00 +00:00
Simon Pilgrim ab305a9d4c [X86][SSE] Added tests for clearing upper bits of vector elements
Patterns based on PR6455

llvm-svn: 264857
2016-03-30 11:43:26 +00:00
Chandler Carruth 8e06a10d1f [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64bit quantity to an
unintended offset (the first element of the array intead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occured widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse codegeneration. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
Adrian Prantl 4a09777b37 Upgrade some wildly anachronistic debug info in testcases.
llvm-svn: 264797
2016-03-29 22:34:30 +00:00
Sanjay Patel f48f7d74e2 use FileCheck and auto-check-generation script for exact checking
1. Removed the run line for mingw32 and made the Darwin triples unknown.
   This is a test of 32-bit vs. 64-bit platform and the underlying hardware.
   We have other tests for checking behavioral differences of the OS platform.

2. Changed the CPU specifiers to the attributes they were meant to represent.
   Any CPU that doesn't have SSE4.2 is assumed to have slow unaligned 16-byte accesses,
   so it won't use those here.
 
3. Although the stores really could all be CHECK-DAG, I left them as CHECK-NEXT to
   show the strange behavior of the instruction scheduler in the SLOW_32 case.

4. The odd-looking instructions are due to the use of a null pointer in the IR, so
   we have integer immediate store addresses. Cute.

llvm-svn: 264796
2016-03-29 22:27:39 +00:00
Nirav Dave 2aab7f4358 Add support for no-jump-tables
Add function soft attribute to the generation of Jump Tables in CodeGen
as initial step towards clang support of gcc's no-jump-table support

Reviewers: hans, echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18321

llvm-svn: 264756
2016-03-29 17:46:23 +00:00
Manman Ren f46262e0b7 Swift Calling Convention: add swiftself attribute.
Differential Revision: http://reviews.llvm.org/D17866

llvm-svn: 264754
2016-03-29 17:37:21 +00:00
Sanjay Patel 7c61507991 [x86] add tests to show current memset codegen
llvm-svn: 264748
2016-03-29 17:09:27 +00:00
Sanjay Patel 32de9628e3 regenerate checks
llvm-svn: 264738
2016-03-29 16:11:29 +00:00
Junmo Park 32ddc544c3 fix CHECK_NOT -> CHECK-NOT
llvm-svn: 264706
2016-03-29 07:53:07 +00:00
Elena Demikhovsky 95629caaa9 AVX-512: fixed a bug in fp_to_uint pattern on KNL
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for <4 x double> to <4 x i32>

Differential Revision: http://reviews.llvm.org/D18512

llvm-svn: 264701
2016-03-29 06:33:41 +00:00
Hal Finkel fa7057a415 [PowerPC] Refactor popcnt[dw] target features
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.

llvm-svn: 264690
2016-03-29 01:36:01 +00:00
Kyle Butt 5e241b11ed [Codegen] Decrease minimum jump table density.
Minimum density for both optsize and non optsize are now options
-sparse-jump-table-density (default 10) for non optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at google
at the cost of a small codesize increase. For code compiled with -Os,
the old behavior continues

llvm-svn: 264689
2016-03-29 00:23:41 +00:00
Sanjay Patel 1f867c6f9c fix checks: *_DAG -> *-DAG
llvm-svn: 264676
2016-03-28 22:11:06 +00:00
Sanjay Patel 53f9ae288a fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264674
2016-03-28 22:03:07 +00:00
Sanjay Patel d9ebff8bab fix CHECK_DAG -> CHECK-DAG
llvm-svn: 264673
2016-03-28 22:00:38 +00:00
Sanjay Patel 52fe9d0efe fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264672
2016-03-28 21:58:27 +00:00
Sanjay Patel 8f9e26964a fix CHECK_LABEL -> CHECK-LABEL
llvm-svn: 264671
2016-03-28 21:56:48 +00:00
Sanjay Patel a1e5ef624c trailing whitespace
llvm-svn: 264670
2016-03-28 21:52:53 +00:00
Simon Pilgrim d3df400fa9 [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements.
If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors.

The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.

Its not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least.

Differential Revision: http://reviews.llvm.org/D18492

llvm-svn: 264666
2016-03-28 21:33:52 +00:00
Sanjay Patel 7092de4cd2 fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264661
2016-03-28 21:14:24 +00:00
Matthias Braun b74eb41d58 MIRParser: Add %subreg.xxx syntax for subregister index operands
Differential Revision: http://reviews.llvm.org/D18279

llvm-svn: 264608
2016-03-28 18:18:46 +00:00
Matthias Braun 2bd8eeb6b7 CodeGen: Correct specification of PHI nodes
They do have a def machine operand.

Fixing the definition is necessary for an upcoming patch.

Differential Revision: http://reviews.llvm.org/D18384

llvm-svn: 264607
2016-03-28 18:18:41 +00:00
Haicheng Wu 6a6bc750d5 [AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize
Mimic what x86 does when optimizing sdiv/udiv for minsize.

llvm-svn: 264606
2016-03-28 18:17:07 +00:00
Hal Finkel 7059d41622 [PowerPC] On the A2, popcnt[dw] are very slow
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.

I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
  1. This allows us to return more accurate information via the TTI interface
     (I recognize that this currently makes no practical difference)
  2. Is hopefully easier to understand (it allows the core's features to match
     its manual while still having the desired effect).

llvm-svn: 264600
2016-03-28 17:52:08 +00:00
Derek Schuff ad154c837e Introduce MachineFunctionProperties and the AllVRegsAllocated property
MachineFunctionProperties represents a set of properties that a MachineFunction
can have at particular points in time. Existing examples of this idea are
MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which
will eventually be switched to use this mechanism.
This change introduces the AllVRegsAllocated property; i.e. the property that
all virtual registers have been allocated and there are no VReg operands
left.

With this mechanism, passes can declare that they require a particular property
to be set, or that they set or clear properties by implementing e.g.
MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class
verifies that the requirements are met, and handles the setting and clearing
based on the delcarations. Passes can also directly query and update the current
properties of the MF if they want to have conditional behavior.

This change annotates the target-independent post-regalloc passes; future
changes will also annotate target-specific ones.

Reviewers: qcolombet, hfinkel

Differential Revision: http://reviews.llvm.org/D18421

llvm-svn: 264593
2016-03-28 17:05:30 +00:00
Tom Stellard a76bcc2ea1 AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.

This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

llvm-svn: 264589
2016-03-28 16:10:13 +00:00
Krzysztof Parzyszek 2d65ea74dc [Hexagon] Improve handling of unaligned vector loads and stores
llvm-svn: 264584
2016-03-28 15:43:03 +00:00
Krzysztof Parzyszek bb63f66686 [Hexagon] Only use restore functions for single register at -Oz
llvm-svn: 264581
2016-03-28 14:52:21 +00:00
Jacques Pienaar fcef3e4617 [lanai] Add Lanai backend.
Add the Lanai backend to lib/Target.

General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html).

Differential Revision: http://reviews.llvm.org/D17011

llvm-svn: 264578
2016-03-28 13:09:54 +00:00
Elena Demikhovsky 83f0647d85 AVX-512: Fixed ICMP instruction selection for i1 operands
ICMP instruction selection fails on SKX and KNL for i1 operand.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0

Differential Revision: http://reviews.llvm.org/D18511

llvm-svn: 264566
2016-03-28 07:47:58 +00:00
Hal Finkel 0b37175ca6 [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.

It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.

This fixes the latter problem and adds the FMAX/MINNUM mappings.

llvm-svn: 264532
2016-03-27 05:40:56 +00:00
Simon Pilgrim dcdf85033c [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization

llvm-svn: 264517
2016-03-26 18:32:13 +00:00
Simon Pilgrim e4dbeb40c6 [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264512
2016-03-26 15:44:55 +00:00
Simon Pilgrim 3eef33a806 [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
Currently this is to mainly to prevent scalarization of integer division by constants.

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264511
2016-03-26 15:27:20 +00:00
Simon Pilgrim 7b36cdaecf [X86][SSE] Added v64i8 vector integer multiply tests
llvm-svn: 264510
2016-03-26 09:50:06 +00:00
Simon Pilgrim 7379a70677 [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.

llvm-svn: 264509
2016-03-26 09:44:27 +00:00
David Majnemer b549ab02b4 [PowerPC] Disable the CTR optimization in the presence of {min,max}num
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.

This fixes PR27083.

llvm-svn: 264508
2016-03-26 09:42:31 +00:00
Simon Pilgrim 9a5f19f509 [X86][SSE] Refreshed vector integer multiply tests
Add all 256-bit vector tests.
Added AVX512F/AVX512BW test targets.
Renamed tests something more meaningful.

llvm-svn: 264507
2016-03-26 09:35:48 +00:00
David Majnemer 020e890a19 [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.

This fixes PR27071.

llvm-svn: 264465
2016-03-25 21:49:11 +00:00
Jun Bum Lim 36c53fe147 [MachineCopyPropagation] Expose more dead copies across instructions with regmasks
When encountering instructions with regmasks, instead of cleaning up all the
elements in MaybeDeadCopies map, remove only the instructions erased. By keeping
more instruction in MaybeDeadCopies, this change will expose more dead copies
across instructions with regmasks.

llvm-svn: 264462
2016-03-25 21:15:35 +00:00
Nirav Dave fa250cad37 Prevent construction of cycle in DAG store merge
When merging stores in DAGCombiner, add check to ensure that no
dependenices exist that would cause the construction of a cycle in our
DAG.  This may happen if one store has a data dependence on another
instruction (e.g. a load) which itself has a (chain) dependence on
another store being merged. These stores cannot be merged safely and
doing so results in a cycle that is discovered in LegalizeDAG.

This test is only done in cases where Antialias analysis is used (UseAA)
as non-AA store merge candidates will be merged logically after all
loads which have been checked to not alias.

Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight

Subscribers: llvm-commits, tberghammer, danalbert, srhines

Differential Revision: http://reviews.llvm.org/D18336

llvm-svn: 264461
2016-03-25 21:06:30 +00:00
Saleem Abdulrasool 750a90df6a ARM: maintain BB ordering when expanding WIN__DBZCHK
It is possible to have a fallthrough MBB prior to MBB placement.  The original
addition of the BB would result in reordering the BB as not preceding the
successor.  Because of the fallthrough nature of the BB, we could end up
executing incorrect code or even a constant pool island!  Insert the spliced BB
into the same location to avoid that.

Thanks to Tim Northover for invaluable hints and Fiora for the discussion on
what may have been occurring!

llvm-svn: 264454
2016-03-25 19:48:06 +00:00
Hans Wennborg 5f916d3df4 [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with 8-bit immediate is only three bytes.

Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.

Differential Revision: http://reviews.llvm.org/D18374

llvm-svn: 264440
2016-03-25 18:11:31 +00:00
Hans Wennborg 4ae5119eeb X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).

Differential Revision: http://reviews.llvm.org/D18246

llvm-svn: 264375
2016-03-25 01:10:56 +00:00
Saleem Abdulrasool 0dab98d926 ARM: fix optimised division on WoA
We did not have an explicit branch to the continuation BB.  When the check was
hoisted, this could permit control follow to fall through into the division
trap.  Add the explicit branch to the continuation basic block to ensure that
code execution is correct.

llvm-svn: 264370
2016-03-25 00:34:11 +00:00
Manman Ren 9dd8c14674 CXX TLS: collect return blocks after SelectAllBasicBlocks.
It is incorrect to get the corresponding MBB for a ReturnInst before
SelectAllBasicBlocks since SelectAllBasicBlocks can change the
correspondence between a ReturnInst and the MBB it is in.

PR27062

llvm-svn: 264358
2016-03-24 23:21:29 +00:00
Sanjoy Das 731c67fed2 Lower varargs correctly in deopt bundle lowering
Earlier we were ignoring varargs in LowerCallSiteWithDeoptBundle because
populateCallLoweringInfo does not set CallLoweringInfo::IsVarArg.

llvm-svn: 264354
2016-03-24 22:37:52 +00:00
Matthias Braun ae81c29352 LiveInterval: Fix Distribute() failing on liveranges with unused VNInfos
This fixes http://llvm.org/PR26991

llvm-svn: 264345
2016-03-24 21:41:38 +00:00
Eric Christopher b979d51afa Finish the incomplete 'd' inline asm constraint support for PPC by
making sure we give it a register and mark it as a register constraint.

llvm-svn: 264340
2016-03-24 21:04:52 +00:00
Eric Christopher 8c95d53d45 Reorder check lines, comments in test and remove unnecessary IR.
llvm-svn: 264339
2016-03-24 21:04:47 +00:00
Sanjoy Das 6bcfe31820 Match call and target calling conventions in test
Fixes an issue in rL264329.

llvm-svn: 264337
2016-03-24 20:51:24 +00:00
Sanjoy Das df9ae70f49 Add lowering support for llvm.experimental.deoptimize
Summary:
Only adds support for "naked" calls to llvm.experimental.deoptimize.
Support for round-tripping through RewriteStatepointsForGC will come
as a separate patch (should be simpler than this one).

Reviewers: reames

Subscribers: sanjoy, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18429

llvm-svn: 264329
2016-03-24 20:23:29 +00:00
Krzysztof Parzyszek c9d4caa32c [Hexagon] Add support for run-time stack overflow checking
Patch by Sundeep Kushwaha.

llvm-svn: 264328
2016-03-24 20:20:07 +00:00
Krzysztof Parzyszek 181fdbd174 [Hexagon] Generate PIC-specific versions of save/restore routines
In PIC mode, the registers R14, R15 and R28 are reserved for use by
the PLT handling code. This causes all functions to clobber these
registers. While this is not new for regular function calls, it does
also apply to save/restore functions, which do not follow the standard
ABI conventions with respect to the volatile/non-volatile registers.

Patch by Jyotsna Verma.

llvm-svn: 264324
2016-03-24 19:18:48 +00:00
Sanjoy Das c0c59fe14e [Statepoints] Fix yet another issue around gc pointer uniqueing
Given that StatepointLowering now uniques derived pointers before
putting them in the per-statepoint spill map, we may end up with missing
entries for derived pointers when we visit a gc.relocate on a pointer
that was de-duplicated away.

Fix this by keeping two maps, one mapping gc pointers to their
de-duplicated values, and one mapping a de-duplicated value to the slot
it is spilled in.

llvm-svn: 264320
2016-03-24 18:57:39 +00:00
Sanjoy Das 8f42b7b3cd Remove unnecessary redirect from test
llvm-svn: 264308
2016-03-24 17:18:00 +00:00
Elena Demikhovsky 95f3173ce9 AVX-512: Generate KTEST instead of TEST fir i1 vectors
KTEST instruction may be used instead of TEST in this case:

%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, zeroinitializer
br i1 %res, label %L2, label %L1

Differential Revision: http://reviews.llvm.org/D18444

llvm-svn: 264298
2016-03-24 15:53:45 +00:00
Tim Northover 4498eff9bb CodeGen: extend RHS when splitting ATOMIC_CMP_SWAP_WITH_SUCCESS.
If the operation's type has been promoted during type legalization, we
need to account for the fact that the high bits of the comparison
operand are likely unspecified.

The LHS is usually zero-extended, but MIPS sign extends it, so we have
to be slightly careful.

Patch by Simon Dardis.

llvm-svn: 264296
2016-03-24 15:38:38 +00:00
Pirama Arumuga Nainar dc45aef2d8 Remove unsafe AssertZext after promoting result of FP_TO_FP16
Summary:
Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32
instruction, do not guarantee that the top 16 bits are zeroed out.
Remove the unsafe AssertZext and add tests to exercise this.

Reviewers: jmolloy, sbaranga, kristof.beyls, aadg

Subscribers: llvm-commits, srhines, aemerson

Differential Revision: http://reviews.llvm.org/D18426

llvm-svn: 264285
2016-03-24 14:06:03 +00:00
Nemanja Ivanovic 5ebc92dbe1 [PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode
This patch corresponds to review:
http://reviews.llvm.org/D17711

It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.

llvm-svn: 264282
2016-03-24 13:40:33 +00:00
Simon Pilgrim 572ca71573 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Zlatko Buljan 94af4cbcf4 [mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D17137

llvm-svn: 264248
2016-03-24 09:22:45 +00:00
Hrvoje Varga dbea1a1e51 [mips][microMIPS] Fix for "Cannot copy registers" assertion
Differential Revision: http://reviews.llvm.org/D17068

llvm-svn: 264245
2016-03-24 06:05:35 +00:00
Simon Pilgrim 0110890a79 [X86][SSE] Added tests to ensure that consecutive loads including any/all volatiles are not combined
llvm-svn: 264225
2016-03-24 00:14:37 +00:00
Paul Robinson f81836bd18 [PS4] Guarantee an instruction after a 'noreturn' call.
We need the "return address" of a noreturn call to be within the
bounds of the calling function; TrapUnreachable turns 'unreachable'
into a 'ud2' instruction, which has that desired effect.

Differential Revision: http://reviews.llvm.org/D18414

llvm-svn: 264224
2016-03-24 00:10:03 +00:00
Matt Arsenault 30d37a74da AMDGPU: Remove atomic inc/dec patterns
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max

llvm-svn: 264215
2016-03-23 23:23:38 +00:00
Matt Arsenault 0a30e456b4 AMDGPU: Promote alloca should skip volatiles
llvm-svn: 264214
2016-03-23 23:17:29 +00:00
Matt Arsenault f43c2a0b49 AMDGPU: Insert moves of frame index to value operands
Strengthen tests of storing frame indices.

Right now this just creates irrelevant scheduling changes.

We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.

There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.

This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.

llvm-svn: 264200
2016-03-23 21:49:25 +00:00
Cong Hou 94710840fb Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:

jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 264199
2016-03-23 21:45:37 +00:00
Kyle Butt 613112826e Codegen: [PPC] Word Rotates are Zero Extending.
Add Word rotates to the list of instructions that are zero extending.
This allows them to be used in dot form to compare with zero.

llvm-svn: 264183
2016-03-23 19:51:22 +00:00
Simon Pilgrim b5fb65d43e [X86] Regenerated WidenArith test
llvm-svn: 264157
2016-03-23 14:00:28 +00:00
Andrey Turetskiy 6a3d561ea0 [X86] Introduction of FeatureX87.
Add FeatureX87 in X86 backend to be able to define CPUs which doesn't have x87.

Differential Revision: http://reviews.llvm.org/D13979

llvm-svn: 264148
2016-03-23 11:13:54 +00:00
Hrvoje Varga c45baf212a [mips][microMIPS] Delay slot filler modifications
Differential Revision: http://reviews.llvm.org/D18181

llvm-svn: 264147
2016-03-23 10:29:38 +00:00
Sanjoy Das e58ca59cf4 [StatepointLowering] Schedule gc relocates before uniqueing them
Otherwise we can see an "unexpected" gc.relocate that we uniqued away.

llvm-svn: 264127
2016-03-23 02:24:07 +00:00
Matthias Braun 68bb2931cc Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This commit broke LTO builds. Reverting it to unbreak the bots while the
issue is investigated. See also:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html

This reverts r263158

llvm-svn: 264088
2016-03-22 20:24:34 +00:00
Simon Pilgrim cc41495eb8 [X86][AVX] Added AVX1 tests for 256-bit vector idiv-by-constant
Prep work based on feedback for D18307 

llvm-svn: 264086
2016-03-22 20:10:49 +00:00
Simon Pilgrim c6f5fe3d69 [SelectionDAG] Ensure constant folded legalized vector element types are compatible with the BUILD_VECTOR type
Found during fuzz testing - 32-bit x86 targets were legalizing a <2 x i1> compare result to <2 x i32> when <2 x i64> was expected.

llvm-svn: 264085
2016-03-22 19:59:53 +00:00
Tim Northover b49a8a9dbb CodeGen: check return types match when emitting tail call to builtin.
We were just completely ignoring the types when determining whether we could
safely emit a libcall as a tail call. This is clearly wrong.

Theoretically, we could dig deeper looking for incidental matches (much like
the generic code in Analysis.cpp does), but it's probably not worth it for the
few libcalls that exist.

llvm-svn: 264084
2016-03-22 19:14:38 +00:00
Sanjoy Das bfecef5e1b Remove unnecessary branch from test
(Addresses post commit review by Reid Kleckner)

llvm-svn: 264083
2016-03-22 18:45:41 +00:00
Sanjoy Das eb5037cadc Allow lowering call sites with both funclets and deopt state
Lowering funclets is a no-op, so we can just go ahead and lower the
deopt state.

llvm-svn: 264078
2016-03-22 18:10:39 +00:00
Dan Gohman 665d7e3838 [WebAssembly] Implement the rotate instructions.
llvm-svn: 264076
2016-03-22 18:01:49 +00:00
Simon Pilgrim 25fb4177fb [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Reapplied with a fix for PR26953 (missing vector widening legalization).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 264062
2016-03-22 16:22:08 +00:00
Sanjoy Das dd1d72ce92 Appease the windows buildbots
The guess is that the stdout/stderr ordering may differ between windows
/ unix.

llvm-svn: 264019
2016-03-22 02:11:57 +00:00
Sanjoy Das 38bfc22161 Add "first class" lowering for deopt operand bundles
Summary:
After this change, deopt operand bundles can be lowered directly by
SelectionDAG into STATEPOINT instructions (which are then lowered to a
call or sequence of nop, with an associated __llvm_stackmaps entry0.
This obviates the need to round-trip deoptimization state through
gc.statepoint via RewriteStatepointsForGC.

Reviewers: reames, atrick, majnemer, JosephTremoulet, pgavlin

Subscribers: sanjoy, mcrosier, majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D18257

llvm-svn: 264015
2016-03-22 00:59:13 +00:00
Nicolai Haehnle 213e87f2ee AMDGPU: Add SIWholeQuadMode pass
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

llvm-svn: 263982
2016-03-21 20:28:33 +00:00
Dan Gohman c8d7f14506 [WebAssembly] Implement the eqz instructions.
llvm-svn: 263976
2016-03-21 19:54:41 +00:00
Tom Stellard 92339e888f AMDGPU/SI: Fix threshold calculation for branching when exec is zero
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.

The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18282

llvm-svn: 263969
2016-03-21 18:56:58 +00:00
Matt Arsenault cb38a6bd35 AMDGPU: Remove SignBitIsZero for mubuf scratch offsets
These instructions do not have the same negative base
address problem that DS instructions do on SI.

llvm-svn: 263964
2016-03-21 18:02:18 +00:00
Peter Collingbourne 86b9fbe980 ARM: Better codegen for 64-bit compares.
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.

Before:

	push	{r7, lr}
	cmp	r0, r2
	mov.w	r0, #0
	mov.w	r12, #0
	it	hs
	movhs	r0, #1
	cmp	r1, r3
	it	ge
	movge.w	r12, #1
	it	eq
	moveq	r12, r0
	cmp.w	r12, #0
	bne	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

After:

	push	{r7, lr}
	subs	r0, r0, r2
	sbcs.w	r0, r1, r3
	bge	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

Saves around 80KB in Chromium's libchrome.so.

Some notes on this patch:

- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
  introduced (nothing else needs them). However, they are necessary in
  order to avoid poor codegen, and they seem similar to existing combines
  in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
  (brcond Compare)).

- No support for Thumb-1. This is in principle possible, but we'd need
  to implement ARMISD::SUBE for Thumb-1.

Differential Revision: http://reviews.llvm.org/D15256

llvm-svn: 263962
2016-03-21 18:00:02 +00:00
Renato Golin 2b6b7ffd6c [ARM] Add Cortex-A32 support
Adding Cortex-A32 as an available target in the ARM backend.

Patch by Sam Parker.

llvm-svn: 263956
2016-03-21 17:29:01 +00:00
Matt Arsenault b96b57347a AMDGPU: Add frexp_mant intrinsic
llvm-svn: 263948
2016-03-21 16:11:05 +00:00
Silviu Baranga 46030585b3 [DAGCombine] Catch the case where extract_vector_elt can cause an any_ext while processing AND SDNodes
Summary:
extract_vector_elt can cause an implicit any_ext if the types don't
match. When processing the following pattern:

  (and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c)

DAGCombine was ignoring the possible extend, and sometimes removing
the AND even though it was required to maintain some of the bits
in the result to 0, resulting in a miscompile.

This change fixes the issue by limiting the transformation only to
cases where the extract_vector_elt doesn't perform the implicit
extend.

Reviewers: t.p.northover, jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18247

llvm-svn: 263935
2016-03-21 11:43:46 +00:00
Elena Demikhovsky 39a0020f2d Fixed -mcpu flag
"core-avx" does not exist; I changed to "nehalem"

llvm-svn: 263932
2016-03-21 11:06:20 +00:00
Simon Pilgrim 4af44f3c13 [X86][SSE] Add vector integer division by constant tests
Expanded tests and split into sdiv/srem and udiv/urem cases for 128 and 256 bit vectors.

llvm-svn: 263917
2016-03-20 21:46:58 +00:00
Jingyue Wu 1375560bdb [NVPTX] Adds a new address space inference pass.
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).

The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.

Reviewers: jholewinski, tra, jlebar

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D17965

llvm-svn: 263916
2016-03-20 20:59:20 +00:00
Simon Pilgrim c44472a5bc [X86][SSE] Detect zeroable shuffle elements from different value types
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask.

Differential Revision: http://reviews.llvm.org/D14261

llvm-svn: 263906
2016-03-20 15:45:42 +00:00
Igor Breger 3ea8af5108 AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR
Differential Revision: http://reviews.llvm.org/D18211

llvm-svn: 263898
2016-03-20 13:09:43 +00:00
Manman Ren a3a019cf90 [CXX_FAST_TLS] Fix issues in ARM.
We need to be careful on which registers can be explicitly handled
via copies. Prologue, Epilogue use physical registers and if one belongs
to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites
it after the explicit copies.

llvm-svn: 263857
2016-03-18 23:44:37 +00:00
Manman Ren 4865d89653 [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.

llvm-svn: 263856
2016-03-18 23:41:51 +00:00
Manman Ren 2828c57b6f [CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.
Since at O0, explicit copies via SplitCSR may not be removed even if
they are unnecessary, we choose not to use SplitCSR at O0.

llvm-svn: 263855
2016-03-18 23:38:49 +00:00
Alexei Starovoitov 7e453bb8be BPF: emit an error message for unsupported signed division operation
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@fb.com>
llvm-svn: 263842
2016-03-18 22:02:47 +00:00
Chad Rosier cdfd7e7201 [AArch64] Enable more load clustering in the MI Scheduler.
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing.  I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.

Differential Revision: http://reviews.llvm.org/D18048

llvm-svn: 263819
2016-03-18 19:21:02 +00:00
Nicolai Haehnle 95e8ffd398 AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18255

llvm-svn: 263792
2016-03-18 16:24:40 +00:00
Nicolai Haehnle ad63638f6d AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).

The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, rivanvx, llvm-commits

Differential Revision: http://reviews.llvm.org/D18151

llvm-svn: 263791
2016-03-18 16:24:31 +00:00
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Sam Kolton a74cd526e9 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
2016-03-18 15:35:51 +00:00
Tim Shen 5cdf75084a [PPC, FastISel] Fix ordered/unordered fcmp
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.

More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.

Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.

llvm-svn: 263753
2016-03-17 22:27:58 +00:00
Tim Northover 498c56c240 ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.
llvm-svn: 263741
2016-03-17 20:10:28 +00:00
Petar Jovanovic 0b44f24033 [PowerPC] Disable CTR loops optimization for soft float operations
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D17600

llvm-svn: 263727
2016-03-17 17:11:33 +00:00
Derek Schuff d4207ba0f6 [WebAssembly] Stackify code emitted by eliminateFrameIndex and SP writeback
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.

Also use stackified registers when writing back the SP value to memory
in the epilog; it's unnecessary because SP will not be used after the
epilog, and it results in better code.

Differential Revision: http://reviews.llvm.org/D18234

llvm-svn: 263725
2016-03-17 17:00:29 +00:00
Changpeng Fang 234fcb81d3 AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute
Symmary:
  ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.

Reviewers
  arsenm, tstellarAMD

Subscribers
  llvm-commits, arsenm

Differential Revision:
  http://reviews.llvm.org/D18197

llvm-svn: 263720
2016-03-17 16:43:50 +00:00
Simon Pilgrim 0f37fbac51 [X86][SSE] Simplified blend-with-zero combining
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines

This patch stops the combine if we already have a blend in place (means we miss some domain corrections)

llvm-svn: 263717
2016-03-17 15:59:36 +00:00
Saleem Abdulrasool 071a099102 ARM: Revert SVN r253865, 254158, fix windows division
The two changes together weakened the test and caused a regression with division
handling in MSVC mode.  They were applied to avoid an assertion being triggered
in the block frequency analysis.  However, the underlying problem was simply
being masked rather than solved properly.  Address the actual underlying problem
and revert the changes.  Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.

The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK).  We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them.  This would result in assertions being triggered
in the block frequency analysis pass.

Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code.  Update this with new tests that actually ensure that we do not
regress on the code generation.

llvm-svn: 263714
2016-03-17 14:10:49 +00:00
Nicolai Haehnle ef160de3e5 AMDGPU: Prevent uniform loops from becoming infinite
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18137

llvm-svn: 263658
2016-03-16 20:14:33 +00:00
Simon Pilgrim 53f27fc00f [X86] Reduced alignment of widened vector load/stores to better match PR26953 cases
llvm-svn: 263649
2016-03-16 18:32:44 +00:00
Simon Pilgrim 60228bdb80 [X86] Regenerated + extended widened vector conversion tests
- Ensure we test X86 + X64
- sitopfp / uitofp requires testing for SSE2 and SSE42 as well (part of the fix for PR26953)

llvm-svn: 263640
2016-03-16 15:33:43 +00:00
Igor Breger 0ba7b04f5f AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512bit vector because PCMPGT supported only for 128/256bit.
Differential Revision: http://reviews.llvm.org/D18204

llvm-svn: 263624
2016-03-16 08:48:26 +00:00
Simon Pilgrim 39d2411c3b [X86] Regenerated widen load tests
llvm-svn: 263608
2016-03-16 00:41:21 +00:00
Simon Pilgrim f7cb16f7de [X86][SSE41] Additional tests for extracting zeroable shuffle elements
We can currently only match zeroable vector elements of the same size as the shuffle type - these tests demonstrate the problem and a solution will be shortly added in an updated D14261

llvm-svn: 263606
2016-03-16 00:13:36 +00:00
Quentin Colombet fdc838e97f [MIR] Add a test case for the diagnostic of a wrongly typed generic instruction
llvm-svn: 263573
2016-03-15 18:31:29 +00:00
Quentin Colombet a3f2abad55 [AArch64] Move GlobalISel test cases into a GlobalISel subdirectory
llvm-svn: 263572
2016-03-15 18:30:00 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
Lang Hames 1b640e05ba [MachO] Add MachO alt-entry directive support.
This patch adds support for the MachO .alt_entry assembly directive, and uses
it for global aliases with non-zero GEP offsets. The alt_entry flag indicates
that a symbol should be layed out immediately after the preceding symbol.
Conceptually it introduces an alternate entry point for a function or data
structure. E.g.:

safe_foo:
  // check preconditions for foo
.alt_entry fast_foo
fast_foo:
  // body of foo, can assume preconditions.

The .alt_entry flag is also implicitly set on assembly aliases of the form:

a = b + C

where C is a non-zero constant, since these have the same effect as an
alt_entry symbol: they introduce a label that cannot be moved relative to the
preceding one. Setting the alt_entry flag on aliases of this form fixes
http://llvm.org/PR25381.

llvm-svn: 263521
2016-03-15 01:43:05 +00:00
Eric Christopher da8b3f1914 Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.

This reverts commit r263303.

llvm-svn: 263512
2016-03-14 23:59:57 +00:00
Chad Rosier 6d98655070 [AArch64] Break the dependency between FP and SP when possible.
When the SP in not changed because of realignment/VLAs etc., we restore the SP
by using the previous value of SP and not the FP. Breaking the dependency will
help in cases when the epilog of a callee is close to the epilog of the caller;
for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of
the callee.

http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.

llvm-svn: 263458
2016-03-14 18:17:41 +00:00
Tom Stellard 331f981cc9 AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

llvm-svn: 263447
2016-03-14 17:05:56 +00:00
Sanjay Patel 62d707c8d9 [x86, AVX] replace masked load with full vector load when possible
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.

1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.

Differential Revision: http://reviews.llvm.org/D18094

llvm-svn: 263446
2016-03-14 16:54:43 +00:00
Daniel Sanders e8efff373a [mips] MIPS32R6 compact branch support
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.

It works by generating compact branches for MIPS32R6 when the delay slot
filler cannot fill a delay slot. Then, inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting nops to clear this hazard.

Patch by Simon Dardis.

Reviewers: vkalintiris, dsanders

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16353

llvm-svn: 263444
2016-03-14 16:24:05 +00:00
Marek Olsak ed2213e6ef AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

llvm-svn: 263441
2016-03-14 15:57:14 +00:00
Ulrich Weigand cdce026b4d [SystemZ] Avoid LER on z13 due to partial register dependencies
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).

Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE.  However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.

This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.

llvm-svn: 263431
2016-03-14 13:50:03 +00:00
Zlatko Buljan fba68931ed [mips] Fix an issue with long double when function roundl is defined
Differential Revision: http://reviews.llvm.org/D17760

llvm-svn: 263428
2016-03-14 12:50:23 +00:00
Igor Breger a949100532 AVX512: icmp operation should be always lowered to CMPM (AVX-512) instruction on SKX.
implemented by delena

Differential Revision: http://reviews.llvm.org/D18054

llvm-svn: 263417
2016-03-14 10:26:39 +00:00
Simon Pilgrim 9b7aaafc6a [X86][XOP] Added target shuffle combine tests for XOP's VPPERM 2-op shuffle
Actual combing support will be added in a future patch

llvm-svn: 263402
2016-03-14 00:18:26 +00:00
Simon Pilgrim 5f1326f8cc [X86][SSE] Added truncated vector arithmetic tests.
For cases where we are truncating an integer vector arithmetic result, it may be better to pre-truncate the input operands - no code to support this yet (scalar is done with SimplifyDemandedBits but adding vector support could be a lot of work) but these tests represent the current codegen status.

Example bugs: PR14666, PR22703

llvm-svn: 263384
2016-03-13 19:08:01 +00:00
Simon Pilgrim 035b19ecf5 [X86][SSE41] Avoid variable blend for constant v8i16 shifts
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.

llvm-svn: 263383
2016-03-13 18:35:59 +00:00
Nemanja Ivanovic bd56e4e25a Fix for PR 26378
This patch corresponds to review:
http://reviews.llvm.org/D17712

We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).

llvm-svn: 263338
2016-03-12 10:23:07 +00:00
Quentin Colombet cf9732b417 [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.
cmpxchg[8|16]b uses RBX as one of its argument.
In other words, using this instruction clobbers RBX as it is defined to hold one
the input. When the backend uses dynamically allocated stack, RBX is used as a
reserved register for the base pointer. 

Reserved registers have special semantic that only the target understands and
enforces, because of that, the register allocator don’t use them, but also,
don’t try to make sure they are used properly (remember it does not know how
they are supposed to be used).

Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.

To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that save rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
  comparison.
- The other is the virtual register holding the save of the value of RBX as the
  base pointer. This saving is done as part of isel (i.e., we emit a copy from
  rbx).

cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp

This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp

Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferes that appears after the pseudo gets expanded are properly modeled
before that expansion.

This fixes PR26883.

llvm-svn: 263325
2016-03-12 02:25:27 +00:00
Simon Pilgrim 33d57c7547 [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 263303
2016-03-11 22:18:05 +00:00
Ahmed Bougacha 171f7b9986 [AArch64] Don't blindly lower f16/f128 FCCMPs.
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.

Fixes PR26803.

llvm-svn: 263301
2016-03-11 22:02:58 +00:00
Chad Rosier e6a5231628 Update test case to appease bots after 263255.
I'll follow up with Matt to confirm this is the correct fix.

llvm-svn: 263268
2016-03-11 17:33:36 +00:00
Quentin Colombet dd4b137364 [IRTranslator] Translate unconditional branches.
llvm-svn: 263265
2016-03-11 17:28:03 +00:00
Simon Pilgrim 7ca9614c71 [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction.
Its not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases.

llvm-svn: 263239
2016-03-11 14:39:10 +00:00
Vasileios Kalintiris e2cbc21b6f [mips] MIPSR6 Instruction itineraries
Summary: Defines instruction itineraries for common MIPSR6 instructions.

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17198

llvm-svn: 263229
2016-03-11 13:05:06 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Matt Arsenault 9a19c240c0 AMDGPU: Materialize sign bits with bfrev
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.

llvm-svn: 263201
2016-03-11 07:42:49 +00:00
Tim Northover 6092de5075 AArch64: only try to use scaled fcvt ops on legal vector types.
Before we ended up calling getSimpleVectorType on a <3 x float>, which
asserted.

llvm-svn: 263169
2016-03-10 23:02:21 +00:00
Simon Pilgrim 61eb49e437 [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 263159
2016-03-10 20:40:26 +00:00
Artur Pilipenko 3c8fc57e16 Support arbitrary addrspace pointers in masked load/store intrinsics
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 263158
2016-03-10 20:39:22 +00:00
Balaram Makam 4058e8fbed Fix testicase to turn buildbot green. NFC.
llvm-svn: 263154
2016-03-10 19:07:50 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Michael Kuperstein 8be8de6d62 [X86] Correctly select registers to pop into for x86_64
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.

This fixes PR26711.

Differential Revision: http://reviews.llvm.org/D18029

llvm-svn: 263139
2016-03-10 18:43:21 +00:00
Balaram Makam e9b2725287 [AArch64] Optimize compare and branch sequence when the compare's constant operand is power of 2
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test and branch sequences like in the example below. This handles
the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC

Examples:
   and  w8, w8, #0x400
   cbnz w8, L1
 to
   tbnz w8, #10, L1

Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17942

llvm-svn: 263136
2016-03-10 17:54:55 +00:00
Sanjay Patel a333dcfc42 give regression test a meaningful name
llvm-svn: 263135
2016-03-10 17:52:19 +00:00
Alexandros Lamprineas 843164242e [ARM] Cortex-R8 support
This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.

Patch by Pablo Barrio.

Differential Revision: http://reviews.llvm.org/D17925

llvm-svn: 263132
2016-03-10 17:38:41 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Saleem Abdulrasool 1632fe1f77 ARM: follow up improvements for SVN r263118
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct.  The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node.  This node set R7 as clobbered.  However, the
code would then follow up with a clobber of R11.  I had failed to notice the
imp-def,kill on R7 in the isel.  Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if.  Instead, construct a new parallel
WIN node and prefer that when targeting windows.  This ensures that we now both
correctly model the __builtin_longjmp as well as construct the frame in a more
ABI conformant manner.

llvm-svn: 263123
2016-03-10 16:26:37 +00:00
David L Kreitzer 14f0077f38 Unified the handling of returns in the X87 stackifier so that the stackifier
runs successfully on routines containing IRETs. This fixes PR26410.

Differential Revision: http://reviews.llvm.org/D17643

llvm-svn: 263120
2016-03-10 15:14:02 +00:00
Saleem Abdulrasool 8b30f9854e ARM: correct __builtin_longjmp on WoA
WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast
to AAPCS which states r7.  This adjusts __builtin_longjmp to not clobber r7 and
to properly restore the frame pointer on execution.

llvm-svn: 263118
2016-03-10 15:11:09 +00:00
Elena Demikhovsky cd9967d160 AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)
(failed on instruction selection phase)

Differential Revision: http://reviews.llvm.org/D17924

llvm-svn: 263111
2016-03-10 13:44:22 +00:00
Simon Pilgrim 13d4056795 [X86][AVX] Improve target shuffle combining of BLEND+zero
The BLEND+zero combine was failing to combine equivalent BLEND masks.

Follow up to D17483 and D17858

llvm-svn: 263105
2016-03-10 11:50:15 +00:00
Simon Pilgrim 16d11785a5 [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.

This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.

There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.

Differential Revision: http://reviews.llvm.org/D17858

llvm-svn: 263102
2016-03-10 11:23:51 +00:00
Elena Demikhovsky 38f78a2b92 AVX-512: Fixed a bug in shuffle for v64i8 type
Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on".

Differential Revision: http://reviews.llvm.org/D17994

llvm-svn: 263097
2016-03-10 08:32:09 +00:00
Roman Levenstein 2792b3f02f Add support for a preserve_most calling convention to the AArch64 backend.
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.

There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.

Differential Revision: http://reviews.llvm.org/D18016

llvm-svn: 263092
2016-03-10 04:35:09 +00:00
Sanjay Patel 4a8dd89128 [x86, AVX] optimize masked loads with constant masks
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.

Differential Revision: http://reviews.llvm.org/D17899

llvm-svn: 263067
2016-03-09 22:12:08 +00:00
Kit Barton a1d6a6f1de [PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions
This has to be committed before the FE changes

Phabricator: http://reviews.llvm.org/D17837
llvm-svn: 263035
2016-03-09 17:48:01 +00:00
Tom Stellard 9f2e00de7b SelectionDAG: Fix a crash on inline asm when output register supports multiple types
Summary:
The code in SelectionDAG did not handle the case where the
register type and output types were different, but had the same size.

Reviewers: arsenm, echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17940

llvm-svn: 263022
2016-03-09 16:02:52 +00:00
Sam Kolton dfa29f7c5b [AMDGPU] Assembler: Support DPP instructions.
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.

ToDo:
  - VOP2bInst:
    - vcc is considered as operand
    - AsmMatcher doesn't apply mnemonic aliases when parsing operands
  - v_mac_f32
  - v_nop
  - disable instructions with 64-bit operands
  - change dpp_ctrl assembler representation to conform sp3

Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
2016-03-09 12:29:31 +00:00
Dan Gohman d7a2eea619 [WebAssembly] Implement irreducible control flow.
This implements a very simple conservative transformation that doesn't
require more than linear code size growth. There's room for much more
optimization in this space.

llvm-svn: 262982
2016-03-09 02:01:14 +00:00
Chad Rosier 2a70624403 [AArch64] Disable the MI scheduler to turn bots green after r262942.
llvm-svn: 262944
2016-03-08 17:33:34 +00:00
Quentin Colombet 4340b55593 Revert r262759 and r262760.
The fix consisting in using the library call for atomic compare and swap when
the instruction is not safe to use may be incorrect. Indeed the library call may
not exist on all platform. In other words, we need a better fix! 

llvm-svn: 262943
2016-03-08 17:29:11 +00:00
Hans Wennborg e00b6e7249 Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.

llvm-svn: 262935
2016-03-08 16:21:41 +00:00
Igor Breger 999ac754f2 AVX512: Add extract_subvector patterns v8i1->v4i1 , v4i1->v2i1.
Differential Revision: http://reviews.llvm.org/D17953

llvm-svn: 262929
2016-03-08 15:21:25 +00:00
Simon Pilgrim d8ac7c9f2d [X86] Regenerated vector float extension tests
llvm-svn: 262919
2016-03-08 09:17:12 +00:00
Dan Gohman 1402606477 [WebAssembly] Update for spec change from tableswitch to br_table.
Also note that the operand order changed; the default label is now listed
after the regular labels.

llvm-svn: 262903
2016-03-08 03:18:12 +00:00
Quentin Colombet dca821683c [AArch64][GlobalISel] Add a test case for the IRTranslator.
llvm-svn: 262898
2016-03-08 01:48:08 +00:00
Quentin Colombet 050b211820 [MIR] Teach the parser/printer that generic virtual registers do not need a register class.
llvm-svn: 262893
2016-03-08 01:17:03 +00:00
Quentin Colombet 287c6bb571 [MIR] Teach the parser how to parse complex types of generic machine instructions.
By complex types, I mean aggregate or vector types.

llvm-svn: 262890
2016-03-08 00:57:31 +00:00
Quentin Colombet 12350a8e13 [MIR] Print the type of generic machine instructions.
llvm-svn: 262880
2016-03-08 00:29:15 +00:00
Quentin Colombet 851996778f [MIR] Teach the mir parser about types on generic machine instructions.
llvm-svn: 262879
2016-03-08 00:20:48 +00:00
Sanjay Patel 8c84f74f3a [x86] add test to show missing optimization
This should make it clearer how this proposed patch:
http://reviews.llvm.org/D11393
...will change codegen.

llvm-svn: 262875
2016-03-07 23:13:06 +00:00
Sanjay Patel 55c0dd4b26 [x86] simplify test and tighten checks
I noticed this test as part of:
http://reviews.llvm.org/D11393
...which is confusing enough as-is. 
Let's show the exact codegen, so the changes will be more obvious.

llvm-svn: 262874
2016-03-07 22:53:23 +00:00
Quentin Colombet 4e14a497a3 [MIR] Teach the MIPrinter about size for generic virtual registers.
llvm-svn: 262867
2016-03-07 21:57:52 +00:00
Matt Arsenault c89f2919a4 AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Quentin Colombet 2a831fb826 [MIR] Teach the parser how to handle the size of generic virtual registers.
llvm-svn: 262862
2016-03-07 21:48:43 +00:00
Simon Pilgrim 253ca348b2 [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target shuffle combining.
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.

This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed.

Removed non-constant pool mask decode path as we have no way of testing it right now.

Differential Revision: http://reviews.llvm.org/D17916

llvm-svn: 262809
2016-03-06 21:54:52 +00:00
Igor Breger 4d94d4d5f7 AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX
Differential Revision: http://reviews.llvm.org/D17913

llvm-svn: 262803
2016-03-06 12:38:58 +00:00
Igor Breger f1bd761e00 AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use kshiftw to implement VSHRI v4i1 , bits 15-4 is undef so the upper bits of v4i1 may not be zeroed. v4i1 should be zero_extend to v16i1 ( or any natively supported vector).
Differential Revision: http://reviews.llvm.org/D17763

llvm-svn: 262797
2016-03-06 07:46:03 +00:00
Simon Pilgrim 40e1a71cdd [X86][AVX] Improved VPERMILPS variable shuffle mask decoding.
Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.

Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization

Followup to D17681

llvm-svn: 262784
2016-03-05 22:53:31 +00:00
Matthias Braun 4797ec95e4 RegisterCoalescer: Remap subregister lanemasks before exchanging operands
Rematerializing and merging into a bigger register class at the same
time, requires the subregister range lanemasks getting remapped to the
new register class.

This fixes http://llvm.org/PR26805

llvm-svn: 262768
2016-03-05 04:36:13 +00:00
Quentin Colombet 2a7676b442 [X86] Fix the lowering of setjmp intrinsic on i386.
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such pointer
gets defined by the CGBR pass.

This fixes PR26742.

llvm-svn: 262762
2016-03-05 00:31:04 +00:00
Quentin Colombet fb5be7a37f Add missing triple in my previous commit!
llvm-svn: 262760
2016-03-04 23:36:32 +00:00
Quentin Colombet 13b524597d [X86] Do not use cmpxchgXXb when we need the base pointer (RBX).
cmpxchgXXb uses RBX as one of its implicit argument. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, expect when RBX is a reserved register because in that case,
the register allocator will not track its value and will not
save and restore it when interferences occur.

rdar://problem/24851412

llvm-svn: 262759
2016-03-04 23:29:39 +00:00
Sanjay Patel 216b275994 [x86] add tests for masked loads with constant masks
llvm-svn: 262758
2016-03-04 23:28:07 +00:00
David Majnemer d2f767d2f6 [X86] Support cleaning more than 2**16 bytes of stack
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.

There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.

To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:

pop  %ecx     ; return address is now in %ecx
add  $n, %esp ; clean the stack
push %ecx     ; bring the return address back on the stack
ret           ; pop the return address and jmp to it's value

llvm-svn: 262755
2016-03-04 22:56:17 +00:00
Michael Kuperstein b89f0fa2a2 [DAGCombine] Fix divrem combine not to assume div/rem type is simple.
The divrem combine assumed the type of the div/rem is simple, which isn't
necessarily true. This probably worked fine until r250825, since it only
saw legal types, but now breaks when it runs as a pre-type-legalization 
combine.

This fixes PR26835.

Differential Revision: http://reviews.llvm.org/D17878

llvm-svn: 262746
2016-03-04 21:23:29 +00:00
Renato Golin 175c6d6d95 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262738
2016-03-04 19:19:36 +00:00
Tom Stellard 649b5db557 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Zoran Jovanovic a68b67d1ed [mips][microMIPS] Prevent usage of OR16_MMR6 instruction when code for microMIPS is generated.
Author: milena.vujosevic.janicic
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D17373

llvm-svn: 262725
2016-03-04 17:34:31 +00:00
Simon Pilgrim 3c7e94208a [X86][AVX512] Added some basic X86ISD::VPERMV3 shuffle combining tests
None of these actually combine yet as we haven't enabled X86ISD::VPERMV3 for target shuffle combining

llvm-svn: 262718
2016-03-04 15:19:42 +00:00
Simon Pilgrim b4b90fb8d6 [X86][SSSE3] Added combine test for unary shuffle (pshufb) only referencing elements from the second input of a binary shuffle (punpcklbw)
llvm-svn: 262710
2016-03-04 11:15:23 +00:00
Nikolay Haustov 5bf46ac150 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai H.hnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
NAKAMURA Takumi f2b521ffc5 llvm/test/CodeGen/ARM/rem_crash.ll: Avoid unsupported targets to specify explicit triple.
We will see it for targeting win32;

  LLVM ERROR: CPU: 'generic' does not support ARM mode execution!

llvm-svn: 262668
2016-03-03 22:38:39 +00:00
Simon Pilgrim f33cb61471 [X86][AVX512BW] Fixed 512-bit PSHUFB shuffle mask decode and added combine test.
PSHUFB decoder was assuming that input was 128 or 256-bit vector only.

llvm-svn: 262661
2016-03-03 21:55:01 +00:00
Simon Pilgrim abcee45b7a [X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPS
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.

This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.

Differential Revision: http://reviews.llvm.org/D17681

llvm-svn: 262635
2016-03-03 18:13:53 +00:00
Ahmed Bougacha 671795a985 [X86] Don't assume that shuffle non-mask operands starts at #0.
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).

Since:
  r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.

This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.

Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.

One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).

Differential Revision: http://reviews.llvm.org/D17041

llvm-svn: 262627
2016-03-03 16:53:50 +00:00
Sanjay Patel d6cb4ec2a2 [AArch64] fold 'isPositive' vector integer operations (PR26819)
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26819

Shift and negate is what InstCombine prefers to produce (and I tried to make it do more of that
in http://reviews.llvm.org/rL262424 ), so we should recognize that pattern as something that might
come from autovectorization even if it's unlikely to be produced from C NEON intrinsics.

The patch is based on the x86 equivalent:
http://reviews.llvm.org/rL262036

Differential Revision: http://reviews.llvm.org/D17834

llvm-svn: 262623
2016-03-03 15:56:08 +00:00
Igor Breger 639fde79b0 AVX512: Combine AND + TESTM instructions .
Differential Revision: http://reviews.llvm.org/D17844

llvm-svn: 262621
2016-03-03 14:18:38 +00:00
Renato Golin f824ced6a1 Making rem_crash.ll target-specific
This test failed in some ARM bots after a divmod change because it was
running on a native llc, instead of targeted one. This makes sure the test
is target-specific (as intended), and also copies to ARM and AArch64
directories. If it is also supposed to work on other architectures, I'll
leave as an exercise to the respective maintainers.

llvm-svn: 262620
2016-03-03 14:01:10 +00:00
Simon Pilgrim 91dd0a796c [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 262599
2016-03-03 09:43:28 +00:00
Renato Golin 3d78271eac Revert "[ARM] Merging 64-bit divmod lib calls into one"
This reverts commit r262507, which broke some ARM buildbots.

llvm-svn: 262594
2016-03-03 08:57:44 +00:00
Michael Zuckerman c4d054fa4a [LLVM][AVX512] PSRLWI Chnage imm8 to int
Differential Revision: http://reviews.llvm.org/D17753

llvm-svn: 262592
2016-03-03 08:54:05 +00:00
Hans Wennborg 153e4b0f11 [X86] Enable forwarding bool arguments in tail calls (PR26305)
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.

Differential Revision: http://reviews.llvm.org/D17786

llvm-svn: 262575
2016-03-03 02:06:32 +00:00
Tim Shen 6e676a84ad [PPCVSXFMAMutate] Temporarily disable this pass
llvm-svn: 262573
2016-03-03 01:27:35 +00:00
Philip Reames 23d933982a [MBP] Avoid placing random blocks between loop preheader and header
If we have a loop with a rarely taken path, we will prune that from the blocks which get added as part of the loop chain. The problem is that we weren't then recognizing the loop chain as schedulable when considering the preheader when forming the function chain. We'd then fall to various non-predecessors before finally scheduling the loop chain (as if the CFG was unnatural.) The net result was that there could be lots of garbage between a loop preheader and the loop, even though we could have directly fallen into the loop. It also meant we separated hot code with regions of colder code.

The particular reason for the rejection of the loop chain was that we were scanning predecessor of the header, seeing the backedge, believing that was a globally more important predecessor (true), but forgetting to account for the fact the backedge precessor was already part of the existing loop chain (oops!.

Differential Revision: http://reviews.llvm.org/D17830

llvm-svn: 262547
2016-03-03 00:01:42 +00:00
David Majnemer 1ef654024f [X86] Don't give catch objects a displacement of zero
Catch objects with a displacement of zero do not initialize a catch
object.  The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.

If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.

Address this by creating our catch objects as fixed objects.  We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.

Differential Revision: http://reviews.llvm.org/D17823

llvm-svn: 262546
2016-03-03 00:01:25 +00:00
Sanjay Patel 840564973f [AArch64] add tests to demonstrate existing codegen for PR26819
llvm-svn: 262540
2016-03-02 23:22:03 +00:00