Commit Graph

33890 Commits

Author SHA1 Message Date
Krzysztof Parzyszek 921722049d [Hexagon] Generate MUX from conditional transfers when dot-new not possible
llvm-svn: 242711
2015-07-20 21:23:25 +00:00
Sanjoy Das 93d608c3c3 [ImplicitNullChecks] Work with implicit defs.
Summary:
This change generalizes the implicit null checks pass to work with
instructions that don't have any explicit register defs.  This lets us
use X86's `cmp` against memory as faulting load instructions.

Reviewers: reames, JosephTremoulet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11286

llvm-svn: 242703
2015-07-20 20:31:39 +00:00
Chad Rosier 3da0ea7f5d [AArch64] Change EON pattern to match more often.
Phabricator: http://reviews.llvm.org/D11359
Patch by Geoff Berry <gberry@codeaurora.org>

llvm-svn: 242694
2015-07-20 18:42:27 +00:00
Tom Stellard 70580f83cc AMDGPU/SI: Add VI patterns to select FLAT instructions for global memory ops
Summary:
The MUBUF addr64 bit has been removed on VI, so we must use FLAT
instructions when the pointer is stored in VGPRs.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11067

llvm-svn: 242673
2015-07-20 14:28:41 +00:00
Vasileios Kalintiris 974d409259 [mips] Added support for the ERETNC instruction.
Summary: This required adding the instruction predicate HasMips32r5.

Patch by Scott Egerton.

Reviewers: dsanders, vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11136

llvm-svn: 242666
2015-07-20 12:28:56 +00:00
Simon Pilgrim e2c244f3b4 [X86][SSE] Reordered cast vectorization costs. NFCI.
Reordered the data tables at the top and placed the lookups after. The first stage in the yak shaving necessary to get more accurate costs for a variety of targets given the recent improvements to SINT_TO_FP/UINT_TO_FP/SIGN_EXTEND vector lowering.

llvm-svn: 242643
2015-07-19 15:36:12 +00:00
Michael Kuperstein 69e40a4c85 [X86] Add support for tbyte memory operand size for Intel-syntax x86 assembly
Differential Revision: http://reviews.llvm.org/D11257
Patch by: marina.yatsina@intel.com

llvm-svn: 242639
2015-07-19 11:03:08 +00:00
Simon Pilgrim ba51d116c4 Remove TargetInstrInfo::canFoldMemoryOperand
canFoldMemoryOperand is not actually used anywhere in the codebase - all existing users instead call foldMemoryOperand directly when they wish to fold and can correctly deduce what they need from the return value. 

This patch removes the canFoldMemoryOperand base function and the target implementations; only x86 had a real (bit-rotted) implementation, although AMDGPU had a preparatory stub that had never needed to be completed.

Differential Revision: http://reviews.llvm.org/D11331

llvm-svn: 242638
2015-07-19 10:50:53 +00:00
Elena Demikhovsky 17b906058e AVX-512: Floating point conversions for SKX - DAG Lowering.
SKX supports conversion for all FP types. Integer types include doublewords and quardwords.
I added "Legal" status for these nodes and a bunch of tests.
I added "NoVLX" for AVX DAG selection to force VLX instructions selection when VLX is supported.

Differential Revision: http://reviews.llvm.org/D11255

llvm-svn: 242637
2015-07-19 10:17:33 +00:00
Simon Pilgrim 59764dccfb [X86][SSE] Updated SHL/LSHR i64 vectorization costs.
This was missed in D8416.

llvm-svn: 242621
2015-07-18 20:06:30 +00:00
Benjamin Kramer 9a5d788948 [Hexagon] Use composition instead of inheritance from STL types
The standard containers are not designed to be inherited from, as
illustrated by the MSVC hacks for NodeOrdering. No functional change
intended.

llvm-svn: 242616
2015-07-18 17:43:23 +00:00
Matthias Braun 9e85980658 ARM: Enable MachineScheduler and disable PostRAScheduler for swift.
Reapply r242500 now that the swift schedmodel includes LDRLIT.

This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.

This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.

While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.

Differential Revision: http://reviews.llvm.org/D10513

llvm-svn: 242588
2015-07-17 23:18:30 +00:00
Matthias Braun 141d1c9d8f ARM: Add scheduling information for LDRLIT instructions to swift scheduling model
These pseudo instructions are only lowered after register allocation and
are therefore still present when the machine scheduler runs.
Add a run: line to a testcase that uses the uncommon flags necessary to
actually produce a LDRLIT instruction on swift.

llvm-svn: 242587
2015-07-17 23:18:26 +00:00
Adam Nemet 5a6d5bc17b Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift."
This reverts commit r242500.

It broke some internal tests and Matthias asked me to revert it while he
is investigating.

llvm-svn: 242553
2015-07-17 18:14:19 +00:00
James Molloy a6702e2f14 [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
No functional change, but it preps codegen for the future when SABSDIFF
will start getting generated in anger.

llvm-svn: 242546
2015-07-17 17:10:55 +00:00
James Molloy faf4e3c33b [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
No functional change, but it preps codegen for the future when SABSDIFF
will start getting generated in anger.

llvm-svn: 242545
2015-07-17 17:10:45 +00:00
Eli Bendersky b09cfb51ca Use inbounds GEPs for memcpy and memset lowering
Follow-up on discussion in http://reviews.llvm.org/D11220

llvm-svn: 242542
2015-07-17 16:42:33 +00:00
Tim Northover a5740e0874 AArch64: add comment missed out from earlier patch.
Helps explain some of the background behind this bit of code.

llvm-svn: 242503
2015-07-17 03:31:50 +00:00
Matthias Braun 2d8315f806 ARM: Enable MachineScheduler and disable PostRAScheduler for swift.
This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.

This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.

While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.

Differential Revision: http://reviews.llvm.org/D10513

llvm-svn: 242500
2015-07-17 01:44:31 +00:00
Rafael Espindola 494a381ede Use small encodings for constants when possible.
llvm-svn: 242493
2015-07-17 00:57:52 +00:00
Matthias Braun da3d0d7342 Arm: Don't define a label twice with two setjmps in a function.
Constructing a name based on the function name didn't give us a unique
symbol if we had more than one setjmp in a function. Using
MCContext::createTempSymbol() always gives us a unique name.

Differential Revision: http://reviews.llvm.org/D9314

llvm-svn: 242482
2015-07-16 22:34:20 +00:00
Matthias Braun 3cd00c1739 Fix __builtin_setjmp in combination with sjlj exception handling.
llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling
style but is also used in clang to implement __builtin_setjmp.  The ARM
backend needs to output additional dispatch tables for the SjLj
exception handling style, these tables however can't be emitted if
llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual
landing pad blocks exist.

To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is
introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj
exception handling lowering, so we can differentiate between the case
where we actually need to setup a dispatch table and the case where we
just need the __builtin_setjmp semantic.

Differential Revision: http://reviews.llvm.org/D9313

llvm-svn: 242481
2015-07-16 22:34:16 +00:00
Simon Pilgrim 9152528917 Fix spelling. NFCI.
llvm-svn: 242448
2015-07-16 21:44:53 +00:00
Tim Northover ca0ffc3561 AArch64: make inexact signalling on round Darwin-specific
C11 leaves the choice on whether round-to-integer operations set the inexact
flag implementation-defined. Darwin does expect it to be set, but this seems to
be against the intent of the IEEE document and slower to implement anyway. So
it should be opt-in.

llvm-svn: 242446
2015-07-16 21:30:21 +00:00
Bill Schmidt 54cced54a6 [PowerPC] v4i32 is a VSRCRegClass
I was looking at some vector code generation and kept seeing
unnecessary vector copies into the Altivec half of the VSX registers.
I discovered that we overlooked v4i32 when adding the register classes
for VSX; we only added v4f32 and v2f64.  This means that anything that
canonicalizes into v4i32 (which is a LOT of stuff) ends up being
forced into VRRC on its way to VSRC.

The fix is one line.  The rest of the patch is fixing up some test
cases whose code generation has changed as a result.

This seems like it would be a good candidate for backport to 3.7.

llvm-svn: 242442
2015-07-16 21:14:07 +00:00
Eli Bendersky f871e09d2d Streamline the coding style in NVPTXLowerAggrCopies
Make the style consistent with LLVM style throughout and clang-format.

llvm-svn: 242439
2015-07-16 20:42:38 +00:00
Jingyue Wu e7981cee24 [NVPTX] enable SpeculativeExecution in NVPTX
Summary:
SpeculativeExecution enables a series straight line optimizations (such
as SLSR and NaryReassociate) on conditional code. For example,

  if (...)
    ... b * s ...
  if (...)
    ... (b + 1) * s ...

speculative execution can hoist b * s and (b + 1) * s from then-blocks,
so that we have

  ... b * s ...
  if (...)
    ...
  ... (b + 1) * s ...
  if (...)
    ...

Then, SLSR can rewrite (b + 1) * s to (b * s + s) because after
speculative execution b * s dominates (b + 1) * s.

The performance impact of this change is significant. It speeds up the
benchmarks running EigenFloatContractionKernelInternal16x16
(ba68f42fa6/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-526)
by roughly 2%. Some internal benchmarks that have the above code pattern
are improved by up to 40%. No significant slowdowns are observed on
Eigen CUDA microbenchmarks.

Reviewers: jholewinski, broune, eliben

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11201

llvm-svn: 242437
2015-07-16 20:13:48 +00:00
Matthias Braun af7d7709d6 AArch64: Implement conditional compare sequence matching.
This is a new iteration of the reverted r238793 /
http://reviews.llvm.org/D8232 which wrongly assumed that any and/or
trees can be represented by conditional compare sequences, however there
are some restrictions to that. This version fixes this and adds comments
that explain exactly what types of and/or trees can actually be
implemented as conditional compare sequences.

Related to http://llvm.org/PR20927, rdar://18326194

Differential Revision: http://reviews.llvm.org/D10579

llvm-svn: 242436
2015-07-16 20:02:37 +00:00
Tom Stellard 78655fcfdc AMDPGU/SI: Negative offsets aren't allowed in MUBUF's vaddr operand
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11226

llvm-svn: 242434
2015-07-16 19:40:09 +00:00
Tom Stellard c98ee20328 AMDPGU/SI: Use AssertZext node to mask high bit for scratch offsets
Summary:
We can safely assume that the high bit of scratch offsets will never
be set, because this would require at least 128 GB of GPU memory.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11225

llvm-svn: 242433
2015-07-16 19:40:07 +00:00
Pete Cooper f4ce569deb Revert "Add missing load/store flags to thumb2 instructions."
This reverts commit r242300.

This is causing buildbot failures which we are investigating.
I'll reapply once we know whats going on, but for now want to
get the bots green.

llvm-svn: 242428
2015-07-16 18:38:13 +00:00
Benjamin Kramer 5dfd8b6da0 [NVPTX] Don't leak dead instructions after unlinking them from the BasicBlock
llvm-svn: 242417
2015-07-16 16:51:48 +00:00
Eli Bendersky f14af16219 Correct lowering of memmove in NVPTX
This fixes https://llvm.org/bugs/show_bug.cgi?id=24056

Also a bit of refactoring along the way.

Differential Revision: http://reviews.llvm.org/D11220

llvm-svn: 242413
2015-07-16 16:27:19 +00:00
Tom Stellard ff5efb8c03 AMDGPU/R600: Remove unused variable
This fixes a warning introduced by r242410.

llvm-svn: 242412
2015-07-16 16:13:34 +00:00
Tom Stellard 1d46fb2d2f AMDPGU/R600: Replace llvm_unreachable() call with LLVMContext::emitError()
Summary:
This fixes an issue on MIPS where the infinite-loop-evergreen.ll test
was failing to terminate.

Fixes PR24147.

Reviewers: arsenm, dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11260

llvm-svn: 242410
2015-07-16 15:38:29 +00:00
Michael Kuperstein dadb847412 [X86] Reapply r240257 : "Allow more call sequences to use push instructions for argument passing"
This allows more call sequences to use pushes instead of movs when optimizing for size.
In particular, calling conventions that pass some parameters in registers (e.g. thiscall) are now supported.

This should no longer cause miscompiles, now that a bug in emitPrologue was fixed in r242395.

llvm-svn: 242398
2015-07-16 13:54:14 +00:00
Michael Kuperstein e1ea4e7d15 [X86] Fix emitPrologue() to make less assumptions about pushes
When X86FrameLowering::emitPrologue() looks for where to insert the %esp subtraction
to allocate stack space for local allocations, it assumes that any sequence of push
instructions that starts at function entry consists purely of spills of callee-save
registers.
This may be false, since from some point forward, the pushes may pushing arguments
to a subsequent function call.

This caused a miscompile that was exposed by r240257, and is not easily testable
since r240257 was reverted. A test will be committed separately after r240257 is
reapplied.

llvm-svn: 242395
2015-07-16 12:27:59 +00:00
Benjamin Kramer 7d54fab8f0 [Mips] Make helper function static, NFC.
llvm-svn: 242393
2015-07-16 11:12:05 +00:00
Mehdi Amini e029eae634 Add missing break in switch case in R600ISelLowering
Summary: Catched by coverity.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11120

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242388
2015-07-16 06:23:12 +00:00
Mehdi Amini bd7287ebe5 Move most user of TargetMachine::getDataLayout to the Module one
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

This patch is quite boring overall, except for some uglyness in
ASMPrinter which has a getDataLayout function but has some clients
that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so
some methods are taking a DataLayout as parameter.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11090

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242386
2015-07-16 06:11:10 +00:00
Mehdi Amini 5c0fa58e91 Remove DataLayout from TargetLoweringObjectFile, redirect to Module
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11079

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242385
2015-07-16 06:04:17 +00:00
Reid Kleckner 938bd6fc96 Revert "[X86] Allow more call sequences to use push instructions for argument passing"
It miscompiles some code and a reduced test case has been sent to the
author.

This reverts commit r240257.

llvm-svn: 242373
2015-07-16 01:30:00 +00:00
Akira Hatanaka 024d91a00b [ARM] Define a subtarget feature that is used to avoid using movt/movw
pairs for 32-bit immediates.

This change is needed to avoid emitting movt/movw pairs when doing LTO
and do so on a per-function basis.

Out-of-tree projects currently using cl::opt option -arm-use-movt=0 or
false to avoid emitting movt/movw pairs should make changes to add
subtarget feature "+no-movt" (see the changes made to clang in r242368).

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11026

llvm-svn: 242369
2015-07-16 00:58:23 +00:00
Pete Cooper e3c8161736 Clear kill flags in ARMLoadStoreOptimizer.
The pass here was clearing kill flags on instructions which had
their sources killed in the instruction being combined.  But
given that the new instruction is inserted after the existing ones,
any existing instructions with kill flags will lead to the verifier
complaining that we are reading an undefined physreg.

For example, what we had prior to this optimization is
	t2STRi12 %R1, %SP, 12
	t2STRi12 %R1<kill>, %SP, 16
	t2STRi12 %R0<kill>, %SP, 8

and prior to this fix that would generate
	t2STRi12 %R1<kill>, %SP, 16
	t2STRDi8 %R0<kill>, %R1, %SP, 8

This is clearly incorrect as it didn't clear the kill flag on R1
used with offset 16 because there was no kill flag on the instruction
with offset 12.

After this change we clear the kill flag on the offset 16 instruction
because we know it will be used afterwards in the new instruction.

I haven't provided a test case.  I have a small test, but even it is
very sensitive to register allocation order which isn't ideal.

llvm-svn: 242359
2015-07-16 00:09:18 +00:00
Matthias Braun 5d1f12d1f5 TargetRegisterInfo: Provide a way to check assigned registers in getRegAllocationHints()
Pass a const reference to LiveRegMatrix to getRegAllocationHints()
because some targets can prodive better hints if they can test whether a
physreg has been used for register allocation yet.

llvm-svn: 242340
2015-07-15 22:16:00 +00:00
Bruno Cardoso Lopes ad61f34293 Revert "Look through PHIs to find additional register sources"
Likely broke compilation on ARM:

http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/13054

This reverts commit 131ce4a838c081516cbfed039fc986b33e3979d6.

llvm-svn: 242310
2015-07-15 18:10:35 +00:00
Pete Cooper 21ca199cea Add missing load/store flags to thumb2 instructions.
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs.  Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.

While looking at this code, there was a stale comment that these
instructions were only used for disassembly.  This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.

llvm-svn: 242300
2015-07-15 16:36:38 +00:00
Bill Schmidt 1e77bb12b4 [PPC64LE] Fix vec_sld semantics for little endian
The vec_sld interface provides access to the vsldoi instruction.
Unlike most of the vec_* interfaces, we do not attempt to change the
generated code for vec_sld based on the endian mode.  It is too
difficult to correctly infer the desired semantics because of
different element types, and the corrected instruction sequence is
expensive, involving loading a permute control vector and performing a
generalized permute.

For GCC, this was implemented as "Don't touch the vec_sld"
implementation.  When it came time for the LLVM implementation, I did
the same thing.  However, this was hasty and incorrect.  In LLVM's
version of altivec.h, vec_sld was previously defined in terms of the
vec_perm interface.  Because vec_perm semantics are adjusted for
little endian, this means that leaving vec_sld untouched causes it to
generate something different for LE than for BE.  Not good.

This back-end patch accompanies the changes to altivec.h that change
vec_sld's behavior for little endian.  Those changes mean that we see
slightly different code in the back end when trying to recognize a
VSLDOI instruction in isVSLDOIShuffleMask.  In particular, a
ShuffleKind of 1 (where the two inputs are identical) must now be
treated the same way as a ShuffleKind of 2 (little endian with
different inputs) when little endian mode is in force.  This is
because ShuffleKind of 1 is defined using big-endian numbering.

This has a ripple effect on LowerBUILD_VECTOR, where we create our own
internal VSLDOI instructions.  Because these are a ShuffleKind of 1,
they will now have their shift amounts subtracted from 16 when
recognizing the shuffle mask.  To avoid problems we have to subtract
them from 16 again before creating the VSLDOI instructions.

There are a couple of other uses of BuildVSLDOI, but these do not need
to be modified because the shift amount is 8, which is unchanged when
subtracted from 16.

llvm-svn: 242296
2015-07-15 15:45:30 +00:00
Bruno Cardoso Lopes fadd4fef2a Look through PHIs to find additional register sources
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197

rdar://problem/20404526

llvm-svn: 242295
2015-07-15 15:35:23 +00:00
Benjamin Kramer c11fd3e775 [PPC] Disassemble little endian ppc instructions in the right byte order
PR24122. The test is simply a byte swapped version of ppc64-encoding.txt.

llvm-svn: 242288
2015-07-15 12:56:19 +00:00
Hal Finkel 5d36b230b5 [PowerPC] Use the MachineCombiner to reassociate fadd/fmul
This is a direct port of the code from the X86 backend (r239486/r240361), which
uses the MachineCombiner to reassociate (floating-point) adds/muls to increase
ILP, to the PowerPC backend. The rationale is the same.

There is a lot of copy-and-paste here between the X86 code and the PowerPC
code, and we should extract at least some of this into CodeGen somewhere.
However, I don't want to do that until this code is enhanced to handle FMAs as
well. After that, we'll be in a better position to extract the common parts.

llvm-svn: 242279
2015-07-15 08:23:05 +00:00
Hal Finkel 673b493e98 [PowerPC] Extend physical register live range in PPCVSXFMAMutate
If the source of the copy that defines the addend is a physical register, then
its existing live range may not extend to the FMA being mutated. Make sure we
extend the live range of the register to meet the FMA because it will become
its operand in this case.

I don't have an independent test case, but it will be exposed by change to be
committed shortly enabling the use of the machine combiner to do fadd/fmul
reassociation, and will be covered by one of the associated regression tests.

llvm-svn: 242278
2015-07-15 08:23:03 +00:00
Petr Pavlu 097adfb98c [AArch64] Fix problems in decoding generic MSR instructions
Bitpatterns rejected by the decoder method of `MSR (immediate)` should be
decoded as the `extended MSR (register)` instruction.

Differential Revision: http://reviews.llvm.org/D7174

llvm-svn: 242276
2015-07-15 08:10:30 +00:00
Igor Breger 096e8b0995 AVX : Fix ISA disabling in case AVX512VL , some instructions should be disabled only if AVX512BW present.
Tests added.

Differential Revision: http://reviews.llvm.org/D11122

llvm-svn: 242270
2015-07-15 07:08:10 +00:00
Pete Cooper e46f7ef385 Change conditional to assert. NFC.
This code was breaking from the case statement if the getStoreSizeInBits()
value was not a multiple of 0.  Given that the implementation returns
getStoreSize() * 8, it can only be a multiple of 8.

llvm-svn: 242255
2015-07-15 00:07:57 +00:00
Pete Cooper 7e64ef06e6 Use more foreach loops in SelectionDAG. NFC
llvm-svn: 242249
2015-07-14 23:43:29 +00:00
JF Bastien c8f48c19d3 WebAssembly: fix build breakage.
Summary:
processFunctionBeforeCalleeSavedScan was renamed to determineCalleeSaves and now takes a BitVector parameter as of rL242165, reviewed in http://reviews.llvm.org/D10909

WebAssembly is still marked as experimental and therefore doesn't build by default. It does, however, grep by default! I notice that processFunctionBeforeCalleeSavedScan is still mentioned in a few comments and error messages, which I also fixed.

Reviewers: qcolombet, sunfish

Subscribers: jfb, dsanders, hfinkel, MatzeB, llvm-commits

Differential Revision: http://reviews.llvm.org/D11199

llvm-svn: 242242
2015-07-14 23:06:07 +00:00
Hal Finkel 4012024fea [PowerPC] Support symbolic targets in patchpoints
Follow-up r235483, with the corresponding support in PPC. We use a regular call
for symbolic targets (because they're much cheaper than indirect calls).

llvm-svn: 242239
2015-07-14 22:53:11 +00:00
Hal Finkel 9bbad03b98 [PowerPC] Use the ABI indirect-call protocol for patchpoints
We used to take the address specified as the direct target of the patchpoint
and did no TOC-pointer handling.  This, however, as not all that useful,
because MCJIT tends to create a lot of modules, and they have their own TOC
sections. Thus, to call from the generated code to other generated code, you
really need to switch TOC pointers. Make this work as expected, and under
ELFv1, tread the address as the function descriptor address so that the correct
TOC pointer can be loaded.

llvm-svn: 242217
2015-07-14 22:26:06 +00:00
Pete Cooper 65c69407c8 Add allnodes() iterator range to SelectionDAG. NFC.
SelectionDAG already had begin/end methods for iterating over all
the nodes, but didn't define an iterator_range for us in foreach
loops.

This adds such a method and uses it in some of the eligible places
throughout the backends.

llvm-svn: 242212
2015-07-14 22:10:54 +00:00
JF Bastien d9767a368f WebAssembly: add basic int/fp instruction codegen.
Summary: This patch has the most basic instruction codegen for 32 and 64 bit int/fp.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11193

llvm-svn: 242201
2015-07-14 21:13:29 +00:00
Krzysztof Parzyszek fdfaae42b6 Fix NDEBUG build warning
llvm-svn: 242200
2015-07-14 21:03:24 +00:00
Krzysztof Parzyszek 2e82883620 Fix Windows build: replace __func__ with LLVM_FUNCTION_NAME
llvm-svn: 242192
2015-07-14 20:11:28 +00:00
Bruno Cardoso Lopes 9e6dea1df8 [MMX] Use the appropriate instructions for GR64 <-> VR64 copies.
MOVSDto64rr and MOV64toSDrr are defined to convert between FR64 (%xmm)
<-> GR64 registers, not VR64 (%mm) <-> GR64. This is wrong.

I found this by inspection and could not find a suitable testcase for it
since (1) we don't handle MMX bitcasts in Peephole optimizer as to
generate COPYs that (2) could be expanded back to the appropriate x86
instruction in ExpandPostRA.

Switch to use the appropriate instructions: MMX_MOVD64from64rr and
MMX_MOVD64to64rr here.

llvm-svn: 242191
2015-07-14 20:09:34 +00:00
Hal Finkel 8acae5276e [PowerPC] Fix the PPCInstrInfo::getInstrLatency implementation
PowerPC uses itineraries to describe processor pipelines (and dispatch-group
restrictions for P7/P8 cores). Unfortunately, the target-independent
implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that
looks for the largest cycle count in the pipeline for any given instruction.
This, however, yields the wrong answer for the PPC itineraries, because we
don't encode the full pipeline. Because the functional units are fully
pipelined, we only model the initial stages (there are no relevant hazards in
the later stages to model), and so the technique employed by getStageLatency
does not really work. Instead, we should take the maximum output operand
latency, and that's what PPCInstrInfo::getInstrLatency now does.

This caused some test-case churn, including two unfortunate side effects.
First, the new arrangement of copies we get from function parameters now
sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the
test cases), and we have one significant test-suite regression:

SingleSource/Benchmarks/BenchmarkGame/spectral-norm
	56.4185% +/- 18.9398%

In this benchmark we have a loop with a vectorized FP divide, and it with the
new scheduling both divides end up in the same dispatch group (which in this
case seems to cause a problem, although why is not exactly clear). The grouping
structure is hard to predict from the bottom of the loop, and there may not be
much we can do to fix this.

Very few other test-suite performance effects were really significant, but
almost all weakly favor this change. However, in light of the issues
highlighted above, I've left the old behavior available via a
command-line flag.

llvm-svn: 242188
2015-07-14 20:02:02 +00:00
Krzysztof Parzyszek 758744706a [Hexagon] Generate instructions for operations on predicate registers
Convert logical operations on general-purpose registers to the correspon-
ding operations on predicate registers.

llvm-svn: 242186
2015-07-14 19:30:21 +00:00
Matt Arsenault 24692118ba AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32)
This can be done only with moves which theoretically
will optimize better later.

Although this transform increases the instruction count,
it should be code size / cycle count neutral in the worst
VALU case. It also seems to slightly improve a couple
of testcases due to other DAG combines this exposes.

This is probably slightly worse for the SALU case, so
it might be better to handle this during moveToVALU,
although then you lose some simplifications like
the load width reducing in the simple testcase.

llvm-svn: 242177
2015-07-14 18:20:33 +00:00
Matt Arsenault 84db5d97b0 AMDGPU/SI: Fix read2 merging into a super register.
If the read2 produced was supposed to be writing into a
super register, it would use the wrong subregister indices.
Fix this by inserting copies, so we only ever write to a vreg_64.
Run the register coalescer again to clean this up, although this
isn't ideal and often does result in an extra move.

Also remove the assert that offset1 > offset0.

There isn't a real reason to not allow this other than a minor
convenience in the compiler, and it doesn't seem worth the effort
of avoiding it.

llvm-svn: 242174
2015-07-14 17:57:36 +00:00
Matthias Braun 9912bb817c MachineRegisterInfo: Remove UsedPhysReg infrastructure
We have a detailed def/use lists for every physical register in
MachineRegisterInfo anyway, so there is little use in maintaining an
additional bitset of which ones are used.

Removing it frees us from extra book keeping. This simplifies
VirtRegMap.

Differential Revision: http://reviews.llvm.org/D10911

llvm-svn: 242173
2015-07-14 17:52:07 +00:00
Nemanja Ivanovic 984a3613b3 Add missing builtins to the PPC back end for ABI compliance (vol. 4)
This patch corresponds to review:
http://reviews.llvm.org/D11183

Back end portion of the fourth round of additions to altivec.h.

llvm-svn: 242167
2015-07-14 17:25:20 +00:00
Matthias Braun 0256486532 PrologEpilogInserter: Rewrite API to determine callee save regsiters.
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():

- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
  the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
  physcial registers which are only read but never modified.

Related to rdar://21539507

Differential Revision: http://reviews.llvm.org/D10909

llvm-svn: 242165
2015-07-14 17:17:13 +00:00
Tim Northover c962d4f28b AArch64: add rev64 alias for 64-bit rev instruction.
It could be useful to assembly programmers and makes the permitted variants a
little more uniform.

llvm-svn: 242164
2015-07-14 17:07:29 +00:00
Krzysztof Parzyszek a0ecf07c0b [Hexagon] Generate "extract" instructions more aggressively
Generate extract instructions (via intrinsics) before the DAG combiner
folds shifts into unrecognizable forms.

llvm-svn: 242163
2015-07-14 17:07:24 +00:00
Hans Wennborg 61f9efe73b ARMAsmParser: Take MCInst param by const-ref
(Broken out from http://reviews.llvm.org/D11167)

llvm-svn: 242160
2015-07-14 16:39:01 +00:00
Tom Stellard e48fe2a27a AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11061

llvm-svn: 242146
2015-07-14 14:15:03 +00:00
Aaron Ballman a927a86c72 Silencing two MSVC warnings; 'argument' : truncation from 'unsigned int' to 'int16_t' and truncation of constant value. NFC intended.
llvm-svn: 242145
2015-07-14 14:14:00 +00:00
Daniel Sanders 03f9c019d2 [mips] Fix li/la differences between IAS and GAS.
Summary:
- Signed 16-bit should have priority over unsigned.
- For la, unsigned 16-bit must use ori+addu rather than directly use ori.
- Correct tests on 32-bit immediates with 64-bit predicates by
  sign-extending the immediate beforehand. For example, isInt<16>(0xffff8000)
  should be true and use addiu.

Also split li/la testing into separate files due to their size.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10967

llvm-svn: 242139
2015-07-14 12:24:22 +00:00
Yaron Keren d1ba2d9d8b Generate correct asm info for mingw and cygwin ARM targets.
http://reviews.llvm.org/D11075

Patch by Martell Malone
Reviewed by Reid Kleckner

llvm-svn: 242123
2015-07-14 05:51:05 +00:00
NAKAMURA Takumi 0b305db8df Prune trailing whitespaces and CRs.
llvm-svn: 242117
2015-07-14 04:03:49 +00:00
Bill Schmidt 15deb803b4 [PPC64LE] More improvements to VSX swap optimization
This patch allows VSX swap optimization to succeed more frequently.
Specifically, it is concerned with common code sequences that occur
when copying a scalar floating-point value to a vector register.  This
patch currently handles cases where the floating-point value is
already in a register, but does not yet handle loads (such as via an
LXSDX scalar floating-point VSX load).  That will be dealt with later.

A typical case is when a scalar value comes in as a floating-point
parameter.  The value is copied into a virtual VSFRC register, and
then a sequence of SUBREG_TO_REG and/or COPY operations will convert
it to a full vector register of the class required by the context.  If
this vector register is then used as part of a lane-permuted
computation, the original scalar value will be in the wrong lane.  We
can fix this by adding a swap operation following any widening
SUBREG_TO_REG operation.  Additional COPY operations may be needed
around the swap operation in order to keep register assignment happy,
but these are pro forma operations that will be removed by coalescing.

If a scalar value is otherwise directly referenced in a computation
(such as by one of the many XS* vector-scalar operations), we
currently disable swap optimization.  These operations are
lane-sensitive by definition.  A MentionsPartialVR flag is added for
use in each swap table entry that mentions a scalar floating-point
register without having special handling defined.

A common idiom for PPC64LE is to convert a double-precision scalar to
a vector by performing a splat operation.  This ensures that the value
can be referenced as V[0], as it would be for big endian, whereas just
converting the scalar to a vector with a SUBREG_TO_REG operation
leaves this value only in V[1].  A doubleword splat operation is one
form of an XXPERMDI instruction, which takes one doubleword from a
first operand and another doubleword from a second operand, with a
two-bit selector operand indicating which doublewords are chosen.  In
the general case, an XXPERMDI can be permitted in a lane-swapped
region provided that it is properly transformed to select the
corresponding swapped values.  This transformation is to reverse the
order of the two input operands, and to reverse and complement the
bits of the selector operand (derivation left as an exercise to the
reader ;).

A new test case that exercises the scalar-to-vector and generalized
XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll.
The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to
use CHECK-DAG instead of CHECK for two independent instructions that
now appear in reverse order.

There are two small unrelated changes that are added with this patch.
First, the XXSLDWI instruction was incorrectly omitted from the list
of lane-sensitive instructions; this is now fixed.  Second, I observed
that the same webs were being rejected over and over again for
different reasons.  Since it's sufficient to reject a web only once, I
added a check for this to speed up the compilation time slightly.

llvm-svn: 242081
2015-07-13 22:58:19 +00:00
Benjamin Kramer d8861517bf [Hexagon] Move BitTracker into the llvm namespace and remove redundant qualifications
No functional change intended.

llvm-svn: 242062
2015-07-13 20:38:16 +00:00
Matt Arsenault ca95d44110 AMDGPU: Minor cleanups to always inline pass
llvm-svn: 242053
2015-07-13 19:08:36 +00:00
Mark Heffernan 4c8ca53f7e Enable partial and runtime loop unrolling for NVPTX.
Enable partial and runtime loop unrolling for NVPTX backend via
TTI::UnrollingPreferences with a small threshold. This partially unrolls
small loops which are often unrolled by the PTX to SASS compiler
and unrolling earlier can be beneficial.

llvm-svn: 242049
2015-07-13 18:33:21 +00:00
Reid Kleckner 5f4dd92209 [WinEH] Strip the \01 character from the __CxxFrameHandler3 thunk name
Add another C++ 32-bit EH table test.

llvm-svn: 242044
2015-07-13 17:55:14 +00:00
Tom Stellard db5a11f698 AMDGPU/SI: Select mad patterns to v_mac_f32
The two-address instruction pass will convert these back to v_mad_f32
if necessary.

Differential Revision: http://reviews.llvm.org/D11060

llvm-svn: 242038
2015-07-13 15:47:57 +00:00
Logan Chien 0a43abc9f8 ARM: Fix cttz expansion on vector types.
The 64/128-bit vector types are legal if NEON instructions are
available.  However, there was no matching patterns for @llvm.cttz.*()
intrinsics and result in fatal error.

This commit fixes the problem by lowering cttz to:
a. ctpop((x & -x) - 1)
b. width - ctlz(x & -x) - 1

llvm-svn: 242037
2015-07-13 15:37:30 +00:00
Scott Douglass 69bf1ce03a [ARM] Handle commutativity when converting to tADDhirr in Thumb2
Also, run thumb_rewrite.s tests in Thumb2 now that they pass.

Differential Revision: http://reviews.llvm.org/D11132

llvm-svn: 242036
2015-07-13 15:31:48 +00:00
Scott Douglass d9d8d26458 [ARM] Add Thumb2 ADD with SP narrowing from 3 operand to 2
Differential Revision: http://reviews.llvm.org/D11131

llvm-svn: 242035
2015-07-13 15:31:40 +00:00
Scott Douglass 039f768c42 [ARM] Small refactor of tryConvertingToTwoOperandForm (nfc)
Also, add more Thumb2 ADD tests requested during review of
http://reviews.llvm.org/D11053.

Differential Revision: http://reviews.llvm.org/D11130

llvm-svn: 242034
2015-07-13 15:31:33 +00:00
Aaron Ballman 6d8f785073 Removing several -Wunused-but-set-variable warnings; NFC intended.
llvm-svn: 242028
2015-07-13 14:04:30 +00:00
Elena Demikhovsky 0f370936a0 AVX-512: Added all AVX-512 forms of Vector Convert for Float/Double/Int/Long types.
In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch.
I temporary removed the old intrinsics test (just to split this patch).
Half types are not covered here.

Differential Revision: http://reviews.llvm.org/D11134

llvm-svn: 242023
2015-07-13 13:26:20 +00:00
Renato Golin 1ef7a0f7c0 [ARM] Add support for nest attribute using r12
Register r12 ('ip') is used by GCC for this purpose
and hence is used here. As discussed on the GCC mailing
list, the register choice is an ABI issue and so
choosing the same register as GCC means
__builtin_call_with_static_chain is compatible.

A similar patch has just gone in the AArch64 backend,
so this is just the ARM counterpart, following the same
discussion.

Patch by Stephen Cross.

llvm-svn: 241996
2015-07-12 18:16:40 +00:00
Simon Pilgrim 4f500525ef [X86][SSE] (V)PMINSB is commutable.
(V)PMINSB is no different to the other (V)PMIN/(V)PMAX B/D/W instructions - it is fully commutable.

llvm-svn: 241994
2015-07-12 16:44:11 +00:00
Simon Pilgrim ae5cd2773d Trim trailing whitespaces. NFC.
llvm-svn: 241990
2015-07-12 11:17:33 +00:00
Simon Pilgrim 64cc4ad0a2 [X86][SSE] Vectorized v4i32 non-uniform shifts.
While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized.

This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together.

Differential Revision: http://reviews.llvm.org/D11063

llvm-svn: 241989
2015-07-12 11:15:19 +00:00
Hal Finkel cbf08925ef [PowerPC] Make use of the TargetRecip system
r238842 added the TargetRecip system for controlling use of reciprocal
estimates for sqrt and division using a set of parameters that can be set by
the frontend. Clang now supports a sophisticated -mrecip option, and this will
allow that option to effectively control the relevant code-generation
functionality of the PPC backend.

llvm-svn: 241985
2015-07-12 02:33:57 +00:00
Hal Finkel 965cea5670 [PowerPC] Support the nest parameter attribute
This adds support for the 'nest' attribute, which allows the static chain
register to be set for functions calls under non-Darwin PPC/PPC64 targets. r11
is the chain register (which the PPC64 ELF ABI calls the "environment
pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded
from the function descriptor, but providing an explicit 'nest' parameter will
override that process and use the value provided.

This allows __builtin_call_with_static_chain to work as expected on PowerPC.

llvm-svn: 241984
2015-07-12 00:37:44 +00:00
Duncan P. N. Exon Smith e463e470f8 MC: Only allow changing feature bits in MCSubtargetInfo
Disallow all mutation of `MCSubtargetInfo` expect the feature bits.

Besides deleting the assignment operators -- which were dead "code" --
this restricts `InitMCProcessorInfo()` to subclass initialization
sequences, and exposes a new more limited function called
`setDefaultFeatures()` for use by the ARMAsmParser `.cpu` directive.

There's a small functional change here: ARMAsmParser used to adjust
`MCSubtargetInfo::CPUSchedModel` as a side effect of calling
`InitMCProcessorInfo()`, but I've removed that suspicious behaviour.
Since the AsmParser shouldn't be doing any scheduling, there shouldn't
be any observable change...

llvm-svn: 241961
2015-07-10 22:52:15 +00:00
Matt Arsenault cf13d18730 AMDGPU: Fix chains for memory ops dependent on argument loads
Most loads and stores are derived from pointers derived from
a kernel argument load inserted during argument lowering.
This was just using the EntryToken chain for the argument loads,
and any users of these loads were also on the EntryToken chain.

Return the chain of the lowered argument load so that dependent loads
end up on the correct chain.

No test since I'm not aware of any case where this actually
broke.

llvm-svn: 241960
2015-07-10 22:51:36 +00:00
Duncan P. N. Exon Smith 754e21f244 MC: Remove MCSubtargetInfo() default constructor
Force all creators of `MCSubtargetInfo` to immediately initialize it,
merging the default constructor and the initializer into an initializing
constructor.  Besides cleaning up the code a little, this makes it clear
that the initializer is never called again later.

Out-of-tree backends need a trivial change: instead of calling:

    auto *X = new MCSubtargetInfo();
    InitXYZMCSubtargetInfo(X, ...);
    return X;

they should call:

    return createXYZMCSubtargetInfoImpl(...);

There's no real functionality change here.

llvm-svn: 241957
2015-07-10 22:43:42 +00:00
Duncan P. N. Exon Smith bb57d73805 MC: Remove MCSubtargetInfo::InitCPUSched()
Remove all calls to `MCSubtargetInfo::InitCPUSched()` and merge its body
into the only relevant caller, `MCSubtargetInfo::InitMCProcessorInfo()`.
We were only calling the former after explicitly calling the latter with
the same CPU; it's confusing to have both methods exposed.

Besides a minor (surely unmeasurable) speedup in ARM and X86 from
avoiding running the logic twice, no functionality change.

llvm-svn: 241956
2015-07-10 22:33:01 +00:00
Matt Arsenault 0d5197380c AMDGPU: Use requested chain when lowering arguments
No test since I'm not aware of any case where this will
end up being a different chain.

llvm-svn: 241954
2015-07-10 22:28:41 +00:00
Matthias Braun e5a112f5e1 ARM: Use SpecificBumpPtrAllocator to fix leak introduced in r241920
llvm-svn: 241951
2015-07-10 22:23:57 +00:00
Evgeniy Stepanov 00b3020453 Fix AArch64 prologue for empty frame with dynamic allocas.
Fixes PR23804: assertion failure in emitPrologue in the case of a
function with an empty frame and a dynamic alloca that needs stack
realignment. This is a typical case for AddressSanitizer.

llvm-svn: 241943
2015-07-10 21:24:07 +00:00
Jingyue Wu a277561922 [TTI] BasicTTIImpl assumes no vector registers
Summary:
Following the discussion on r241884, it's more reasonable to assume that a
target has no vector registers by default instead of letting every such
target overrides getNumberOfRegisters.

Therefore, this patch modifies BasicTTIImpl::getNumberOfRegisters to
return 0 when Vector is true, and partially reverts r241884 which
modifies NVPTXTTIImpl::getNumberOfRegisters.

It also fixes a performance bug in LoopVectorizer. Even if a target has
no vector registers, vectorization may still help ILP. So, we need both
checks to be false before disabling loop vectorization all together.

Reviewers: hfinkel

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11108

llvm-svn: 241942
2015-07-10 21:14:54 +00:00
Matthias Braun d9bd22b2c4 ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code
This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.

Differential Revision: http://reviews.llvm.org/D10676

llvm-svn: 241928
2015-07-10 18:37:33 +00:00
Matthias Braun e4ba6b8c24 ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2
Differential Revision: http://reviews.llvm.org/D10623

llvm-svn: 241926
2015-07-10 18:28:49 +00:00
JF Bastien 5ca0baca4a WebAssembly: basic instructions todo, and basic register info.
Summary:
This code is based on AArch64 for modern backend good practice, and NVPTX for
virtual ISA concerns.

Reviewers: sunfish

Subscribers: aemerson, llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11070

llvm-svn: 241923
2015-07-10 18:23:10 +00:00
JF Bastien b73a2ed20e Target RegisterInfo: devirtualize TargetFrameLowering
Summary:
The target frame lowering's concrete type is always known in RegisterInfo, yet it's only sometimes devirtualized through a static_cast. This change adds an auto-generated static function <Target>GenRegisterInfo::getFrameLowering(const MachineFunction &MF) which does this devirtualization, and uses this function in all targets which can.

This change was suggested by sunfish in D11070 for WebAssembly, I figure that I may as well improve the other targets while I'm here.

Subscribers: sunfish, ted, llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11093

llvm-svn: 241921
2015-07-10 18:13:17 +00:00
Matthias Braun a4a3182ded ARMLoadStoreOptimizer: Rewrite LDM/STM matching logic.
This improves the logic in several ways and is a preparation for
followup patches:
- First perform an analysis and create a list of merge candidates, then
  transform. This simplifies the code in that you have don't have to
  care to much anymore that you may be holding iterators to
  MachineInstrs that get removed.
- Analyze/Transform basic blocks in reverse order. This allows to use
  LivePhysRegs to find free registers instead of the RegisterScavenger.
  The RegisterScavenger will become less precise in the future as it
  relies on the deprecated kill-flags.
- Return the newly created node in MergeOps so there's no need to look
  around in the schedule to find it.
- Rename some MBBI iterators to InsertBefore to make their role clear.
- General code cleanup.

Differential Revision: http://reviews.llvm.org/D10140

llvm-svn: 241920
2015-07-10 18:08:49 +00:00
Eli Bendersky 5c0039a014 Actually support volatile memcpys in NVPTX lowering
Differential Revision: http://reviews.llvm.org/D11091

llvm-svn: 241914
2015-07-10 15:40:33 +00:00
Nemanja Ivanovic d9e4b4ff36 NFC. Added a blank line for consistency.
llvm-svn: 241913
2015-07-10 14:25:17 +00:00
Nemanja Ivanovic 5655fb320c Add missing builtins to the PPC back end for ABI compliance (vol. 3)
This patch corresponds to review:
http://reviews.llvm.org/D10973

Back end portion of the third round of additions to altivec.h.

llvm-svn: 241900
2015-07-10 12:38:08 +00:00
Jingyue Wu ad85c8c204 [NVPTX] declare no vector registers
Summary:
Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll)
produces code with complex control flow which hurts later optimizations. Since
NVPTX doesn't have vector registers in LLVM's sense
(NVPTXTTI::getRegisterBitWidth(true) == 32), we for now declare no vector
registers to effectively disable loop vectorization.

Reviewers: jholewinski

Subscribers: jingyue, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11089

llvm-svn: 241884
2015-07-10 04:31:56 +00:00
Reid Kleckner 85a2450d56 [WinEH] Make sure LSDA tables are 4 byte aligned
Apparently this is important, otherwise _except_handler3 assumes that
the registration node is corrupted and ignores it.

Also fix a bug in WinEHPrepare where we would insert code after a
terminator instruction.

llvm-svn: 241877
2015-07-10 00:08:49 +00:00
Eli Bendersky d880520bc2 Replace index-loops by range-based loops
NFC

llvm-svn: 241875
2015-07-09 23:06:03 +00:00
Sanjay Patel 81beefc541 [x86] enable machine combiner reassociations for scalar double-precision multiplies
llvm-svn: 241873
2015-07-09 22:58:39 +00:00
Sanjay Patel ea81edf351 [x86] enable machine combiner reassociations for scalar double-precision adds
llvm-svn: 241871
2015-07-09 22:48:54 +00:00
Reid Kleckner 8eecb3c160 [WinEH] Give up on using CSRs across 32-bit invokes for now
The runtime does not restore CSRs when transferring control back to the
function handling the exception. According to the experts on IRC, LLVM's
register allocator has no way to model register clobbers that only
happen on one edge of the CFG. For now, don't worry about trying to use
the meager three CSRs available on 32-bit X86 and just say that such
invokes preserve nothing.

llvm-svn: 241865
2015-07-09 22:09:41 +00:00
Tom Stellard dcb9f0907f AMDGPU: Add helper function for implicit parameter offsets.
Patch by: Zoltan Gilian

llvm-svn: 241861
2015-07-09 21:20:37 +00:00
JF Bastien b379643f7c Unbreak WebAssembly build
Summary: D11021 and D11045 didn't update the WebAssembly target's code. It's still experimental so all tests passed.

Reviewers: sunfish, joker.eph, echristo

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11084

llvm-svn: 241859
2015-07-09 21:00:09 +00:00
Matt Arsenault 8b03e6c164 AMDGPU/R600: Return correct chain when lowering loads
The other LowerLOAD should be returning the correct chain.

llvm-svn: 241839
2015-07-09 18:47:03 +00:00
Pat Gavlin a717f255b6 Allow {e,r}bp as the target of {read,write}_register.
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.

Differential Revision: http://reviews.llvm.org/D10977

llvm-svn: 241827
2015-07-09 17:40:29 +00:00
Tom Stellard ab6e9c0f94 AMDGPU/SI: The SIShrinkInstructions pass should only fold immediates with one use
This is convered by existing testcases and will be exposed by a future
commit.

llvm-svn: 241817
2015-07-09 16:30:36 +00:00
Tom Stellard 9ebf7ca2f0 AMDGPU/SI: Fix crash on physical registers in SIInstrInfo::isOperandLegal()
No test case for this.  I ran into it while working on some improvements
to SIShrinkInstructions.cpp.

llvm-svn: 241816
2015-07-09 16:30:27 +00:00
Krzysztof Parzyszek 8b26fbf758 [Hexagon] Add missing preamble to a source file
llvm-svn: 241813
2015-07-09 15:40:25 +00:00
Mehdi Amini eaabc51e78 Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user
A documentation for this function would be nice by the way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241807
2015-07-09 15:12:23 +00:00
Pawel Bylica d1b818bcf4 Reapply fixed r241790: Fix shift legalization and lowering for big constants.
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.

Reviewers: nadav, majnemer, sanjoy, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10767

llvm-svn: 241806
2015-07-09 14:58:04 +00:00
Krzysztof Parzyszek feaf7b8d35 [Hexagon] Add support for atomic RMW operations
llvm-svn: 241804
2015-07-09 14:51:21 +00:00
Arnaud A. de Grandmaison f40f99e3a4 [AArch64] Select SBFIZ or UBFIZ instead of left + right shifts
And rename LSB to Immr / MSB to Imms to match the ARM ARM terminology.

llvm-svn: 241803
2015-07-09 14:33:38 +00:00
Scott Douglass 8143bc25ee [ARM] Thumb1 3 to 2 operand convertion for commutative operations
Differential Revision: http://reviews.llvm.org/D11057

llvm-svn: 241802
2015-07-09 14:13:55 +00:00
Scott Douglass 2740a63725 [ARM] Don't be overzealous converting Thumb1 3 to 2 operands
Differential Revision: http://reviews.llvm.org/D11056

llvm-svn: 241801
2015-07-09 14:13:48 +00:00
Scott Douglass 47a3fce461 [ARM] Add Thumb2 ADD with PC narrowing from 3 operand to 2
Differential Revision: http://reviews.llvm.org/D11055

llvm-svn: 241800
2015-07-09 14:13:41 +00:00
Scott Douglass 8c7803f4c1 [ARM] Refactor converting Thumb1 from 3 to 2 operand (nfc)
Also adds some test cases.

Differential Revision: http://reviews.llvm.org/D11054

llvm-svn: 241799
2015-07-09 14:13:34 +00:00
Renato Golin 17d4efe7c1 Add support for nest attribute to AArch64 backend
The nest attribute is currently supported on the x86 (32-bit) and x86-64
backends, but not on ARM (32-bit) or AArch64. This patch adds support for
nest to the AArch64 backend.

Register x18 is used by GCC for this purpose and hence is used here.
As discussed on the GCC mailing list the register choice is an ABI issue
and so choosing the same register as GCC means __builtin_call_with_static_chain
is compatible.

Patch by Stephen Cross.

llvm-svn: 241794
2015-07-09 10:18:02 +00:00
Pawel Bylica 627762fda5 Revert r241790: Fix shift legalization and lowering for big constants.
llvm-svn: 241792
2015-07-09 09:50:54 +00:00
Pawel Bylica eb122f2baf Fix shift legalization and lowering for big constants.
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.

Reviewers: nadav, majnemer, sanjoy, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10767

llvm-svn: 241790
2015-07-09 08:01:36 +00:00
Mehdi Amini 157e5a6d10 Remove getDataLayout() from TargetSelectionDAGInfo (had no users)
Summary:
Remove empty subclass in the process.

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted

Differential Revision: http://reviews.llvm.org/D11045

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241780
2015-07-09 02:10:08 +00:00
Mehdi Amini a749f2ad47 Remove getDataLayout() from TargetLowering
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11042

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241779
2015-07-09 02:09:52 +00:00
Mehdi Amini 0cdec1e2ab Make isLegalAddressingMode() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini 5c183d5239 Make getByValTypeAlignment() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11038

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241777
2015-07-09 02:09:28 +00:00
Mehdi Amini 9639d650bb Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11037

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
2015-07-09 02:09:20 +00:00
Mehdi Amini 44ede33a69 Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Mehdi Amini 5010ebf181 Make TargetTransformInfo keeping a reference to the Module DataLayout
DataLayout is no longer optional. It was initialized with or without
a DataLayout, and the DataLayout when supplied could have been the
one from the TargetMachine.

Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11021

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241774
2015-07-09 02:08:42 +00:00
Mehdi Amini 56228dabfa Redirect DataLayout from TargetMachine to Module in ComputeValueVTs()
Summary:
Avoid using the TargetMachine owned DataLayout and use the Module owned
one instead. This requires passing the DataLayout up the stack to
ComputeValueVTs().

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11019

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241773
2015-07-09 01:57:34 +00:00
Sanjay Patel 093fb170a6 [x86] enable machine combiner reassociations for scalar single-precision multiplies
llvm-svn: 241752
2015-07-08 22:35:20 +00:00
Diego Novillo 13e20f1bbf Add missing dependency to Hexagon target.
A recent patch added calls to isInstructionTriviallyDead without the
corresponding dependency on TransformUtils.

llvm-svn: 241731
2015-07-08 21:13:37 +00:00
Reid Kleckner 4f21df2b96 [Win64] Only treat some functions as having the Win64 convention
All the usual X86 target-specific conventions are collapsed to the
normal Win64 convention, but the custom conventions like GHC and webkit
should not be.

Previously we would assume that the caller allocated 32 bytes of shadow
space for us, which is not how webkit_jscc or other custom conventions
are supposed to work.

Based on a patch by peavo@outlook.com.

Fixes PR24051.

llvm-svn: 241725
2015-07-08 21:03:47 +00:00
Krzysztof Parzyszek 79b2433e7c [Hexagon] Implement commoning of GetElementPtr instructions
llvm-svn: 241714
2015-07-08 19:22:28 +00:00
Reid Kleckner ed012dbf2a [SEH] Ensure that empty __except blocks have their own BB
The 32-bit lowering assumed that WinEHPrepare had this invariant.
WinEHPrepare did it for C++, but not SEH. The result was that we would
insert calls to llvm.x86.seh.restoreframe in normal basic blocks, which
corrupted the frame pointer.

llvm-svn: 241699
2015-07-08 18:08:52 +00:00
Duncan P. N. Exon Smith ad98745561 MC: Constify MCSubtargetInfo in getDeprecationInfo(), NFC
There's no reason to be able to mutate `MCSubtargetInfo` in
`getDeprecationInfo()`.  Constify the reference.

llvm-svn: 241693
2015-07-08 17:30:55 +00:00
Eli Bendersky 8e131f8cbc Cosmetic cleanups - NFC
Remove commented lines, trailing whitespace, etc.

llvm-svn: 241687
2015-07-08 16:33:21 +00:00
James Y Knight f238d176eb [SPARC] Cleanup handling of the Y/ASR registers.
- Implement copying ASR to/from GPR regs.
- Mark ASRs as non-allocatable, so it won't try to arbitrarily use
  them inappropriately.
- Instead of inserting explicit WRASR/RDASR nodes in the MUL/DIV
  routines, just do normal register copies.
- Also...mark div as using Y, not just writing it.

Added a test case with some code which previously died with an
assertion failure (with -O0), or produced wrong code (otherwise).

(Third time's the charm?)

Differential Revision: http://reviews.llvm.org/D10401

llvm-svn: 241686
2015-07-08 16:25:12 +00:00
Krzysztof Parzyszek 21b53a5120 [Hexagon] Generate "insert" instructions more aggressively
llvm-svn: 241683
2015-07-08 14:47:34 +00:00
Krzysztof Parzyszek d19b4767ff Revert 241681: causes Windows builds to fail
llvm-svn: 241682
2015-07-08 14:34:13 +00:00
Krzysztof Parzyszek 712b15b45e [Hexagon] Generate "insert" instructions more aggressively
llvm-svn: 241681
2015-07-08 14:22:27 +00:00
Simon Pilgrim 752de5dff2 [X86][SSE] Added (V)ROUNDSD + (V)ROUNDSS stack folding support
llvm-svn: 241671
2015-07-08 08:07:57 +00:00
Mehdi Amini ffc1402fad Remove IsLittleEndian from TargetLowering and redirect to DataLayout
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11017

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241655
2015-07-08 01:00:38 +00:00
Reid Kleckner e69bdb8619 [WinEH] Make llvm.x86.seh.restoreframe work for stack realignment prologues
The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.

llvm-svn: 241648
2015-07-07 23:45:58 +00:00
Reid Kleckner d5afc62ff6 [WinEH] Add localaddress intrinsic instead of using frameaddress
Clang uses this for SEH finally. The new intrinsic will produce the
right value when stack realignment is required.

llvm-svn: 241643
2015-07-07 23:23:03 +00:00
Arnold Schwaighofer 3d43f66c91 Add more nvcasts
Tim Northover has told me that they can occur when the compiler cleverly
constructs constants - as demonstrated in the test case.

rdar://21703486

llvm-svn: 241641
2015-07-07 23:13:18 +00:00
Dan Gohman 489abd7046 [WebAssembly] Set the scheduling preference.
llvm-svn: 241637
2015-07-07 22:38:06 +00:00
Reid Kleckner 60381791b5 Rename llvm.frameescape and llvm.framerecover to localescape and localrecover
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.

These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.

Naming suggestions at this point are welcome, I'm happy to re-run sed.

Reviewers: majnemer, nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11011

llvm-svn: 241633
2015-07-07 22:25:32 +00:00
Sanjay Patel d4e1bb89e3 fix typo; NFC
llvm-svn: 241629
2015-07-07 21:31:54 +00:00
Arnold Schwaighofer 4bc34b1515 Add a pattern for a nvcast from v2f64 -> v4f32
Since the NvCast is generated by the selection process the concerns about
endianess and bit reversal don't apply.

rdar://21703486

llvm-svn: 241611
2015-07-07 18:31:55 +00:00
Reid Kleckner af04c2a972 Use default member initializers to deduplicate code in X86MachineFunctionInfo, NFC
llvm-svn: 241609
2015-07-07 18:12:06 +00:00
Krzysztof Parzyszek a45971ac94 [Hexagon] Fix unused variable warnings in NDEBUG build caused by r241595
llvm-svn: 241600
2015-07-07 16:02:11 +00:00
Reid Kleckner 9200b2f93b [WinEH] Add a report_fatal_error for 32-bit stack realignment
This type of prologue isn't supported yet. Implementing it should be a
matter of copying the adjusted incoming EBP into ESI (the base pointer)
instead of EBP.  The original EBP can be saved and restored from other
memory afterwards.

llvm-svn: 241597
2015-07-07 15:47:29 +00:00
Krzysztof Parzyszek e53b31a593 [Hexagon] Implement bit-tracking facility with specifics for Hexagon
This includes code that is intended to be target-independent as well
as the Hexagon-specific details. This is just the framework without
any users.

llvm-svn: 241595
2015-07-07 15:16:42 +00:00
Sanjay Patel cf0a80728c use range-based for loops; NFCI
llvm-svn: 241592
2015-07-07 15:03:53 +00:00
Denis Protivensky b612902faa Fix gcc warnings of different enum and non-enum types in ternaries
llvm-svn: 241567
2015-07-07 07:48:48 +00:00
Akira Hatanaka 1bc8af78f4 [ARM] Define a subtarget feature and use it to decide whether long calls should
be emitted.

This is needed to enable ARM long calls for LTO and enable and disable it on a
per-function basis.

Out-of-tree projects currently using EnableARMLongCalls to emit long calls
should start passing "+long-calls" to the feature string (see the changes made
to clang in r241565).

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D9364

llvm-svn: 241566
2015-07-07 06:54:42 +00:00
Simon Pilgrim 40343e6b3a [X86][AVX] Add support for shuffle decoding of vperm2f128/vperm2i128 with zero'd lanes
The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments.

As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well.

Differential Revision: http://reviews.llvm.org/D10593

llvm-svn: 241516
2015-07-06 22:46:46 +00:00
Sanjay Patel 681a56ac58 [x86] extend machine combiner reassociation optimization to SSE scalar adds
Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460)
to SSE scalar FP SP adds in addition to AVX scalar FP SP adds.

With the 'switch' in place, we can trivially add other opcodes and test cases in
future patches.

Differential Revision: http://reviews.llvm.org/D10975

llvm-svn: 241515
2015-07-06 22:35:29 +00:00
Simon Pilgrim 8fbf1c1f4a [X86][SSE] Vectorized i64 uniform constant SRA shifts
This patch adds vectorization support for uniform constant i64 arithmetic shift right operators.

Differential Revision: http://reviews.llvm.org/D9645

llvm-svn: 241514
2015-07-06 22:35:19 +00:00
JF Bastien 86bc91508d WebAssembly: add some TODO
Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D10971

llvm-svn: 241513
2015-07-06 21:41:59 +00:00
Simon Pilgrim d85cae3d52 [X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructions
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.

As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.

From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.

Differential Revision: http://reviews.llvm.org/D10146

llvm-svn: 241508
2015-07-06 20:46:41 +00:00
Simon Pilgrim 8b756596fc [X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN opcodes and remove the X86 implementation
With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations.

This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code.

Differential Revision: http://reviews.llvm.org/D10947

llvm-svn: 241506
2015-07-06 20:30:47 +00:00
Alex Lorenz e2d75239d1 llc: Add a 'run-pass' option.
This commit adds a 'run-pass' option to llc, which instructs the compiler to run
one specific code generation pass only.

Llc already has the 'start-after' and the 'stop-after' options, and this new
option complements the other two by making it easier to write tests that want
to invoke a single pass only.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10776

llvm-svn: 241476
2015-07-06 17:44:26 +00:00
Matt Arsenault db7781c6e9 AMDGPU: Run SIInsertWaits as pre-emit pass
Running this after the scheduler enables scheduling
waits later so other ALU instructions can run while
this would be waiting.

When combined with enabling the post-RA scheduler, this
gives about a ~20% improvement on sgemm.

llvm-svn: 241473
2015-07-06 17:02:20 +00:00
Daniel Sanders f423f5627c Change the last few internal StringRef triples into Triple objects.
Summary:
This concludes the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

At this point, the StringRef-form of GNU Triples should only be used in the
public API (including IR serialization) and a couple objects that directly
interact with the API (most notably the Module class). The next step is to
replace these Triple objects with the TargetTuple object that will represent
our authoratative/unambiguous internal equivalent to GNU Triples.

Reviewers: rengolin

Subscribers: llvm-commits, jholewinski, ted, rengolin

Differential Revision: http://reviews.llvm.org/D10962

llvm-svn: 241472
2015-07-06 16:56:07 +00:00
Daniel Sanders fbdab437f0 Where Triple has a suitable predicate, use it rather than the enum values. NFC.
Reviewers: mcrosier

Subscribers: llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10960

llvm-svn: 241469
2015-07-06 16:33:18 +00:00
Matt Arsenault 706f930b72 AMDGPU/SI: Add debugging subtarget feature for DS offsets
We don't have a good way to detect most situations where
DS offsets are usable on SI, so add an option to force using
them even if unsafe for debugging performance problems.

llvm-svn: 241462
2015-07-06 16:01:58 +00:00
James Y Knight 89ac11de32 [Sparc] Add more instruction aliases.
These are mostly from the chart in the SparcV8 spec, section "A.3
Synthetic Instructions".

Differential Revision: http://reviews.llvm.org/D9834

llvm-svn: 241461
2015-07-06 16:01:07 +00:00
James Y Knight 7208a12eef [Sparc] Add support for flush instruction.
Differential Revision: http://reviews.llvm.org/D9833

llvm-svn: 241460
2015-07-06 16:01:04 +00:00
Chad Rosier 85a346395e Fix a bug in the A57FPLoadBalancing register tracking/scavenger.
The code in AArch64A57FPLoadBalancing::scavengeRegister() to handle dead defs
was not correctly handling aliased registers.  E.g. if the dead def was of D2,
then S2 was not being marked as unavailable, so it could potentially be used
across a live-range in which it would be clobbered.

Patch by Geoff Berry <gberry@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D10900

llvm-svn: 241449
2015-07-06 14:46:34 +00:00
Asaf Badouh c6f3c82ffc [X86][AVX512] Multiply Packed Unsigned Integers with Round and Scale
pmulhrsw

review:
http://reviews.llvm.org/D10948

llvm-svn: 241443
2015-07-06 14:03:40 +00:00
Peter Collingbourne 6a9d1774d0 IR: Do not consider available_externally linkage to be linker-weak.
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.

Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.

Differential Revision: http://reviews.llvm.org/D10941

llvm-svn: 241413
2015-07-05 20:52:35 +00:00
Benjamin Kramer 9bfb627a0e [TargetLowering] StringRefize asm constraint getters.
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.

llvm-svn: 241411
2015-07-05 19:29:18 +00:00
Asaf Badouh 73f26f8ffc [x86][AVX512] add Multiply High Op
include encoding and intrinsics tests.

review
http://reviews.llvm.org/D10896

llvm-svn: 241406
2015-07-05 12:23:20 +00:00
Michael Kuperstein 5f05153fbb [X86] Fix incorrect/inefficient pushw encodings for x86-64 targets
Correctly support assembling "pushw $imm8" on x86-64 targets. 
Also some cleanup of the PUSH instructions (PUSH64i16 and PUSHi16 actually
represent the same instruction)

This fixes PR23996

Patch by: david.l.kreitzer@intel.com
Differential Revision: http://reviews.llvm.org/D10878

llvm-svn: 241404
2015-07-05 10:25:41 +00:00
Nemanja Ivanovic d358b8f80d Add missing builtins to the PPC back end for ABI compliance (vol. 2)
This patch corresponds to review:
http://reviews.llvm.org/D10874

Back end portion of the second round of additions to altivec.h.

llvm-svn: 241398
2015-07-05 06:03:51 +00:00
Simon Pilgrim ea1b6ee366 [X86][SSE] Improved i8/i16 to f64 uint2fp vector conversions
Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp).

llvm-svn: 241394
2015-07-04 15:33:34 +00:00
Craig Topper de8395229a [X86] Add proper 64-bit mode checks to jrcxz and jcxz.
llvm-svn: 241381
2015-07-04 00:01:07 +00:00
Matt Arsenault 24e33d10a0 AMDGPU: Fix indentation of switch
llvm-svn: 241380
2015-07-03 23:33:38 +00:00
Rafael Espindola ed067c45d4 Return ErrorOr from getSymbolAddress.
It can fail trying to get the section on ELF and COFF. This makes sure the
error is handled.

llvm-svn: 241366
2015-07-03 18:19:00 +00:00
Rafael Espindola e2df87f24b Replace a few more MachO only uses of getSymbolAddress.
llvm-svn: 241365
2015-07-03 18:02:36 +00:00
Simon Pilgrim b504263e4a [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt2)
Add support for v2i8/v2i16 to v2f64 by using a sign extension to v2i32 before conversion to v2f64.

Differential Revision: http://reviews.llvm.org/D10589

llvm-svn: 241325
2015-07-03 08:01:36 +00:00
Simon Pilgrim 385bf00ea2 [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt1)
This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector.

Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64.

llvm-svn: 241323
2015-07-03 07:51:01 +00:00
Dan Gohman bfaf7e15d1 [WebAssembly] Set the HasFloatingPointExceptions flag for WebAssembly.
llvm-svn: 241302
2015-07-02 21:36:25 +00:00
Rafael Espindola 5d0c2ffadf Return ErrorOr from SymbolRef::getName.
This function can really fail since the string table offset can be out of
bounds.

Using ErrorOr makes sure the error is checked.

Hopefully a lot of the boilerplate code in tools/* can go away once we have
a diagnostic manager in Object.

llvm-svn: 241297
2015-07-02 20:55:21 +00:00
Bill Schmidt a1c30053e7 [PPC64LE] Remove implicit-subreg restriction from VSX swap removal
In r241285, I removed the SUBREG_TO_REG restriction from VSX swap
removal, determining that this was overly conservative.  We have
another form of the same restriction in that we check for the presence
of implicit subregs in vector operations.  As with SUBREG_TO_REG for
partial register conversions, an implicit subreg is safe in and of
itself, provided no other operation makes a lane-sensitive assumption
about the result.  This patch removes that restriction, by removing
the HasImplicitSubreg flag and all code that relies on it.

I've added a test case that fails to optimize before this patch is
applied, and optimizes properly with the patch.  Test based on a
report from Anton Blanchard.

llvm-svn: 241290
2015-07-02 19:01:22 +00:00
Bill Schmidt 7c691fee1c [PPC64LE] Teach swap optimization about the doubleword splat idiom
With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register.  We can implement this using xxspltd (a special form of
xxpermdi).  This patch teaches the swap optimization pass about this
idiom.

As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG.  Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register.  However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result.  In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.

The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element).  To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations.  That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region.  This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI.  Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.

A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.

llvm-svn: 241285
2015-07-02 17:03:06 +00:00
Eric Christopher e100226879 Implement TargetTransformInfo::hasCompatibleFunctionAttributes for X86.
This checks subtarget feature compatibility for inlining by verifying
that the callee is a strict subset of the caller's features. This includes
the cpu as part of the subtarget we can get via the incoming functions as
the backend takes CPUs as feature sets.

This allows us to inline things like:

int foo() { return baz(); }

int __attribute__((target("sse4.2"))) bar() {
  return foo();
}

so that generic code can be inlined into specialized functions.

llvm-svn: 241221
2015-07-02 01:11:50 +00:00
JF Bastien 03855df197 WebAssembly: start instructions
Summary:
* Add 64-bit address space feature.
* Rename SIMD feature to SIMD128.
* Handle single-thread model with an IR pass (same way ARM does).
* Rename generic processor to MVP, to follow design's lead.
* Add bleeding-edge processors, with all features included.
* Fix a few DEBUG_TYPE to match other backends.

Test Plan: ninja check

Reviewers: sunfish

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D10880

llvm-svn: 241211
2015-07-01 23:41:25 +00:00
Dan Gohman d82494bb75 [WebAssembly] Define separate Target instances for 32-bit and 64-bit.
llvm-svn: 241193
2015-07-01 21:42:34 +00:00
Jingyue Wu a0a56601c0 [NVPTX] expand extload/truncstore for vectors of floats
Summary:
According to PTX ISA:

For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.

So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.

As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]

This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.

Patched by Gang Hu. 

Test Plan: Test case attached.

Reviewers: jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D10876

llvm-svn: 241191
2015-07-01 21:32:42 +00:00
Jingyue Wu 77b5b385ee [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.

Patched by Xuetian Weng. 

Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10853

llvm-svn: 241185
2015-07-01 20:08:06 +00:00
Bill Schmidt ae94f11d55 [PPC64LE] Enable missing lxvdsx optimization, and related swap optimization
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction.  This patch moves the
offending logic so lxvdsx is once again generated.

This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant.  Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe).  This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.

There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization.  However, vsx.ll was only
being tested for POWER7 with big-endian code generation.  I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.

llvm-svn: 241183
2015-07-01 19:40:07 +00:00
Sanjay Patel 910d5daa4b fix formatting; NFC
llvm-svn: 241175
2015-07-01 17:58:53 +00:00
Sanjay Patel e4d95c6c9a fix typos in comment; NFC
llvm-svn: 241174
2015-07-01 17:55:07 +00:00
Reid Kleckner f80636682c [SEH] Don't assert if the parent function lacks a personality
The EH code might have been deleted as unreachable and the personality
pruned while the filter is still present.  Currently I'm hitting this at
-O0 due to the clang bug PR24009.

llvm-svn: 241170
2015-07-01 16:45:47 +00:00
Arnaud A. de Grandmaison 650c520007 [AArch64] Implement add/adds/sub/subs/cmp/cmn with negative immediate aliases
This patch teaches the AsmParser to accept add/adds/sub/subs/cmp/cmn
with a negative immediate operand and convert them as shown:

  add  Rd, Rn, -imm -> sub  Rd, Rn, imm
  sub  Rd, Rn, -imm -> add  Rd, Rn, imm
  adds Rd, Rn, -imm -> subs Rd, Rn, imm
  subs Rd, Rn, -imm -> adds Rd, Rn, imm
  cmp  Rn, -imm     -> cmn  Rn, imm
  cmn  Rn, -imm     -> cmp  Rn, imm

Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers (gas). They are documented in the "ARMv8 Instruction Set
Overview", in the "Arithmetic (immediate)" section. This makes llvm-mc
a programmer-friendly assembler !

This also fixes PR20978: "Assembly handling of adding negative numbers
not as smart as gas".

llvm-svn: 241166
2015-07-01 15:05:58 +00:00
James Y Knight a8a8c605ee [Sparc] Rearrange SparcInstrInfo, no change.
Move some instructions into order of sections in the spec, as the rest
already were.

Differential Revision: http://reviews.llvm.org/D9102

llvm-svn: 241163
2015-07-01 14:38:07 +00:00
Igor Breger 15820b072b AVX-512: Implemented missing encoding for FMA scalar instructions
Added tests for encoding

Differential Revision: http://reviews.llvm.org/D10865

llvm-svn: 241159
2015-07-01 13:24:28 +00:00
Michael Kuperstein 21a3c18443 [X86] Avoid over-relaxation of 8-bit immediates in integer arithmetic instructions.
Only consider an instruction a candidate for relaxation if the last operand of the 
instruction is an expression. We previously checked whether any operand is an expression,
which is useless, since for all instructions concerned, the only operand that may be
affected by relaxation is the last one.
In addition, this removes the check for having RIP as an argument, since it was 
plain wrong - even when one of the arguments is RIP, relaxation may still be needed.

This fixes PR9807.

Patch by: david.l.kreitzer@intel.com
Differential Revision: http://reviews.llvm.org/D10766

llvm-svn: 241152
2015-07-01 10:54:42 +00:00
Zoran Jovanovic 2a47d08afd [mips][microMIPS] Implement SLL and NOP instructions
http://reviews.llvm.org/D10474

llvm-svn: 241150
2015-07-01 09:54:51 +00:00
Reid Kleckner 399a2fe400 [SEH] Add new intrinsics for recovering and restoring parent frames
The incoming EBP value established by the runtime is actually a pointer
to the end of the EH registration object, and not the true parent
function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo
anymore because we know that the exception info pointer is at a fixed
offset from this incoming EBP.

The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the
EH runtime and returns a pointer that is usable with llvm.framerecover.

The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit
specific preparation pass in blocks targetted by the EH runtime. It
re-establishes any physical registers used by the parent function to
address the stack, such as the frame, base, and stack pointers.

Neither of these intrinsics correctly handle stack realignment prologues
yet, but it's possible to add that later.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D10848

llvm-svn: 241125
2015-06-30 22:46:59 +00:00
Jingyue Wu b374dc651c [NVPTX] cleanups and refacotring in NVPTXFrameLowering.cpp
Summary: NFC

Test Plan: no regression

Reviewers: wengxt

Reviewed By: wengxt

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10849

llvm-svn: 241118
2015-06-30 21:28:31 +00:00
Nemanja Ivanovic 7df26c9b6f Modified a comment about the reason for the patch (removed commented code).
llvm-svn: 241110
2015-06-30 20:01:16 +00:00
Nemanja Ivanovic 9c8d4cf272 Fixes a bug with __builtin_vsx_lxvdw4x on Little Endian systems
llvm-svn: 241108
2015-06-30 19:45:45 +00:00
Jingyue Wu 9fe08c4bb3 [NVPTX] Fix issue introduced in D10321
Summary:
Really check if %SP is not used in other places, instead of checking only exact
one non-dbg use.

Patched by Xuetian Weng. 

Test Plan:
@foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that
SP will appear twice.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, sfantao, jholewinski

Differential Revision: http://reviews.llvm.org/D10844

llvm-svn: 241099
2015-06-30 18:59:19 +00:00
Samuel Antao 01ee64c2ea Force relocation mode to be default, regardless of what is passed to the backend.
llvm-svn: 241081
2015-06-30 17:18:00 +00:00
Michael Kuperstein 8a6c9ccc98 [X86] Fix a bug in WIN_FTOL_32/64 handling.
Duplicating an FP register "as itself" is a bad idea, since it violates the
invariant that every FP register is mapped to at most one FPU stack slot.
Use the scratch FP register instead.

This fixes PR23957.

llvm-svn: 241069
2015-06-30 14:38:57 +00:00
Toma Tabacu 0f09313051 [mips] [IAS] Add support for the .module softfloat/hardfloat directives.
These directives are used to set the default value of the SoftFloat feature.
They have the same effect as setting -m{soft, hard}-float from the command line.

Differential Revision: http://reviews.llvm.org/D9073

llvm-svn: 241066
2015-06-30 13:46:03 +00:00
Toma Tabacu fc97d8a95a [mips] [IAS] Make .module directives change AssemblerOptions->front().
Differential Revision: http://reviews.llvm.org/D10643

llvm-svn: 241062
2015-06-30 12:41:33 +00:00
Ranjeet Singh 86ecbb7b54 Reverting r241058 because it's causing buildbot failures.
llvm-svn: 241061
2015-06-30 12:32:53 +00:00
Ranjeet Singh 5b119091a1 There are a few places where subtarget features are still
represented by uint64_t, this patch replaces these
usages with the FeatureBitset (std::bitset) type.

Differential Revision: http://reviews.llvm.org/D10542

llvm-svn: 241058
2015-06-30 11:30:42 +00:00
Toma Tabacu 32c72aa099 [mips] [IAS] Add support for the .set oddspreg/nooddspreg directives.
Differential Revision: http://reviews.llvm.org/D10657

llvm-svn: 241052
2015-06-30 09:36:50 +00:00
Michael Kuperstein 5aff75b92a [X86] Add FXSR intrinsics
Add intrinsics for the FXSR instructions (FXSAVE/FXSAVE64/FXRSTOR/FXRSTOR64)

llvm-svn: 241049
2015-06-30 08:49:35 +00:00
Rafael Espindola 99c041b72f Don't return error_code from a function that doesn't fail.
llvm-svn: 241033
2015-06-30 01:53:01 +00:00
Rafael Espindola 71784d611d Cleanup getRelocationAddend.
Realistically, this will be returning ErrorOr for some time as refactoring the
user code to check once per section will take some time.

Given that, use it for checking if a relocation has addend or not.

While at it, add ELFRelocationRef to simplify the users.

llvm-svn: 241028
2015-06-30 00:33:59 +00:00
Dan Gohman 10e730a263 [WebAssembly] Initial WebAssembly backend
This WebAssembly backend is just a skeleton at this time and is not yet
functional.

llvm-svn: 241022
2015-06-29 23:51:55 +00:00
Peter Collingbourne aef3659e18 Teach LTOModule to emit linker flags for dllexported symbols, plus interface cleanup.
This change unifies how LTOModule and the backend obtain linker flags
for globals: via a new TargetLoweringObjectFile member function named
emitLinkerFlagsForGlobal. A new function LTOModule::getLinkerOpts() returns
the list of linker flags as a single concatenated string.

This change affects the C libLTO API: the function lto_module_get_*deplibs now
exposes an empty list, and lto_module_get_*linkeropts exposes a single element
which combines the contents of all observed flags. libLTO should never have
tried to parse the linker flags; it is the linker's job to do so. Because
linkers will need to be able to parse flags in regular object files, it
makes little sense for libLTO to have a redundant mechanism for doing so.

The new API is compatible with the old one. It is valid for a user to specify
multiple linker flags in a single pragma directive like this:

 #pragma comment(linker, "/defaultlib:foo /defaultlib:bar")

The previous implementation would not have exposed
either flag via lto_module_get_*deplibs (as the test in
TargetLoweringObjectFileCOFF::getDepLibFromLinkerOpt was case sensitive)
and would have exposed "/defaultlib:foo /defaultlib:bar" as a single flag via
lto_module_get_*linkeropts. This may have been a bug in the implementation,
but it does give us a chance to fix the interface.

Differential Revision: http://reviews.llvm.org/D10548

llvm-svn: 241010
2015-06-29 22:04:09 +00:00
Tim Northover 83f0fbcc37 ARM: add correct kill flags when combining stm instructions
When the store sequence being combined actually stores the base register, we
should not mark it as killed until the end.

rdar://21504262

llvm-svn: 241003
2015-06-29 21:42:16 +00:00
Matthias Braun abf88a0398 X86: Rework inline asm integer register specification.
This is a new version of http://reviews.llvm.org/D10260.

It turned out that when you specify an integer register in inline asm on
x86 you get the register of the required type size back. That means that
X86TargetLowering::getRegForInlineAsmConstraint() has to accept any of
the integer registers and adapt its size to the given target size which
may be any 8/16/32/64 bit sized type. Surprisingly that means given a
constraint of "{ax}" and a type of MVT::F32 we need to return X86::EAX.

This change makes this face explicit, the previous code seemed like
working by accident because there it never returned an error once a
register was found. On the other hand this rewrite allows to actually
return errors for invalid situations like requesting an integer register
for an i128 type.

Related to rdar://21042280

Differential Revision: http://reviews.llvm.org/D10813

llvm-svn: 241002
2015-06-29 21:35:51 +00:00
Elena Demikhovsky 30bc4ca313 AVX-512: all forms of SCATTER instruction on SKX,
encoding, intrinsics and tests.

llvm-svn: 240936
2015-06-29 12:14:24 +00:00
Javed Absar d5526303b7 [ARM]: Extend -mfpu options for half-precision and vfpv3xd
Some of the the permissible ARM -mfpu options, which are supported in GCC,
are currently not present in llvm/clang.This patch adds the options:
'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16.
These are related to half-precision floating-point and single precision.

Reviewers: rengolin, ranjeet.singh

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10645

llvm-svn: 240930
2015-06-29 09:32:29 +00:00
Igor Breger a7a8e9a018 AVX-512: Implemented missing encoding and intrinsics for FMA instructions
Added tests for DAG lowering ,encoding and intrinsics

Differential Revision: http://reviews.llvm.org/D10796

llvm-svn: 240926
2015-06-29 09:10:00 +00:00
NAKAMURA Takumi 7bffb6954d Whitespace.
llvm-svn: 240924
2015-06-29 04:50:09 +00:00
Matt Arsenault 8ebce8f12b AMDGPU/SI: Fix extra space when printing v_div_fmas_*
llvm-svn: 240911
2015-06-28 18:16:14 +00:00
Asaf Badouh 7ec4b7a8bb [x86][AVX512]
Add vscalef support
include encoding and intrinsics


review:
http://reviews.llvm.org/D10730

llvm-svn: 240906
2015-06-28 14:30:39 +00:00
Elena Demikhovsky 6a1a357f1f AVX-512: Added all SKX forms of GATHER instructions.
Added intrinsics.
Added encoding and tests.

llvm-svn: 240905
2015-06-28 10:53:29 +00:00
Daniel Sanders a3134fae17 [mips] Add COP0 register class and use it in M[FT]C0/DM[FT]C0.
Summary:
Previously it (incorrectly) used GPR's.

Patch by Simon Dardis. A couple small corrections by myself.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10567

llvm-svn: 240883
2015-06-27 15:39:19 +00:00
Jingyue Wu 3203818bf7 [NVPTX] noop when kernel pointers are already global
Summary:
Some front ends make kernel pointers global already. In that case,
handlePointerParams does nothing.

Test Plan: more tests in lower-kernel-ptr-arg.ll

Reviewers: grosser

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10779

llvm-svn: 240849
2015-06-26 22:35:43 +00:00
Tom Stellard 4694ed0a14 AMDPGU/SI: Use correct resource descriptors for VI on HSA
Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D10777

llvm-svn: 240841
2015-06-26 21:58:42 +00:00
Tom Stellard ff7416ba06 AMDGPU/SI: Update amd_kernel_code_t definition and add assembler support
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10772

llvm-svn: 240839
2015-06-26 21:58:31 +00:00
Tom Stellard 833ae4fadd AMDGPU/SI: Remove unused variable
This should fix some bots that were broken by r240831.

llvm-svn: 240838
2015-06-26 21:58:26 +00:00
Tom Stellard 91efe9cebe AMDGPU/SI: Set ELF OS/ABI to ELFOSABI_AMDGPU_HSA
Reviewers: arsenm, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10708

llvm-svn: 240832
2015-06-26 21:15:11 +00:00
Tom Stellard 347ac79b15 AMDGPU/SI: Add hsa code object directives
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10757

llvm-svn: 240831
2015-06-26 21:15:07 +00:00