Commit Graph

34482 Commits

Author SHA1 Message Date
Chad Rosier 0b15e7c618 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 249008
2015-10-01 13:33:31 +00:00
Chad Rosier 7a83d770ae [AArch64] Update comment to reflect reality.
llvm-svn: 249007
2015-10-01 13:09:44 +00:00
Zoran Jovanovic 2960f3a346 [mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions
Differential Revision: http://reviews.llvm.org/D10337

llvm-svn: 249004
2015-10-01 12:49:27 +00:00
Scott Douglass 290183d734 [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer
Differential Revision: http://reviews.llvm.org/D13240

llvm-svn: 249002
2015-10-01 11:56:19 +00:00
Tom Stellard 1f0e7bbc5b AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init order
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12451

llvm-svn: 248978
2015-10-01 02:02:46 +00:00
Ahmed Bougacha 23a0d1a1d6 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel a114a10bbe [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
David Blaikie 757908e545 Fix -Wsign-compare warning
llvm-svn: 248942
2015-09-30 20:37:48 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Hal Finkel 4c45775880 [PowerPC] Disable shrink wrapping
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.

llvm-svn: 248924
2015-09-30 17:29:03 +00:00
Artyom Skrobov 72ca6b8f3f [ARM] Support for ARMv6-Z / ARMv6-ZK missing
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.

In particular, there were no tests for TrustZone being supported in these
architectures.

The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.

Differential Revision: http://reviews.llvm.org/D13236

llvm-svn: 248921
2015-09-30 17:25:52 +00:00
Chad Rosier 4f04e2ec87 [AArch64] Use helper function to improve readability. NFC.
llvm-svn: 248914
2015-09-30 16:50:41 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Marek Olsak d1a69a2839 AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
2015-09-29 23:37:32 +00:00
Reid Kleckner a13dfd539b [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer 91b0ab9172 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko cce239c45d HHVM calling conventions.
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Nemanja Ivanovic 2c84b29464 Addition of interfaces the BE to conform to Table A-2 of ELF V2 ABI V1.1
This patch corresponds to review:
http://reviews.llvm.org/D13191

Back end portion of the fifth round of additions to altivec.h.

llvm-svn: 248809
2015-09-29 17:41:53 +00:00
Chad Rosier 32d4d37e61 [AArch64] Scale offsets by the size of the memory operation. NFC.
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored.  This change gets
us one step closer to forming LDPSW instructions.  This change also enables
pre- and post-indexing for halfword and byte loads and stores.

llvm-svn: 248804
2015-09-29 16:07:32 +00:00
Chad Rosier a4d3217e81 [AArch64] Remove some redundant cases. NFC.
llvm-svn: 248800
2015-09-29 14:57:10 +00:00
Jeroen Ketema 740f9d79ca Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
NAKAMURA Takumi 0c12a3949e [CMake] X86AsmParser: Prune redundant LINK_LIBS.
It is described in LLVMBuild.txt.

llvm-svn: 248771
2015-09-29 01:25:01 +00:00
Sanjay Patel 3a14f1a338 add a FIXME for a CPU model check that should have an attribute instead
llvm-svn: 248746
2015-09-28 22:00:24 +00:00
Matt Arsenault ba6aae785a AMDGPU: Factor switch into separate function
llvm-svn: 248742
2015-09-28 20:54:57 +00:00
Matt Arsenault 73aa8f687a AMDGPU: Fix splitting x16 SMRD loads
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

llvm-svn: 248741
2015-09-28 20:54:52 +00:00
Matt Arsenault e5d042cd56 AMDGPU: Fix moving SMRD loads with literal offsets on CI
llvm-svn: 248740
2015-09-28 20:54:46 +00:00
Matt Arsenault dd49c5fc1b AMDGPU: Fix splitting SMRD with large offset
The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting an an instruction missing an operand
which would assert later.

Test will be included in a following commit
which fixes a related issue.

llvm-svn: 248739
2015-09-28 20:54:42 +00:00
Andrew Kaylor 16c4da03d5 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

llvm-svn: 248735
2015-09-28 20:33:22 +00:00
Daniel Sanders 7727e1098c [mips][p5600] Added P5600 processor and initial scheduler.
Summary:
The P5600 is an out-of-order, superscalar implementation of the MIPS32R5
architecture.

The scheduler has a few missing details (see the 'Tricky Instructions'
section and some quirks of the P5600 are deliberately omitted due to
implementation difficulty and low chance of significant benefit (e.g. the
predicate on P5600WriteEitherALU). However, testing on SingleSource is
showing significant performance benefits on some apps (seven in the 10-30%
range) and only one significant regression (12%) when
-pre-RA-sched=linearize is given. Without -pre-RA-sched=linearize the
results are more variable. Some do even better (up to 55% improvement) but
increased numbers of copies are slowing others down (up to 12%).

Overall, the scheduler as it currently stands is a 2.4% win with
-pre-RA-sched=linearize and a 2.7% win without -pre-RA-sched=linearize.
I'm sure we can improve on this further.

For completeness, the FPGA this was tested on shows some failures with and
without the P5600 scheduler. These appear to be scheduling related since
the two test runs have fairly different sets of failing tests even after
accounting for other factors (e.g. spurious connection failures) however
it's not P5600 specific since we also get some for the generic scheduler.

Reviewers: vkalintiris

Subscribers: mpf, llvm-commits, atrick, vkalintiris

Differential Revision: http://reviews.llvm.org/D12193

llvm-svn: 248725
2015-09-28 18:24:08 +00:00
Dan Gohman 05a17aa82a [WebAssembly] Support for direct call and call_indirect.
llvm-svn: 248716
2015-09-28 16:22:39 +00:00
Zoran Jovanovic cdb64566cc [mips] Handling of immediates bigger than 16 bits
Differential Revision: http://reviews.llvm.org/D10539

llvm-svn: 248706
2015-09-28 11:11:34 +00:00
Artyom Skrobov ad8a0638f7 [ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall()
supportsTailCall() has two callers. Both of them double-check isThumb1Only(),
and refuse to proceed with tail-calling in that case.
Therefore, it makes sense to move this check to
ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized;
and to eliminate the extra checks at the call sites.

Following a review comment, added an "assert(supportsTailCall())"
in IsEligibleForTailCall.

NFC.

llvm-svn: 248703
2015-09-28 09:44:11 +00:00
Craig Topper 862d5d8322 Remove 'const' from some ArrayRefs. ArrayRefs are already immutable. NFC
llvm-svn: 248693
2015-09-28 00:15:34 +00:00
Yaron Keren e5a9dc2f5b Silence clang warning: variable ‘Status’ set but not used.
llvm-svn: 248691
2015-09-27 21:31:33 +00:00
Matt Arsenault 1d36b717a5 AMDGPU: Remove hasPostISelHook from most instructions
Since this is only needed for VOP3 and a few other special
case instructions, stop setting it on everything.

llvm-svn: 248657
2015-09-26 05:06:48 +00:00
Matt Arsenault f32481372c AMDGPU: Switch over reg class size instead of checking all super classes
This gets isSGPRClass out of my profile of SIFixSGPRCopies.

llvm-svn: 248656
2015-09-26 04:59:04 +00:00
Matt Arsenault 6e28010215 AMDGPU: Don't handle invalid reg classes in helper functions
No tests hit these and it would be better to have checks like
this explicit where they are used.

llvm-svn: 248655
2015-09-26 04:53:30 +00:00
Saleem Abdulrasool 9174623b2d AMDGPU: address -Winconsistent-missing-override
Add missing override.  NFC.

llvm-svn: 248652
2015-09-26 04:34:52 +00:00
Matt Arsenault 8e1ddf84fe AMDGPU: Set CopyCost of register classes
These require multiple mov instructions to copy,
but the default value is that 1 instruction is needed.
I'm not sure if this actually changes anything.

llvm-svn: 248651
2015-09-26 04:09:34 +00:00
Matt Arsenault e98a074c42 AMDGPU: VOP3b definition cleanups
llvm-svn: 248647
2015-09-26 02:25:48 +00:00
Matt Arsenault 86095b8dec AMDGPU: Fix sched model for VOP2b instructions
Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.

llvm-svn: 248646
2015-09-26 02:25:45 +00:00
Dan Gohman d0bf981296 [WebAssembly] Rename several functions and types according to the new spec.
llvm-svn: 248644
2015-09-26 01:09:44 +00:00
Ahmed Bougacha e81610fabb [ARM] Don't generate clrex for pre-v7 targets.
Since r248294, we emit clrex, but it doesn't exist on v6.

llvm-svn: 248640
2015-09-26 00:14:02 +00:00
Matt Arsenault e229c0c45e AMDGPU: Construct new buffer instruction when moving SMRD
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.

llvm-svn: 248627
2015-09-25 22:21:19 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Tom Stellard e135ffd554 AMDGPU/SI: Use .hsatext section instead of .text for HSA
Reviewers: arsenm, grosbach, rafael

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12424

llvm-svn: 248619
2015-09-25 21:41:28 +00:00
Matthias Braun c2d4befb54 MachineBasicBlock: Factor out common code into isReturnBlock()
llvm-svn: 248617
2015-09-25 21:25:19 +00:00
Matt Arsenault f743b838cb AMDGPU: Make getNamedOperandIdx declaration readonly
This matches how it is defined in the generated implementation.

llvm-svn: 248598
2015-09-25 18:09:15 +00:00
Chad Rosier 1bbd7fb38e [AArch64] Add support for generating pre- and post-index load/store pairs.
llvm-svn: 248593
2015-09-25 17:48:17 +00:00
Matt Arsenault 0a10900070 AMDGPU: Disable some passes that are not meaningful
Don't run passes related to stack maps, garbage collection,
exceptions since these aren't useful for GPUs.

There might be a few more to turn off that I'm less sure about
(e.g. ShrinkWrapping) or I'm not sure how to disable
(SafeStack and StackProtector)

llvm-svn: 248591
2015-09-25 17:41:20 +00:00
Matt Arsenault 4bf43d4e68 AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG
This fixes a select error when the i64 source was also
bitcasted to v2i32 in the original source.

Instead of awkwardly trying to select the modified source value and
the store, replace before isel begins.

Uses a worklist to avoid possible problems from mutating the DAG,
although it seems to work OK without it.

llvm-svn: 248589
2015-09-25 17:27:08 +00:00
Matt Arsenault 0cb8517dc6 AMDGPU: Fix recomputing dominator tree unnecessarily
SIFixSGPRCopies does not modify the CFG, but this was
being recomputed before running SIFoldOperands.

llvm-svn: 248587
2015-09-25 17:21:28 +00:00
Matt Arsenault 2d6fdb8495 AMDGPU: Re-justify workaround and fix worked around problem
When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
actually is actually emitting such a REG_SEQUENCE now.

If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.

llvm-svn: 248585
2015-09-25 17:08:42 +00:00
Matt Arsenault 3ad55ec946 AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources
This avoids needting to re-legalize the new REG_SEQUENCE.

llvm-svn: 248584
2015-09-25 17:08:40 +00:00
Matt Arsenault 6525aa3529 AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos
This was only set on the final _si/_vi version, but not
on the pseudos most of codegen sees.

No test since these instructions aren't used yet.

llvm-svn: 248583
2015-09-25 16:58:27 +00:00
Matt Arsenault 5f70436c49 AMDGPU: Improve accuracy of instruction rates for VOPC
These were all using the default 32-bit VALU write class,
but the i64/f64 compares are half rate.

I'm not sure this is really correct, because they are still using
the write to VALU write class, even though they really write
to the SALU.

llvm-svn: 248582
2015-09-25 16:58:25 +00:00
Saleem Abdulrasool 8e99f50768 ARM: make -Asserts,-Werror=unused-variable build happy
The value was only used in an assertion.  Sink the variable usage into the
assertion.

llvm-svn: 248562
2015-09-25 05:41:02 +00:00
Saleem Abdulrasool fe83b50289 ARM: address WoA division limitation
We now emit the compiler generated divide by zero check that was needed for the
MSVC routines.  We construct a psuedo-instruction for the DBZ check as the
operation requires splitting up the BB.  For the 64-bit operations, we need to
custom expand the node as we need to insert the DBZ check and then emit the
libcall to the appropriate name.  Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization rather than sink this there to avoid the duplication.  The division
library calls now match MSVC semantically.

llvm-svn: 248561
2015-09-25 05:15:46 +00:00
Matt Arsenault 8aa9973696 AMDGPU: Remove unused includes
llvm-svn: 248553
2015-09-25 00:28:43 +00:00
Chad Rosier b02f5a5a1f [AArch64] Improve the readability of the ld/st optimization pass. NFC.
In this context, MI is an add/sub instruction not a loads/store.

llvm-svn: 248540
2015-09-24 21:27:49 +00:00
Simon Pilgrim 68d0050c6a [X86][SSE2] Fix zero/any extension shuffles that don't start from the first element
Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H

llvm-svn: 248536
2015-09-24 21:02:17 +00:00
Matt Arsenault e66621b306 AMDGPU: Add s_dcache_* instructions
llvm-svn: 248533
2015-09-24 19:52:27 +00:00
Matt Arsenault d6adfb401c AMDGPU: Add cache invalidation instructions.
These are necessary for implementing mem_fence for
OpenCL 2.0.

The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.

llvm-svn: 248532
2015-09-24 19:52:21 +00:00
Chad Rosier 7cd472b719 [AArch64] The paired post-increment store instruction has an output register.
The pre- and post-increment version update the base register, but the post-
version was defined incorrectly.  There is no test case as we don't currently
generate these instructions, but I plan on changing that in the near future.

llvm-svn: 248528
2015-09-24 19:21:42 +00:00
Artyom Skrobov cf296444ab [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.

This patch changes the handling of +t2dsp to be in line with other
architecture extensions.

Following a revert of r248152 and new review comments, this patch also includes
renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its
possible external usage.

Differential Revision: http://reviews.llvm.org/D12937

llvm-svn: 248519
2015-09-24 17:31:16 +00:00
Daniel Sanders 090f6e41c4 [mips] Use PredicateControl for the MSA ASE instructions. NFC.
Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13092

llvm-svn: 248486
2015-09-24 12:10:23 +00:00
Matt Arsenault 68d938649e Introduce target hook for optimizing register copies
Allow a target to do something other than search for copies
that will avoid cross register bank copies.

Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.

I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid  kinds
of subregister copies on some targets.

I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.

The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.

llvm-svn: 248478
2015-09-24 08:36:14 +00:00
Matt Arsenault e068f9a263 AMDGPU: Return after instruction is processed.
llvm-svn: 248476
2015-09-24 07:51:28 +00:00
Matt Arsenault 708586faa2 AMDGPU: Remove another unnecessary check from commuteInstruction
llvm-svn: 248475
2015-09-24 07:51:25 +00:00
Matt Arsenault fa242960fc AMDGPU: Add readonly to InstrMapping functions
llvm-svn: 248474
2015-09-24 07:51:23 +00:00
Matt Arsenault cab64f1c75 AMDGPU: Fix printing trailing whitespace for mubuf atomics
llvm-svn: 248472
2015-09-24 07:51:17 +00:00
Matt Arsenault c8e2ce4046 AMDGPU: Reduce number of copies emitted
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.

This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.

llvm-svn: 248467
2015-09-24 07:16:37 +00:00
Tim Northover beb5bccf88 ARM: fix folding stack adjustment (again again again...)
This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.

To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.

llvm-svn: 248437
2015-09-23 22:21:09 +00:00
Sanjay Patel 1a6534661b [x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
xorl %eax, %ecx
movd %ecx, %xmm0

into this:

xorps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248415
2015-09-23 18:33:42 +00:00
Sanjay Patel aba37553c4 [x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
orl %eax, %ecx
movd %ecx, %xmm0

into this:

orps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248409
2015-09-23 18:19:07 +00:00
Evgeniy Stepanov a2002b08f7 Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

This is a re-commit of a change in r248357 that was reverted in
r248358.

llvm-svn: 248405
2015-09-23 18:07:56 +00:00
Sanjay Patel b14ecd34f7 move call to convertIntLogicToFPLogic up; NFCI
The BEXTR comments didn't make sense before, we may want to extend the
FP logic transform to work on vectors, and this way is more beautiful.

llvm-svn: 248404
2015-09-23 18:03:37 +00:00
Sanjay Patel ade3abd2d9 [x86] move code for converting int logic to FP logic to a helper function; NFCI
This is a follow-on to:
http://reviews.llvm.org/rL248395

so we can add the call to the or/xor combines too.

llvm-svn: 248399
2015-09-23 17:39:41 +00:00
Sanjay Patel df2495f331 [x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars
Turn this:
   movd %xmm0, %eax
   movd %xmm1, %ecx
   andl %eax, %ecx
   movd %ecx, %xmm0

into this:
   andps %xmm1, %xmm0


This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

Differential Revision: http://reviews.llvm.org/D13065

llvm-svn: 248395
2015-09-23 17:00:06 +00:00
Dan Gohman 979840d31f [WebAssembly] Fix hasAddr64 being used before being initializer.
This reverts r248388 and fixes the underlying bug: hasAddr64 was initialized
in runOnMachineFunction, but runOnMachineFunction isn't ever called in
CodeGen/WebAssembly/global.ll since that testcase has no functions. The fix
here is to use AsmPrinter's getPointerSize() as needed to determine the
pointer size instead.

llvm-svn: 248394
2015-09-23 16:59:10 +00:00
Alexander Kornienko a3eaa204e6 Fix CodeGen/WebAssembly/global.ll test under ASAN.
llvm-svn: 248388
2015-09-23 15:41:25 +00:00
Chad Rosier 2dfd35499e [AArch64] Refactor pre- and post-index merge fuctions into a single function. NFC.
llvm-svn: 248377
2015-09-23 13:51:44 +00:00
Oliver Stannard f2ed5c68d2 [ARM] Add option to force fast-isel
The ARM backend has some logic that only allows the fast-isel to be enabled for
subtargets where it is known to be stable. This adds a backend option to
override this and force the fast-isel to be used for any target, to allow it to
be tested.

This is an ARM-specific option, because no other backend disables the fast-isel
on a per-subtarget basis.

llvm-svn: 248369
2015-09-23 09:19:54 +00:00
Simon Pilgrim 9cb018b6b6 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Sanjoy Das 2aacc0ecca [SCEV] Introduce ScalarEvolution::getOne and getZero.
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.

I've refactored the call sites I could find easily to use getZero /
getOne.

Reviewers: hfinkel, majnemer, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12947

llvm-svn: 248362
2015-09-23 01:59:04 +00:00
Evgeniy Stepanov 8d0e3011d8 Revert "Android support for SafeStack."
test/Transforms/SafeStack/abi.ll breaks when target is not supported;
needs refactoring.

llvm-svn: 248358
2015-09-23 01:23:22 +00:00
Evgeniy Stepanov ce2e16f00c Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

llvm-svn: 248357
2015-09-23 01:03:51 +00:00
Ahmed Bougacha 81616a72ea [ARM] Emit clrex in the expanded cmpxchg fail block.
ARM counterpart to r248291:

In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248294
2015-09-22 17:22:58 +00:00
Ahmed Bougacha 07a844d758 [AArch64] Emit clrex in the expanded cmpxchg fail block.
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248291
2015-09-22 17:21:44 +00:00
Daniel Sanders 86cce70010 [mips][sched] Split IIBranch into specific instruction classes.
Summary:
Almost no functional change since the InstrItinData's have been duplicated.
The one functional change is to remove IIBranch from the MSA branches. The
classes will be assigned to the MSA instructions as part of implementing
the P5600 scheduler.

II_IndirectBranchPseudo and II_ReturnPseudo can probably be removed. I've
preserved the itinerary information for the corresponding pseudo
instructions to avoid making a functional change to these pseudos in
this patch.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12189

llvm-svn: 248273
2015-09-22 13:36:28 +00:00
Daniel Sanders 1af1d275bc [mips][sched] Temporarily rename IIAlu to IIM16Alu. NFC.
Summary:
The only instructions left in IIAlu are MIPS16 specific. We're not
implementing a MIPS16 scheduler at this time so rename the class to make it
obvious that they are MIPS16 instructions.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12188

llvm-svn: 248267
2015-09-22 12:36:28 +00:00
Stephen Canon 8216d88511 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior.  This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
NAKAMURA Takumi 10c80e7996 Prune trailing whitespaces.
llvm-svn: 248265
2015-09-22 11:19:03 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00