Commit Graph

38279 Commits

Author SHA1 Message Date
Rafael Espindola 9f1c1fe428 Use the isPositionIndependent predicate. NFC.
llvm-svn: 273870
2016-06-27 12:48:21 +00:00
Rafael Espindola b2b6a8580c Add an explanation on how mips is special in here.
llvm-svn: 273868
2016-06-27 12:33:33 +00:00
NAKAMURA Takumi 5cbd41e0a8 SIMachineFunctionInfo.cpp: Appease msc18 to use std::array.
llvm-svn: 273860
2016-06-27 10:26:43 +00:00
NAKAMURA Takumi f619b50fab Reformat.
llvm-svn: 273859
2016-06-27 10:26:36 +00:00
NAKAMURA Takumi d377ad806e Reformat blank lines.
llvm-svn: 273858
2016-06-27 10:26:25 +00:00
Benjamin Kramer 020a740b5b [sparc] Simplify slow and verbose string matching code to startswith_lower.
No functionality change intended, found by cppcheck. PR28274.

llvm-svn: 273857
2016-06-27 09:38:56 +00:00
Diana Picus 92423ce194 [ARM] Do not test for CPUs, use SubtargetFeatures (Part 2). NFCI
This is a follow-up for r273544.

The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.

Since the ARM backend seems to have quite a lot of calls to these methods, I
intend to submit 5-6 subtarget features at a time, instead of one big lump.

Differential Revision: http://reviews.llvm.org/D21685

llvm-svn: 273853
2016-06-27 09:08:23 +00:00
Hrvoje Varga 24b975dc66 [mips][micromips] Implement LD, LLD, LWU, SD, DSRL, DSRL32 and DSRLV instructions
Differential Revision: http://reviews.llvm.org/D16625

llvm-svn: 273850
2016-06-27 08:23:28 +00:00
Simon Pilgrim a45da385f8 [X86][AVX] Peek through bitcasts to find the source of broadcasts
AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.

This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.

Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.

Differential Revision: http://reviews.llvm.org/D21660

llvm-svn: 273848
2016-06-27 07:44:32 +00:00
Rafael Espindola 1ac1fa818e Mips: Fix access to private functions.
llvm-svn: 273843
2016-06-27 03:19:40 +00:00
Rafael Espindola e715172791 Use isPositionIndependent. NFC.
llvm-svn: 273829
2016-06-26 22:32:53 +00:00
Rafael Espindola 405e25a970 Use isPositionIndependent predicate. NFC.
llvm-svn: 273827
2016-06-26 22:24:01 +00:00
Rafael Espindola ae0d866f56 Refactor a duplicated predicate. NFC.
llvm-svn: 273826
2016-06-26 22:13:55 +00:00
Craig Topper 8f577fd5b5 [X86] Rewrite lowerVectorShuffleWithPSHUFB to not require a ZeroableMask to be created. We can do everything with the starting mask and zeroable bit vector. This removes the last usage of isSingleInputShuffleMask. NFC
llvm-svn: 273804
2016-06-26 05:10:56 +00:00
Craig Topper 8bba749a48 [X86] Replace calls to isSingleInputShuffleMask with just checking if V2 is UNDEF. Canonicalization and creation of shuffle vector ensures this is equivalent.
llvm-svn: 273803
2016-06-26 05:10:53 +00:00
Craig Topper 9a2e979b3d [X86] Convert ==/!= comparisons with -1 for checking undef in shuffle lowering to comparisons of <0 or >=0. While there do the same for other kinds of index checks that can just check for greater than 0. No functional change intended.
llvm-svn: 273788
2016-06-25 19:05:29 +00:00
Craig Topper 53a39d1a63 [X86] Pull similar bitcasts on different paths to earlier shared point. NFC
llvm-svn: 273787
2016-06-25 19:05:23 +00:00
Jan Vesely 3bc1af2be4 AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705

Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166

Differential Revision: http://reviews.llvm.org/D21633

llvm-svn: 273785
2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov f2f3d14774 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Tom Stellard b164a9843b AMDGPU/SI: Make sure not to fold offsets into local address space globals
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.

Reviewers: arsenm, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21647

llvm-svn: 273765
2016-06-25 01:59:16 +00:00
Matthias Braun 1e374a7aa6 AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

llvm-svn: 273751
2016-06-24 23:52:11 +00:00
Rafael Espindola a686f12705 Simplify. NFC.
Also delete out of date comment. This code was always returning .data
since r253436.

llvm-svn: 273739
2016-06-24 22:19:54 +00:00
Krzysztof Parzyszek 709a626015 [Hexagon] Simplify (+fix) instruction selection for indexed loads/stores
llvm-svn: 273733
2016-06-24 21:27:17 +00:00
Rafael Espindola a895a0cd01 Add support for musl-libc on ARM Linux.
Patch by Lei Zhang!

llvm-svn: 273726
2016-06-24 21:14:33 +00:00
Ahmed Bougacha 241e74cbc2 [ARM] Remove dead SDNodes. NFC.
The opcodes are used, but only by DAG->DAG.

llvm-svn: 273717
2016-06-24 20:38:00 +00:00
Ahmed Bougacha 0851ecd1b0 [X86] Remove dead ISD opcodes. NFC.
llvm-svn: 273716
2016-06-24 20:37:55 +00:00
Evandro Menezes 3830479f41 [AArch64] Adjust the model for the vector by element FP multiplies on Exynos M1. (NFC)
llvm-svn: 273708
2016-06-24 18:58:54 +00:00
Rafael Espindola f092cc8a14 Use existing predicate. NFC.
This doesn't handle ELF, but neither did the previous code.

llvm-svn: 273677
2016-06-24 13:28:26 +00:00
Rafael Espindola 01cdf31cab Merge two identical if branches. NFC.
llvm-svn: 273674
2016-06-24 13:08:06 +00:00
Rafael Espindola 41d308689c Merge two identical if branches. NFC.
llvm-svn: 273673
2016-06-24 13:05:20 +00:00
Rafael Espindola ce37f03273 clang-format a region. NFC.
llvm-svn: 273672
2016-06-24 12:58:25 +00:00
Matt Arsenault 86de486d31 AMDGPU: Add stub custom CodeGenPrepare pass
This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

llvm-svn: 273657
2016-06-24 07:07:55 +00:00
Matt Arsenault c581611e11 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

llvm-svn: 273653
2016-06-24 06:30:22 +00:00
Matt Arsenault 43e92fe306 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
David Majnemer d770877328 Switch more loops to be range-based
This makes the code a little more concise, no functional change is
intended.

llvm-svn: 273644
2016-06-24 04:05:21 +00:00
Craig Topper 024402dcdf [X86] Combine two nearby calls to isSingleInputShuffleVector. NFC
llvm-svn: 273643
2016-06-24 03:06:11 +00:00
Ahmed Bougacha f0b46ee0aa [ARM] Use aapcs_vfp for ___truncdfhf2 on v7k.
r215348 overrode the f16 libcalls to be soft-float, but
v7k uses the default (hard-float) calling convention.

llvm-svn: 273631
2016-06-24 00:08:01 +00:00
Evandro Menezes 62c70101c3 [AArch64] Model the cost of vector by element FP multiplies on Exynos M1. (NFC)
llvm-svn: 273630
2016-06-23 23:43:23 +00:00
Tom Stellard 14416ae6cd Support/ELF: Add R_AMDGPU_GOTPCREL relocation
Summary:
We will start generating this in a future patch.

Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21482

llvm-svn: 273628
2016-06-23 23:11:29 +00:00
Kyle Butt 991df7889b Codegen: [X86] preservere memory refs for folded umul_lohi
Memory references were not being propagated for this folded load. This
prevented optimizations like LICM from hoisting the load.

Added test to verify that this allows LICM to proceed.

llvm-svn: 273617
2016-06-23 21:40:35 +00:00
Rafael Espindola 2d3cce71ee Uses shouldAssumeDSOLocal.
With that SystemZ knows to avoid a GOT for PIE.

llvm-svn: 273614
2016-06-23 21:18:59 +00:00
Rafael Espindola 65787a9e01 Refactor to use shouldAssumeDSOLocal. NFC.
llvm-svn: 273612
2016-06-23 20:50:42 +00:00
Matt Arsenault 8d4b0eddd6 AMDGPU: Add option to disable spilling SGPRs to VGPRs.
This can help debug spilling problems.

llvm-svn: 273605
2016-06-23 20:00:34 +00:00
Rafael Espindola 53fd425e06 Refactor duplicated code. NFC.
llvm-svn: 273595
2016-06-23 18:43:06 +00:00
Michael Kuperstein 0194d30e09 [X86] Extract HiPE prologue constants into metadata
X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset
into an Erlang Runtime System-internal data structure (the PCB). As the
layout of this data structure is prone to change, this poses problems
for maintaining compatibility.

To address this problem, the compiler can produce this information as
module-level named metadata. For example (where P_NSP_LIMIT is the
offending offset):

!hipe.literals = !{ !2, !3, !4 }
!2 = !{ !"P_NSP_LIMIT", i32 152 }
!3 = !{ !"X86_LEAF_WORDS", i32 24 }
!4 = !{ !"AMD64_LEAF_WORDS", i32 24 }

Patch by Magnus Lang

Differential Revision: http://reviews.llvm.org/D20363

llvm-svn: 273593
2016-06-23 18:17:25 +00:00
Reid Kleckner 8f4bd1fdf2 Fix the wasm build by including EndianStream.h
llvm-svn: 273591
2016-06-23 18:12:31 +00:00
Nirav Dave bfdb483755 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after correcting over-eager Debug Value transfer fixing PR28270.

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273585
2016-06-23 17:52:57 +00:00
Pablo Barrio 7a64346533 [ARM] Lower (select_cc k k (select_cc ~k ~k x)) into (SSAT l_k x)
Summary:
SSAT saturates an integer, making sure that its value lies within
an interval [-k, k]. Since the constant is given to SSAT as the
number of bytes set to one, k + 1 must be a power of 2, otherwise
the optimization is not possible. Also, the select_cc must use <
and > respectively so that they define an interval.

Reviewers: mcrosier, jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D21372

llvm-svn: 273581
2016-06-23 16:53:49 +00:00
Hans Wennborg a8b7d4f73f Revert r273567 "[SystemZ] Let z13 also support FeatureMiscellaneousExtensions."
It broke test/CodeGen/SystemZ/vec-extract-02.ll

llvm-svn: 273575
2016-06-23 16:13:26 +00:00
Jonas Paulsson b1a2b5a708 [SystemZ] Let z13 also support FeatureMiscellaneousExtensions.
This processor feature had been left out by mistake from the z13
ProcessorModel.

Reviewed by Ulrich Weigand.

llvm-svn: 273567
2016-06-23 15:12:06 +00:00
Valery Pykhtin a852d695b8 [AMDGPU] Enable absolute expression initializer for amd_kernel_code_t fields.
Differential Revision: http://reviews.llvm.org/D21380

llvm-svn: 273561
2016-06-23 14:13:06 +00:00
Daniel Sanders de393329b9 [mips] Don't derive the default ABI from the CPU in the backend.
Summary:
The backend has no reason to behave like a driver and should generally do
as it's told (and error out if it can't) instead of trying to figure out
what the API user meant. The default ABI is still derived from the arch
component as a concession to backwards compatibility.

API-users that previously passed an explicit CPU and a triple that was
inconsistent with the CPU (e.g. mips-linux-gnu and mips64r2) may get a
different ABI to what they got before. However, it's expected that there
are no such users on the basis that CodeGen has been asserting that the
triple is consistent with the selected ABI for several releases. API-users
that were consistent or passed '' or 'generic' as the CPU will see no
difference.

Reviewers: sdardis, rafael

Subscribers: rafael, dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21466

llvm-svn: 273557
2016-06-23 12:42:53 +00:00
Diana Picus eb3dd14b95 [ARM] Use member initializers in ARMSubtarget. NFCI
Move most of the initializations in ARMSubtarget::initializeEnvironment to
member initializers.

Change suggested by Matthias Braun (see http://reviews.llvm.org/D21432).

llvm-svn: 273556
2016-06-23 12:04:33 +00:00
Daniel Sanders 8e17bea7d5 [mips][ias] Integers are not registers.
Summary:
When parseAnyRegister() encounters a symbol alias, it parses integers and adds
a corresponding expression to the operand list. This is clearly wrong since the
only operands that parseAnyRegister() should be accepting are registers.

It's not clear why this code was added and there are no test cases that cover
it. I think it might be leftover from when searchSymbolAlias() was more widely
used.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21377

llvm-svn: 273555
2016-06-23 10:54:09 +00:00
Diana Picus e440f99913 [AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.

We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.

Fixes PR27761.

Differential Revision: http://reviews.llvm.org/D20852

llvm-svn: 273550
2016-06-23 09:19:16 +00:00
Simon Dardis 724e530296 [mips] Fix dext/dins definitions
dext and dins, along with their 'm' and 'u' variants are defined in mips64r2,
not mips64.

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D21608

llvm-svn: 273549
2016-06-23 09:06:20 +00:00
Diana Picus c5baa43f53 [ARM] Do not test for CPUs, use SubtargetFeatures (Part 1). NFCI
This is a cleanup commit similar to r271555, but for ARM.

The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.

Since the ARM backend seems to have quite a lot of calls to these methods, I
intend to submit 5-6 subtarget features at a time, instead of one big lump.

Differential Revision: http://reviews.llvm.org/D21432

llvm-svn: 273544
2016-06-23 07:47:35 +00:00
Craig Topper 597aa42fec [AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects.
llvm-svn: 273543
2016-06-23 07:37:33 +00:00
Craig Topper 8f8bd37dd3 [X86] Add assert to ensure only 128-bit vector types are used. 256 or 512-bit would require lane handling which is missing.
llvm-svn: 273542
2016-06-23 07:37:26 +00:00
Eric Christopher 47d372f98e Use C++ comments for large block comment.
llvm-svn: 273526
2016-06-23 01:33:38 +00:00
Matt Arsenault 529cf25e60 AMDGPU: readlane/writelane do not read exec
llvm-svn: 273525
2016-06-23 01:26:16 +00:00
Peter Collingbourne 6717803485 Revert r273456, "Preserve DebugInfo when replacing values in DAGCombiner" as it caused pr28270.
llvm-svn: 273518
2016-06-23 00:06:17 +00:00
Reid Kleckner 5340b279ae [codeview] Add EFLAGS to the list of CodeView register numbers
llvm-svn: 273516
2016-06-22 23:50:19 +00:00
Matt Arsenault 3cb4ddeb4e AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
2016-06-22 23:40:57 +00:00
Reid Kleckner 858239d5f8 Prune some includes from headers and sink some inline functions
MCSymbol.h shouldn't pull in MCAssembler.h, just MCFragment.h.
MCLinkerOptimizationHint.h shouldn't need MCMachObjectWriter.h.  The
rest is fixing the fallout.

llvm-svn: 273507
2016-06-22 23:23:08 +00:00
Rafael Espindola 928a95d0b0 Use shouldAssumeDSOLocal.
With this it handle -fPIE.

llvm-svn: 273499
2016-06-22 22:09:17 +00:00
Rafael Espindola 45bb5c69a0 Extract a few variables to make 'if' smaller. NFC.
llvm-svn: 273497
2016-06-22 21:56:34 +00:00
Changpeng Fang 47efe1f6db AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

llvm-svn: 273496
2016-06-22 21:33:49 +00:00
Matt Arsenault 4a07bf6372 AMDGPU: Run verifier after 2nd run of SIShrinkInstructions
llvm-svn: 273469
2016-06-22 20:26:24 +00:00
Matt Arsenault 9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Krzysztof Parzyszek f7f7068109 [Hexagon] Add SDAG preprocessing step to expose shifted addressing modes
Transform: (store ch addr (add x (add (shl y c) e)))
       to: (store ch addr (add x (shl (add y d) c))),
where e = (shl d c) for some integer d.
The purpose of this is to enable generation of loads/stores with
shifted addressing mode, i.e. mem(x+y<<#c). For that, the shift
value c must be 0, 1 or 2.

llvm-svn: 273466
2016-06-22 20:08:27 +00:00
Chad Rosier 8c106bcbe8 [AArch64] Remove an overly aggressive assert.
llvm-svn: 273458
2016-06-22 19:18:52 +00:00
Rafael Espindola 8474fdf90d Start using shouldAssumeDSOLocal on Hexagon.
Include a token test showing that access to private is now the same as
to internal.

llvm-svn: 273457
2016-06-22 19:09:14 +00:00
Nirav Dave 96beb7dee5 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after fixing over-aggressive assertion

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273456
2016-06-22 19:03:26 +00:00
Matt Arsenault 0e5befe315 AMDGPU: Make FrameLowering stack alignment 16
We don't need it to be that high. The natural alignment
for a single workitem's stack is 16.

llvm-svn: 273448
2016-06-22 17:47:39 +00:00
Zhan Jun Liau 0df350589f [SystemZ] Recognize RISBG opportunities involving a truncate
Summary:
Recognize RISBG opportunities where the end result is narrower than the
original input - where a truncate separates the shift/and operations.

The motivating case is some code in postgres which looks like:

	srlg	%r2, %r0, 11
	nilh	%r2, 255

Reviewers: uweigand

Author: RolandF

Differential Revision: http://reviews.llvm.org/D21452

llvm-svn: 273433
2016-06-22 16:16:27 +00:00
Krzysztof Parzyszek f228c95f87 [Hexagon] Handle expansion of cmpxchg
llvm-svn: 273432
2016-06-22 16:07:10 +00:00
Krzysztof Parzyszek e116d500a7 [SDAG] Remove FixedArgs parameter from CallLoweringInfo::setCallee
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.

Differential Revision: http://reviews.llvm.org/D20376

llvm-svn: 273403
2016-06-22 12:54:25 +00:00
Rafael Espindola 2b7fef681f Delete more dead code.
Found by gcc 6.

llvm-svn: 273402
2016-06-22 12:44:16 +00:00
Matt Arsenault 180e0d5cef AMDGPU: Fix gcc warnings
Mostly removing dead code. Apparently gcc's warning
for unused functions is better

llvm-svn: 273363
2016-06-22 01:53:49 +00:00
Haicheng Wu a783bac50b [Kryo] Enable loop prefetcher.
Differential Revision: http://reviews.llvm.org/D21535

llvm-svn: 273329
2016-06-21 22:47:56 +00:00
Rafael Espindola 7b4ef068c6 Delete more dead code.
Found by gcc 6.

llvm-svn: 273322
2016-06-21 21:51:41 +00:00
Jan Vesely fea814d531 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

llvm-svn: 273317
2016-06-21 20:46:20 +00:00
Artem Belevich d7ebcfb291 [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of such vars to local space on SASS level and
pointer space conversion.

Instead, make a local copy with appropriate addrspacecasts and let
LLVM optimize them away when possible.

This allows loading value of the argument using [symbol+offset]
instead of converting argument to general space pointer and using it
for indexing (which also implicitly converts param space pointer to
local space one on SASS level and triggers copying of argument into
local space in the process).

This reduces call overhead, uses less registers and reduces overall
SASS size by 2-4%.

Differential Review: http://reviews.llvm.org/D21421

llvm-svn: 273313
2016-06-21 20:30:26 +00:00
Rafael Espindola 463aed879d Add back some dead code.
It was there just to avoid warnings. Add a LLVM_ATTRIBUTE_UNUSED
attribute so that it doesn't produce warnings with gcc 6.

llvm-svn: 273308
2016-06-21 20:09:22 +00:00
Rafael Espindola 48975881ab Delete some dead code.
Found by gcc 6.

llvm-svn: 273303
2016-06-21 19:48:12 +00:00
Etienne Bergeron f6be62f2c8 [StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).

This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.

```
    struct _EH4_SCOPETABLE {
        DWORD GSCookieOffset;
        DWORD GSCookieXOROffset;
        DWORD EHCookieOffset;
        DWORD EHCookieXOROffset;
        _EH4_SCOPETABLE_RECORD ScopeRecord[1];
    };

    struct _EH4_SCOPETABLE_RECORD {
        DWORD EnclosingLevel;
        long (*FilterFunc)();
            union {
            void (*HandlerAddress)();
            void (*FinallyFunc)();
        };
    };
```

The code to generate the EHCookie is added in `X86WinEHState.cpp`.
Which is adding these instructions when using SEH4.

```
Lfunc_begin0:
# BB#0:                                 # %entry
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	subl	$28, %esp
	movl	%ebp, %eax                <<-- Loading FramePtr
	movl	%esp, -36(%ebp)
	movl	$-2, -16(%ebp)
	movl	$L__ehtable$use_except_handler4_ssp, %ecx
	xorl	___security_cookie, %ecx
	movl	%ecx, -20(%ebp)
	xorl	___security_cookie, %eax  <<-- XOR FramePtr and Cookie
	movl	%eax, -40(%ebp)           <<-- Storing EHGuard
	leal	-28(%ebp), %eax
	movl	$__except_handler4, -24(%ebp)
	movl	%fs:0, %ecx
	movl	%ecx, -28(%ebp)
	movl	%eax, %fs:0
	movl	$0, -16(%ebp)
	calll	_may_throw_or_crash
LBB1_1:                                 # %cont
	movl	-28(%ebp), %eax
	movl	%eax, %fs:0
	addl	$28, %esp
	popl	%esi
	popl	%edi
	popl	%ebx
	popl	%ebp
	retl

```

And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
	.p2align	2
L__ehtable$use_except_handler4_ssp:
	.long	-2                      # GSCookieOffset
	.long	0                       # GSCookieXOROffset
	.long	-40                     # EHCookieOffset    <<----
	.long	0                       # EHCookieXOROffset
	.long	-2                      # ToState
	.long	_catchall_filt          # FilterFunction
	.long	LBB1_2                  # ExceptionHandler

```

Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.

Reviewers: rnk, majnemer

Subscribers: llvm-commits, chrisha

Differential Revision: http://reviews.llvm.org/D21231

llvm-svn: 273281
2016-06-21 15:58:55 +00:00
Evandro Menezes 230083ff9d [AArch64] Change the preferred alignment for char and short to word alignment
Differential Revision: http://reviews.llvm.org/D21414

llvm-svn: 273279
2016-06-21 15:55:18 +00:00
Silviu Baranga aee40fc61c [AArch64] Restore codegen for AArch64 Cortex-A72/A73 after NFCI
Summary:
Code generation for Cortex-A72/Cortex-A73 was accidentally changed
by r271555, which was a NFCI. The isCortexA57() predicate was not true
for Cortex-A72/Cortex-A73 before r271555 (since it was checking the CPU
string). Because Cortex-A72/Cortex-A73 inherit all features from Cortex-A57,
all decisions previously guarded by isCortexA57() are now taken.

This change restores the behaviour before r271555 by adding separate
ProcA72/ProcA73, which have the required features to preserve code
generation.

Reviewers: kristof.beyls, aadg, mcrosier, rengolin

Subscribers: mcrosier, llvm-commits, aemerson, t.p.northover, MatzeB, rengolin

Differential Revision: http://reviews.llvm.org/D21182

llvm-svn: 273277
2016-06-21 15:53:54 +00:00
Etienne Bergeron 715ec09dcf fix indentation
llvm-svn: 273274
2016-06-21 15:21:04 +00:00
Rafael Espindola 3d6a130fee Define a isPositionIndependent helper for ARMAsmPrinter. NFC.
llvm-svn: 273261
2016-06-21 14:21:53 +00:00
Craig Topper 283418fbb6 [AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX.
llvm-svn: 273253
2016-06-21 07:37:32 +00:00
David Majnemer e61e4bfd87 Replace silly uses of 'signed' with 'int'
llvm-svn: 273244
2016-06-21 05:10:24 +00:00
Craig Topper 0a0fb0fda1 [AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps.
llvm-svn: 273240
2016-06-21 03:53:24 +00:00
Craig Topper e4cf09ad07 [X86] Pre-allocate SmallVector instead of using push_back in a loop. NFC
llvm-svn: 273234
2016-06-21 03:05:40 +00:00
Rafael Espindola 0d34826218 Simplify PICStyles.
The main difference is that StubDynamicNoPIC is gone. The
dynamic-no-pic mode as the name implies is simply not pic. It is just
conservative about what it assumes to be dso local.

llvm-svn: 273222
2016-06-20 23:41:56 +00:00
Simon Pilgrim 356e823b51 [X86][SSE] Add cost model for BSWAP of vectors
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.

Differential Revision: http://reviews.llvm.org/D21521

llvm-svn: 273217
2016-06-20 23:08:21 +00:00
Simon Pilgrim 225b2e37a0 [X86][X87] Fix issue with sitofp i64 -> fp128 on 32-bit targets
Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to a x87 register preventing legalization for conversion to fp128.

Added 32-bit tests for fp128 cast/conversions.

llvm-svn: 273210
2016-06-20 22:41:17 +00:00
Rafael Espindola 94eb31a7a9 Delete dead code. NFC.
llvm-svn: 273206
2016-06-20 22:08:35 +00:00
Rafael Espindola 524bcbf1f3 Add a isPositionIndependent helper to ARMFastISel. NFC.
llvm-svn: 273187
2016-06-20 19:00:05 +00:00
Evandro Menezes 8057265cc2 [AArch64] Adjust the loop buffer size for Exynos M1 (NFC)
llvm-svn: 273185
2016-06-20 18:39:41 +00:00
Matt Arsenault 2209625387 AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.

llvm-svn: 273182
2016-06-20 18:34:00 +00:00
Matt Arsenault b6d8c37e1a AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.

Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.

llvm-svn: 273181
2016-06-20 18:33:56 +00:00
Matt Arsenault ff98241f37 Generalize DiagnosticInfoStackSize to support other limits
Backends may want to report errors on resources other than
stack size.

llvm-svn: 273177
2016-06-20 18:13:04 +00:00
Matt Arsenault a9720c67f1 AMDGPU: Use correct method for determining instruction size
llvm-svn: 273172
2016-06-20 17:51:32 +00:00
Rafael Espindola 959e9c8d01 Use shouldAssumeDSOLocal.
With this ARM fast isel knows that PIE variable are not preemptable.

llvm-svn: 273169
2016-06-20 17:45:33 +00:00
Tom Stellard 5350894265 AMDGPU: Add support for R_AMDGPU_REL32 relocations
Reviewers: arsenm, kzhuravl, rafael

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21401

llvm-svn: 273168
2016-06-20 17:33:43 +00:00
Rafael Espindola 9935766458 Simplify. NFC.
llvm-svn: 273167
2016-06-20 17:00:13 +00:00
Tom Stellard 1c89eb7db0 AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations
Reviewers: arsenm, rafael, kzhuravl

Subscribers: rafael, arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21400

llvm-svn: 273166
2016-06-20 16:59:44 +00:00
Sam Parker d616cf07b2 [ARM] Enable isel of UMAAL
TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL
dags into UMAAL. Selection is split into the two phases because it
is easier to match the two patterns at those different times.

Differential Revision: http://http://reviews.llvm.org/D21461

llvm-svn: 273165
2016-06-20 16:47:09 +00:00
Rafael Espindola 0f89833c31 Add a isPositionIndependent predicate.
Reduces a bit of code duplication and clarify where we are interested
just on position independence and no the location of the symbol.

llvm-svn: 273164
2016-06-20 16:43:17 +00:00
Aaron Ballman 86100fc8be Removing an unused switch statement that has only a default label. This happens to also eliminate an instance of switchception. NFC intended.
llvm-svn: 273161
2016-06-20 15:37:15 +00:00
Pankaj Gode 0aab2e398a [AARCH64] Add support for Broadcom Vulcan
Adding core tuning support for new Broadcom Vulcan core (ARMv8.1A).

Differential Revision: http://reviews.llvm.org/D21500

llvm-svn: 273148
2016-06-20 11:13:31 +00:00
Igor Breger e59165ca63 [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
Differential Revision: http://reviews.llvm.org/D20897

llvm-svn: 273138
2016-06-20 07:05:43 +00:00
Craig Topper 4296c025c0 [X86] Pass the SDLoc and Mask ArrayRef down from lowerVectorShuffle through all of the other routines instead of recreating them in the handlers for each type. NFC
llvm-svn: 273137
2016-06-20 04:00:55 +00:00
Craig Topper ddf5d2a4a5 [X86] Use existing ArrayRef variable instead of calling SVOp->getMask() repeatedly. Remove nearby else after return as well. NFC
llvm-svn: 273136
2016-06-20 04:00:53 +00:00
Craig Topper 01ef65dd79 [X86] Avoid making a copy of a shuffle mask until we're sure we really need to. And just use a SmallVector to do the copy because its easy.
llvm-svn: 273135
2016-06-20 04:00:50 +00:00
NAKAMURA Takumi fd92154b20 Reformat blank lines.
llvm-svn: 273131
2016-06-20 01:05:15 +00:00
NAKAMURA Takumi ae7c97d39d Trailing whitespace.
llvm-svn: 273130
2016-06-20 00:49:20 +00:00
NAKAMURA Takumi fe1202c4cb Untabify.
llvm-svn: 273129
2016-06-20 00:37:41 +00:00
Simon Pilgrim 0887d5b02e [X86][AVX512] Added 512-bit BITREVERSE tests and enabled AVX512BW lowering support
llvm-svn: 273125
2016-06-19 20:59:19 +00:00
Simon Pilgrim 0c62bc0324 Strip trailing whitespace. NFCI.
llvm-svn: 273124
2016-06-19 20:22:43 +00:00
Simon Pilgrim 2b007189b0 Fixed signed/unsigned warning.
llvm-svn: 273120
2016-06-19 18:20:44 +00:00
Simon Pilgrim 3d881a0230 [X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values
We currently only allow exact matches of shuffle mask patterns during target shuffle combining.

This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value.

I've adjusted some tests that were requiring exact shuffle masks to now include undef values.

Differential Revision: http://reviews.llvm.org/D21495

llvm-svn: 273119
2016-06-19 18:03:52 +00:00
Craig Topper bbb9a8d255 [X86] Add an assert to ensure that a routine is only used with 128-bit vectors. Reduce SmallVector size accordingly.
llvm-svn: 273117
2016-06-19 15:37:39 +00:00
Craig Topper 969457e0e3 [X86] Make is128BitLaneRepeatedShuffleMask correct the indices of the second vector for the smaller mask. This removes some custom correction code and can potentially provide other benefits in the future.
llvm-svn: 273116
2016-06-19 15:37:37 +00:00
Craig Topper 54ec3d6b1b [X86] Remove a dead path through one of the shuffle lowering routines. It's only called on single input shuffles masks already. Add an assert instead to verify.
llvm-svn: 273115
2016-06-19 15:37:35 +00:00
Craig Topper ae21810ce4 [X86] Pre-allocate a SmallVector instead of using push_back in a loop. NFC
llvm-svn: 273114
2016-06-19 15:37:33 +00:00
Craig Topper 4181c03c54 [X86] Use SmallVector::assign instead of resize to ensure we really start with a vector of all -1s. Otherwise we're trusting the caller to pass the right thing.
This should be no functional change with current code.

llvm-svn: 273113
2016-06-19 15:37:30 +00:00
Chris Dewhurst d03d5653bc [SPARC] Additional condition required for DelaySlot fixing erratum in revision r273108.
llvm-svn: 273111
2016-06-19 12:56:42 +00:00
Chris Dewhurst 0c1e0026aa [SPARC] Fixes for hardware errata on LEON processor.
Passes to fix three hardware errata that appear on some LEON processor variants.

The instructions FSMULD, FMULS and FDIVS do not work as expected on some LEON processors. This change allows those instructions to be substituted for alternatives instruction sequences that are known to work.

These passes only run when selected individually, or as part of a processor defintion. They are not included in general SPARC processor compilations for non-LEON processors or for those LEON processors that do not have these hardware errata.

llvm-svn: 273108
2016-06-19 11:03:28 +00:00
Joerg Sonnenberger 2298203056 doesSetDirectiveSuppressesReloc -> doesSetDirectiveSuppressReloc, the
former is grammatically incorrect.

llvm-svn: 273100
2016-06-18 23:25:37 +00:00
Zvi Rackover b346eaa647 test commit: remove trailing whitespace
llvm-svn: 273094
2016-06-18 19:13:38 +00:00
Vasileios Kalintiris 0cf68df6cc [mips] Emit a JALR with $rd equal to $zero, instead of a JR in MIPS32R6.
Summary:
JR is an alias of JALR with $rd=0 in the R6 ISA. Also, this fixes recursive
builds in MIPS32R6.

Reviewers: dsanders, sdardis

Subscribers: jfb, dschuff, dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21370

llvm-svn: 273085
2016-06-18 15:39:43 +00:00
Matt Arsenault e935f05a94 AMDGPU: Fix kernel argument alignment impacting stack size
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.

llvm-svn: 273080
2016-06-18 05:15:53 +00:00
Simon Pilgrim f4b2af1b9f [X86][SSE4A] Autoupgrade and remove MOVNTSD/MOVNTSS intrinsics
Required better annotation of the instruction defs upon removal of the builtin intrinsic pattern.

llvm-svn: 273077
2016-06-18 02:38:26 +00:00
Davide Italiano ef5d8bead1 [X86Subtarget] Use isPositionIndependent(). NFC.
Differential Revision:  http://reviews.llvm.org/D21480

llvm-svn: 273071
2016-06-18 00:03:20 +00:00
Matt Arsenault 0bb294b224 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additonal work to setup and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

llvm-svn: 273062
2016-06-17 22:27:03 +00:00
Tom Stellard 0114bb5aa0 AMDGPU/SI: Simplify code in SITargetLowering::LowerGlobalAddress()
This change were suggested in http://reviews.llvm.org/D21154.

llvm-svn: 273059
2016-06-17 22:22:09 +00:00
Matt Arsenault 8885910f8e AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

llvm-svn: 273050
2016-06-17 21:18:41 +00:00
Michael Kuperstein 18d6d3d95e [X86] Add missing AVX512 anyext patterns.
Add AVX512 anyext patterns for i16 and i64, modeled on the existing i8 and
i32 patterns.

llvm-svn: 273038
2016-06-17 20:21:17 +00:00
Benjamin Kramer 4dea8f542b Avoid duplicated map lookups. No functionality change intended.
llvm-svn: 273030
2016-06-17 18:59:41 +00:00
Tim Northover 28a9e7f4ba ARM: take account of possible bundle when erasing an instruction.
Fortunately this appears to be the only ARM-specific pass that runs while
bundles might be in play, so no other cases need modifying.

llvm-svn: 273029
2016-06-17 18:40:46 +00:00
James Y Knight 148a6469dc Support expanding partial-word cmpxchg to full-word cmpxchg in AtomicExpandPass.
Many CPUs only have the ability to do a 4-byte cmpxchg (or ll/sc), not 1
or 2-byte. For those, you need to mask and shift the 1 or 2 byte values
appropriately to use the 4-byte instruction.

This change adds support for cmpxchg-based instruction sets (only SPARC,
in LLVM). The support can be extended for LL/SC-based PPC and MIPS in
the future, supplanting the ISel expansions those architectures
currently use.

Tests added for the IR transform and SPARCv9.

Differential Revision: http://reviews.llvm.org/D21029

llvm-svn: 273025
2016-06-17 18:11:48 +00:00
Davide Italiano 4cccc488b7 [Codegen] Change PICLevel.
We convert `Default` to `NotPIC` so that target independent code
can reason about this correctly.

Differential Revision:  http://reviews.llvm.org/D21394

llvm-svn: 273024
2016-06-17 18:07:14 +00:00
Nirav Dave fd91041ce1 Refactor and cleanup Assembly Parsing / Lexing
Recommiting after fixing non-atomic insert to front of SmallVector in
MCAsmLexer.h

Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.

Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.

This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.

Reviewers: rtrieu, dwmw2, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20009

llvm-svn: 273007
2016-06-17 16:06:17 +00:00
Benjamin Kramer f690da4864 [ARM] Strength reduce vectors to arrays.
No functionality change intended.

llvm-svn: 273001
2016-06-17 14:14:29 +00:00
Benjamin Kramer 1d67ac5639 [PPC] Strength-reduce SmallVectors into arrays.
No functionality change intended.

llvm-svn: 272999
2016-06-17 13:15:10 +00:00
Craig Topper 1f083543c9 [X86] Pre-size several SmallVectors instead of calling push_back in a loop. NFC
llvm-svn: 272997
2016-06-17 12:20:50 +00:00
Craig Topper 07984f2068 [X86] Fix formatting. NFC
llvm-svn: 272996
2016-06-17 12:20:48 +00:00
Ranjeet Singh 39d2d097d6 [ARM] Add support for mrrc/mrrc2 intrinsics.
Reapplying patch as it was reverted when it was first
committed because of an assertion failure when the
mrrc2 intrinsic was called in ARM mode. The failure
was happening because the instruction was being built
in ARMISelDAGToDAG.cpp and the tablegen description for
mrrc2 instruction doesn't allow you to use a predicate.

The ARM architecture manuals do say that mrrc2 in ARM
mode can be predicated with AL in assembly but this has
no effect on the encoding of the instruction as the top
4 bits will always be 1111 not 1110 which is the encoding
for the condition AL.

Differential Revision: http://reviews.llvm.org/D21408

llvm-svn: 272982
2016-06-17 00:52:41 +00:00
Matt Arsenault f1c3906a5d AArch64: Fix range loop contradicting comment above it
llvm-svn: 272959
2016-06-16 21:21:49 +00:00
Changpeng Fang 3e06e1edac AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex
Reviewers: arsenm, tstellarAMD

Differential Revision:  http://reviews.llvm.org/21438

llvm-svn: 272958
2016-06-16 21:20:47 +00:00
Nirav Dave 280ecf6ff0 Revert "Refactor and cleanup Assembly Parsing / Lexing"
Reverting for unexpected crashes on various platforms.

This reverts commit r272953.

llvm-svn: 272957
2016-06-16 21:19:23 +00:00
Matt Arsenault 01e062f5c6 AMDGPU: Fix maximum instruction size for amdgcn
This was causing the conservative estimate of inline asm
size to be twice as big as expected.

llvm-svn: 272956
2016-06-16 21:14:05 +00:00
Nirav Dave c19c3260df Refactor and cleanup Assembly Parsing / Lexing
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.

Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.

This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.

Reviewers: rtrieu, dwmw2, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20009

llvm-svn: 272953
2016-06-16 20:34:22 +00:00
Rafael Espindola 498b9e06c8 Refactor more duplicated code.
llvm-svn: 272939
2016-06-16 19:30:55 +00:00
Sanjoy Das 0ebc9616b4 NFC; refactor getFrameIndexReferenceFromSP
Summary:
... into getFrameIndexReferencePreferSP.  This change folds the
fail-then-retry logic into getFrameIndexReferencePreferSP.

There is a non-functional but behaviorial change in WinException --
earlier if `getFrameIndexReferenceFromSP` failed we'd trip an assert,
but now we'll silently use the (wrong) offset from the base pointer.  I
could not write the assert I'd like to write ("FrameReg ==
StackRegister", like I've done in X86FrameLowering) since there is no
easy way to get to the stack register from WinException (happy to be
proven wrong here).  One solution to this is to add a `bool
OnlyStackPointer` parameter to `getFrameIndexReferenceFromSP` that
asserts if it could not satisfy its promise of returning an offset from
a stack pointer, but that seems overkill.

Reviewers: rnk

Subscribers: sanjoy, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D21427

llvm-svn: 272938
2016-06-16 18:54:06 +00:00
Rafael Espindola ed44cf6ccd Refactor duplicated code.
llvm-svn: 272936
2016-06-16 18:50:12 +00:00
Sanjay Patel 0e9afea3c8 [x86] autoupgrade and remove AVX2 integer min/max intrinsics
This will (hopefully very temporarily) break clang.
The clang side of this should be the next commit.

llvm-svn: 272932
2016-06-16 18:44:20 +00:00
Wei Ding ab3d91b8f1 AMDGPU: Add v_mad 16-bit instructions definition.
Differential Revision: http://reviews.llvm.org/D21362

llvm-svn: 272919
2016-06-16 16:50:04 +00:00
Rafael Espindola afade35003 Don't print (PLT) on arm.
The R_ARM_PLT32 relocation is deprecated and is not produced by MC.

This means that the code being deleted is dead from the .o point of
view and was making the .s more confusing.

llvm-svn: 272909
2016-06-16 16:09:53 +00:00
Sanjay Patel 51ab757941 [x86] autoupgrade and remove SSE2/SSE41 integer min/max intrinsics
Follow-up to:
http://reviews.llvm.org/rL272806
http://reviews.llvm.org/rL272807

llvm-svn: 272907
2016-06-16 15:48:30 +00:00
Rafael Espindola 9ba9c5bde5 Refactor duplicated code. NFC.
llvm-svn: 272905
2016-06-16 15:44:06 +00:00
Rafael Espindola c1d739f253 Refactor duplicated code. NFC.
llvm-svn: 272904
2016-06-16 15:40:24 +00:00
Rafael Espindola c24f0eeb8d Refactor duplicated code. NFC.
llvm-svn: 272903
2016-06-16 15:31:06 +00:00
Rafael Espindola 3888bdb022 Refactor duplicated code. NFC.
llvm-svn: 272901
2016-06-16 15:22:01 +00:00
Vasileios Kalintiris 22ec97fb24 [mips] Fix small typo. NFC.
llvm-svn: 272895
2016-06-16 14:25:13 +00:00
Daniel Sanders de7816b0cd [mips][mips16] Fix machine verifier errors about incorrect register classes on load/stores.
Summary:
[ls][bh] and [ls][bh]u cannot use sp-relative addresses and must therefore
lower frameindex nodes such that there is a copy to a CPU16Regs register. This
is now done consistently using a separate addressing mode that does not
permit frameindex nodes.

As part of this I've had to remove an optimization that reduced the number of
instructions needed to work around the lack of sp-relative addresses on [ls][bh]
and [ls][bh]u. This optimization used one of the eight CPU16Regs registers as
a copy of the stack pointer and it's implementation was the root cause of many
of the register vs register class mismatches.

lw/sw can use sp-relative addresses but we ought to ensure that we use the
correct version of lw/sw internally for things like IAS. This is not currently
the case and this change does not fix this. However, this change does clean it
up sufficiently well to fix the machine verifier failures.

Also removed irrelevant functions from stchar.ll.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21062

llvm-svn: 272882
2016-06-16 10:20:59 +00:00
Daniel Sanders 1d14864bb3 [llvm-objdump] Support detection of feature bits from the object and implement this for Mips.
Summary:
The Mips implementation only covers the feature bits described by the ELF
e_flags so far. Mips stores additional feature bits such as MSA in the
.MIPS.abiflags section.

Also fixed a small bug this revealed where microMIPS wouldn't add the
EF_MIPS_MICROMIPS flag when using -filetype=obj.

Reviewers: echristo, rafael

Subscribers: rafael, mehdi_amini, dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21125

llvm-svn: 272880
2016-06-16 09:17:03 +00:00
Hrvoje Varga f1e0a03d08 [mips][micromips] Implement DCLO, DCLZ, DROTR, DROTR32 and DROTRV instructions
Differential Revision: http://reviews.llvm.org/D16917

llvm-svn: 272876
2016-06-16 07:06:25 +00:00
Craig Topper 97b1fc92e8 [X86] Pre-size some SmallVectors using the constructor in the shuffle lowering code instead of using push_back. Some of these already did this but used resize or assign instead of the constructor. NFC
llvm-svn: 272872
2016-06-16 03:58:45 +00:00
Craig Topper 66f1a8b608 [X86] Remove else after return. NFC
llvm-svn: 272871
2016-06-16 03:58:42 +00:00
Craig Topper ceda65bdc4 [X86] Inline a couple lambdas into their callers since they are only used once and it all fits on a single line. NFC
llvm-svn: 272869
2016-06-16 03:11:00 +00:00
Tim Northover daa1c018b0 AArch64: allow MOV (imm) alias to be printed
The backend has been around for years, it's pretty ridiculous that we can't
even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen
can't handle the complex predicates when printing so it's a bunch of nasty C++.
Oh well.

llvm-svn: 272865
2016-06-16 01:42:25 +00:00
Eric Christopher 87590fae55 Tidy the asm parser: 80-col, whitespace.
llvm-svn: 272861
2016-06-16 01:00:53 +00:00
Krzysztof Parzyszek f2a4f8f10a [Hexagon] Fix/simplify some conditional statements
Fix for PR28138.

llvm-svn: 272836
2016-06-15 21:05:04 +00:00
Kevin B. Smith 4f81990049 [X86]: Fix for uninitialized access introduced in r272797.
llvm-svn: 272835
2016-06-15 20:52:19 +00:00
Tim Northover 389a1e39ea AArch64: stop trying to use 32-bit MOVZs when expanding patchpoints.
Of course the assembly was right but because the opcode was MOVZWi it was
encoded as "movz w16, #65535, lsl #32" which is an unallocated encoding and
would go horribly wrong on a CPU.

No idea how this bug survived this long. It seems nobody is using that aspect
of patchpoints.

llvm-svn: 272831
2016-06-15 20:33:36 +00:00
Sanjay Patel 1a4569df54 [x86] add folds for x86 vector compare nodes (PR27924)
Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened 
with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend
too while we're working on that and as a backstop.

This fixes:
https://llvm.org/bugs/show_bug.cgi?id=27924

Differential Revision: http://reviews.llvm.org/D21356

llvm-svn: 272828
2016-06-15 20:26:58 +00:00
Kevin B. Smith acbda9ef30 [X86]: Updated r272801 to promote 16 bit compares with immediate operand
to 32 bits. This is in response to a comment by Eli Friedman.

llvm-svn: 272814
2016-06-15 18:18:05 +00:00
Pankaj Gode a67fea464c Test commit after access grant. Modified comment by adding a period.
llvm-svn: 272808
2016-06-15 17:24:52 +00:00
Sanjay Patel 30e0456562 [x86] fix function name; NFC
llvm-svn: 272805
2016-06-15 17:12:29 +00:00
Kevin B. Smith 54566a0e9a [X86]: Quit promoting 8 and 16 bit compares to 32 bit.
Differential Revision: http://reviews.llvm.org/D21144

llvm-svn: 272801
2016-06-15 16:37:46 +00:00
Nirav Dave 194cb55f37 Revert "Preserve DebugInfo when replacing values in DAGCombiner"
Reverting due to assertion failure in
lib/CodeGen/SelectionDAG/InstrEmitter.cpp

This reverts commit r272792.

llvm-svn: 272799
2016-06-15 16:08:50 +00:00
Kevin B. Smith c3c82cdbd0 [X86]: Improve Liveness checking for X86FixupBWInsts.cpp
Differential Revision: http://reviews.llvm.org/D21085

llvm-svn: 272797
2016-06-15 16:03:06 +00:00
Vasileios Kalintiris 7b4ab98b03 [mips] Eliminate unused code for addrRegReg complex pattern. NFC.
Reviewers: dsanders, sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21381

llvm-svn: 272794
2016-06-15 15:30:07 +00:00
Nirav Dave a72e308403 Preserve DebugInfo when replacing values in DAGCombiner
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 272792
2016-06-15 14:50:08 +00:00
Ranjeet Singh 0db7be886e Reverting r272778 because there's an assertion
failure when running the test CodeGen/ARM/intrinsics-coprocessor.ll

llvm-svn: 272791
2016-06-15 14:23:29 +00:00
Valery Pykhtin 02e2086e41 [AMDGPU] Fix few coding style issues. NFC.
llvm-svn: 272785
2016-06-15 13:55:09 +00:00
Ranjeet Singh 351364fe76 [ARM] Add support for mrrc/mrrc2 intrinsics.
Differential Revision: http://reviews.llvm.org/D21178

llvm-svn: 272778
2016-06-15 11:32:24 +00:00
Daniel Sanders f2895d344d [mips] Replace AdditionalRequires<[IsGP64bit]> with GPR_64. NFC.
Summary: Also fixed one case where HasMips64 was being used instead of IsGP64bit.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: http://reviews.llvm.org/D21028

llvm-svn: 272771
2016-06-15 10:36:16 +00:00
Daniel Sanders 8015c708ea [mips] clang-format Mips16ISelDAGToDAG.{cpp,h}
llvm-svn: 272768
2016-06-15 09:44:22 +00:00
Daniel Sanders d3bb20821d [mips][msa] Fix register/register-class mismatches in emitINSERT_DF_VIDX().
Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21068

llvm-svn: 272765
2016-06-15 08:43:23 +00:00
Zlatko Buljan d2ed9c6c2c [mips][microMIPS] Add CodeGen support for AND*, OR16, OR*, XOR*, NOT16 and NOR instructions
Differential Revision: http://reviews.llvm.org/D16719

llvm-svn: 272764
2016-06-15 07:46:24 +00:00
Igor Breger 64cfd3a442 [AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior.
Use BLENDM instead of masked move instruction.

Differential Revision: http://reviews.llvm.org/D21001

llvm-svn: 272763
2016-06-15 07:30:38 +00:00
Sanjoy Das 4f7a86c74d Push a dependent computation into the assert that uses it; NFC
... instead of explicitly conditioning on NDEBUG.  Also use an easier to
read conditional expression.

(Addresses post-commit review from David Blaikie.)

llvm-svn: 272762
2016-06-15 07:27:04 +00:00
Nicolai Haehnle a609259832 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

llvm-svn: 272761
2016-06-15 07:13:05 +00:00
Sanjoy Das 3f59c0c3ab Fix unused variable warning; NFC
TailCallReturnAddrDelta is used only in an assert, so put it under
defined(NDEBUG).

llvm-svn: 272760
2016-06-15 06:53:59 +00:00
Sanjoy Das 0272be206a Don't force SP-relative addressing for statepoints
Summary:
...  when the offset is not statically known.

Prioritize addresses relative to the stack pointer in the stackmap, but
fallback gracefully to other modes of addressing if the offset to the
stack pointer is not a known constant.

Patch by Oscar Blumberg!

Reviewers: sanjoy

Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm

Differential Revision: http://reviews.llvm.org/D21259

llvm-svn: 272756
2016-06-15 05:35:14 +00:00
Tom Stellard 82785e9fe7 AMDGPU/SI: Correctly encode constant expressions
Summary:
We we have an MCConstantExpr, we can encode it directly into the instruction
instead of emitting fixups.

Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21236

Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948
llvm-svn: 272750
2016-06-15 03:09:39 +00:00
Tom Stellard 89049702ce AMDGPU/AsmParser: Add support for parsing symbol operands
Summary:
We can now reference symbols directly in operands, like this:
s_mov_b32 s0, global

Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21038

llvm-svn: 272748
2016-06-15 02:54:14 +00:00
David Majnemer cbf614a93b Remove the ScalarReplAggregates pass
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM.  LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.

Differential Revision: http://reviews.llvm.org/D21316

llvm-svn: 272737
2016-06-15 00:19:09 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Peter Collingbourne 96efdd6107 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

llvm-svn: 272709
2016-06-14 21:01:22 +00:00
Tom Stellard bf3e6e5bb4 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Re-commit this after fixing a bug where we were trying to use a
reference to a Triple object that had already been destroyed.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272705
2016-06-14 20:29:59 +00:00
Wei Mi b799a625f9 [X86] Reduce the width of multiplification when its operands are extended from i8 or i16
For <N x i32> type mul, pmuludq will be used for targets without SSE41, which
often introduces many extra pack and unpack instructions in vectorized loop
body because pmuludq generates <N/2 x i64> type value. However when the operands
of <N x i32> mul are extended from smaller size values like i8 and i16, the type
of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which
generates better code. For targets with SSE41, pmulld is supported so no
shrinking is needed.

Differential Revision: http://reviews.llvm.org/D20931

llvm-svn: 272694
2016-06-14 18:53:20 +00:00
Tom Stellard b1a523fa68 Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"
This reverts commit r272675.

llvm-svn: 272677
2016-06-14 15:16:35 +00:00
Tom Stellard 5e6298b0f2 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272675
2016-06-14 15:11:01 +00:00
Artem Tamazov 17091364d1 [AMDGPU][llvm-mc] Predefined symbols to access -mcpu from the assembly source (.option.machine_version...)
The feature allows for conditional assembly etc.
TODO: make those symbols read-only.
Test added.

Differential Revision: http://reviews.llvm.org/D21238

llvm-svn: 272673
2016-06-14 15:03:59 +00:00
Simon Dardis 878c0b1b76 [mips] Optimize stack pointer adjustments.
Instead of always using addu to adjust the stack pointer when the
size out is of the range of an addiu instruction, use subu so that
a smaller constant can be generated.

This can give savings of ~3 instructions whenever a function has a
a stack frame whose size is out of range of an addiu instruction.

This change may break some naive stack unwinders.

Partially resolves PR/26291.

Thanks to David Chisnall for reporting the issue.

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D21321

llvm-svn: 272666
2016-06-14 13:39:43 +00:00
James Molloy 65b6be1d3a [Thumb] Fix off-by-one error in r272007
We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256.

Found by Oliver Stannard and csmith!

llvm-svn: 272665
2016-06-14 13:33:07 +00:00
Simon Dardis 4fbf76f7c3 [mips][atomics] Fix atomic instruction descriptions and uses.
PR27458 highlights that the MIPS backend does not have well formed
MIR for atomic operations (among other errors).

This patch adds expands and corrects the LL/SC descriptions and uses
for MIPS(64).

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D19719

llvm-svn: 272655
2016-06-14 11:29:28 +00:00
Daniel Sanders e858136d91 [mips][ias] Implement one N32 case (of two) for .cpsetup.
This patch implements the N32 case where -mno-shared is in effect. The case
where -mshared is in effect will be added later since doing that now requires
additional changes to how we handle %hi(%neg(%gp_rel(foo))) expressions to
emit the three relocations as three relocations (currently only one of the
three would be emitted) which then requires further changes to our MCFixup
handling.

While we could fix both cases together, fixing the -mno-shared case allows us
to fix the ELFCLASS bug (where N32 incorrectly uses ELFCLASS64 instead of
ELFCLASS32) in a way that allows cpsetup.s to check for a correct output instead
of another incorrect output.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: http://reviews.llvm.org/D21131

llvm-svn: 272652
2016-06-14 10:13:47 +00:00
Simon Pilgrim cf1165b86e [X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS
llvm-svn: 272651
2016-06-14 09:43:38 +00:00
Simon Dardis e661e528db [mips] MIPS32/64 itineraries
Itineraries for some pre MIPSR6 and EVA instructions. Some pseudo expanded
instructions are marked as having no scheduling info.

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D20418

llvm-svn: 272648
2016-06-14 09:35:29 +00:00
Daniel Sanders 435a653437 [mips][dsp] Fix use without def on DSPCtrl registers read by rddsp intrinsic.
Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21063

llvm-svn: 272647
2016-06-14 09:29:46 +00:00
Daniel Sanders d2a49ec3ab [mips][msa] copyPhysReg() should not set RegState::Define on result of CTCMSA.
Summary:
The machine verifier reports 'Explicit operand marked as def' when it is
manually specified even though it agrees with the operand info.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21065

llvm-svn: 272646
2016-06-14 09:11:33 +00:00
Craig Topper 34d9707825 [AVX512] Use AND32ri8 instead of AND32ri when anding with 1 to create single bit masks. This results in a smaller encoding.
llvm-svn: 272627
2016-06-14 03:13:03 +00:00
Craig Topper 99e30e6a66 [AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR.
llvm-svn: 272626
2016-06-14 03:13:00 +00:00
Craig Topper ddab395397 [AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX.
llvm-svn: 272625
2016-06-14 03:12:54 +00:00
Kevin Enderby d2d2ce9b9f Update the AArch64ExternalSymbolizer to print literal strings as escaped strings
so it is the same as the MCExternalSymbolizer.

rdar://17349181

llvm-svn: 272588
2016-06-13 21:08:57 +00:00
David Majnemer 248190ba69 [X86] Remove llvm.x86.bit.scan.{forward,reverse}.32
The need for these intrinsics has been obviated by r272564 which
reimplements their functionality using generic IR.

llvm-svn: 272566
2016-06-13 17:33:13 +00:00
Marek Olsak e93f6d6923 AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

llvm-svn: 272556
2016-06-13 16:05:57 +00:00
Matt Arsenault 80bc355048 AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc
The condition reg of the cndmask_b64 expansion can't be killed by
the first one, and the implicit super register implicit def is needed.

llvm-svn: 272554
2016-06-13 15:53:52 +00:00
Ulrich Weigand daae87aa21 [SystemZ] Enable index register memory constraints for inline ASM
This enables use of the 'R' and 'T' memory constraints for inline ASM
operands on SystemZ, which allow an index register as well as an
immediate displacement. This patch includes corresponding documentation
and test case updates.

As with the last patch of this kind, I moved the 'm' constraint to the
most general case, which is now 'T' (base + 20-bit signed displacement +
index register).

Author: colpell
Differential Revision: http://reviews.llvm.org/D21239

llvm-svn: 272547
2016-06-13 14:24:05 +00:00
Ranjeet Singh 933e1aa39f [ARM] Reverting r272544 because clang patch needs
to go in as soon as llvm patch has gone in because
tests will start breaking in Clang.

llvm-svn: 272546
2016-06-13 10:58:24 +00:00
Ranjeet Singh 8feacb330d [ARM] Add mrrc/mrrc2 co-processor intrinsics
MRRC/MRRC2 instruction writes to two registers. The
intrinsic definition returns a single uint64_t to
represent the write, this is a compact way of
representing a write to two 32 bit registers,
the alternative might have been two return a
struct of 2 uint32_t's but this isn't as nice.

Differential Revision: 

llvm-svn: 272544
2016-06-13 10:43:50 +00:00
Haojian Wu 7900ca1e7e Fix an enumeral mismatch warning.
Summary:
The "-Werror=enum-compare" shows that the statement is using two different enums:

enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'

A follow-up fix on D21235.

Reviewers: klimek

Subscribers: spatel, cfe-commits

Differential Revision: http://reviews.llvm.org/D21278

llvm-svn: 272539
2016-06-13 09:03:45 +00:00
Craig Topper 13cf7cac07 [AVX512] Remove maksed pshufd, pshuflw, and phufhw intrinsics and autoupgrade them to selects and shufflevector.
llvm-svn: 272527
2016-06-13 02:36:48 +00:00
Benjamin Kramer 4ca41fd09e Run clang-tidy's performance-unnecessary-copy-initialization over LLVM.
No functionality change intended.

llvm-svn: 272516
2016-06-12 17:30:47 +00:00
Benjamin Kramer d3f4c05aea Move instances of std::function.
Or replace with llvm::function_ref if it's never stored. NFC intended.

llvm-svn: 272513
2016-06-12 16:13:55 +00:00
Benjamin Kramer bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Sanjay Patel 977530a8c9 [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created 
and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM 
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.

Differential Revision: http://reviews.llvm.org/D21235

llvm-svn: 272511
2016-06-12 15:03:25 +00:00
Craig Topper 1067986c5b [X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector.
llvm-svn: 272510
2016-06-12 14:11:32 +00:00
Craig Topper 251030babe [AVX512] Remove the masked palignr intrinsics that I forgot to remove when I added auto-upgrade code to turn them into shufflevectors and selects.
llvm-svn: 272497
2016-06-12 04:14:13 +00:00
Simon Pilgrim 3fc09f7be6 [CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets
To account for the fast PSHUFB implementation now available

llvm-svn: 272484
2016-06-11 19:23:02 +00:00
Simon Pilgrim 5b9bade8dd [X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE
PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.

llvm-svn: 272477
2016-06-11 15:44:13 +00:00
Simon Pilgrim b13961d25b Strip trailing whitespace. NFCI.
llvm-svn: 272476
2016-06-11 14:34:10 +00:00
Craig Topper 504fba5c8a [AVX512] Lower v8i64 and v16i32 to pshufd when possible.
llvm-svn: 272473
2016-06-11 13:43:21 +00:00
Simon Pilgrim 6800a45790 [X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type
Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining

llvm-svn: 272471
2016-06-11 13:38:28 +00:00
Simon Pilgrim 255fdd0666 [X86][SSE] Use vXi8 return type for PSLLDQ/PSRLDQ instructions
These are byte shift instructions and it will make shuffle combining a lot more straightforward if we can assume a vXi8 vector of bytes so decoded shuffle masks match the return type's number of elements

llvm-svn: 272468
2016-06-11 12:54:37 +00:00
Simon Pilgrim d386941676 [X86][AVX512] Tidied up VSHUFF32x4/VSHUFF64x2/VSHUFI32x4/VSHUFI64x2 comment generation
Now matches other shuffles

llvm-svn: 272464
2016-06-11 11:18:38 +00:00
Chandler Carruth 4c0e94dce6 Try a bit harder to remove the signed and unsigned comparison warning.
Hopefully this time it actually works and stays away.

llvm-svn: 272463
2016-06-11 09:13:00 +00:00
Chandler Carruth 306e270b83 Compare to an unsigned literal to avoid a -Wsign-compare warning.
llvm-svn: 272459
2016-06-11 08:02:01 +00:00
Craig Topper 40abd1cc61 [AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW.
llvm-svn: 272450
2016-06-11 03:27:42 +00:00
Craig Topper b9b86fcfff [AVX512] No need to check for BWI being enabled before lowering v32i16 and v64i8 shuffles. If we get this far the types are already legal which means BWI must be enabled.
llvm-svn: 272449
2016-06-11 03:27:37 +00:00
Sanjoy Das 39c226fdba [STLExtras] Introduce and use llvm::count_if; NFC
(This is split out from was D21115)

llvm-svn: 272435
2016-06-10 21:18:39 +00:00
Chad Rosier d1f6c840ee [AArch64] Move comments closer to relevant check. NFC.
llvm-svn: 272430
2016-06-10 20:49:18 +00:00
Chad Rosier c5083c2ccf [AArch64] Refactor a check earlier. NFC.
llvm-svn: 272429
2016-06-10 20:47:14 +00:00