AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.
This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.
Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.
Differential Revision: http://reviews.llvm.org/D21660
llvm-svn: 273848
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705
Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166
Differential Revision: http://reviews.llvm.org/D21633
llvm-svn: 273785
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.
Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
- offset 0: work group ID x
- offset 4: work group ID y
- offset 8: work group ID z
- offset 16: work item ID x
- offset 20: work item ID y
- offset 24: work item ID z
Set
- amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
- amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
- amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
Differential Revision: http://reviews.llvm.org/D20335
llvm-svn: 273769
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.
Reviewers: arsenm, nhaustov
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21647
llvm-svn: 273765
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.
Differential Revision: http://reviews.llvm.org/D21540
llvm-svn: 273751
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.
llvm-svn: 273653
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.
llvm-svn: 273652
Summary:
We will start generating this in a future patch.
Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21482
llvm-svn: 273628
Memory references were not being propagated for this folded load. This
prevented optimizations like LICM from hoisting the load.
Added test to verify that this allows LICM to proceed.
llvm-svn: 273617
X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset
into an Erlang Runtime System-internal data structure (the PCB). As the
layout of this data structure is prone to change, this poses problems
for maintaining compatibility.
To address this problem, the compiler can produce this information as
module-level named metadata. For example (where P_NSP_LIMIT is the
offending offset):
!hipe.literals = !{ !2, !3, !4 }
!2 = !{ !"P_NSP_LIMIT", i32 152 }
!3 = !{ !"X86_LEAF_WORDS", i32 24 }
!4 = !{ !"AMD64_LEAF_WORDS", i32 24 }
Patch by Magnus Lang
Differential Revision: http://reviews.llvm.org/D20363
llvm-svn: 273593
Recommiting after correcting over-eager Debug Value transfer fixing PR28270.
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.
Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.
This refixes PR9817 which was being incompletely checked in the
testsuite.
Reviewers: jyknight
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D21037
llvm-svn: 273585
Summary:
SSAT saturates an integer, making sure that its value lies within
an interval [-k, k]. Since the constant is given to SSAT as the
number of bytes set to one, k + 1 must be a power of 2, otherwise
the optimization is not possible. Also, the select_cc must use <
and > respectively so that they define an interval.
Reviewers: mcrosier, jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D21372
llvm-svn: 273581
Summary:
The backend has no reason to behave like a driver and should generally do
as it's told (and error out if it can't) instead of trying to figure out
what the API user meant. The default ABI is still derived from the arch
component as a concession to backwards compatibility.
API-users that previously passed an explicit CPU and a triple that was
inconsistent with the CPU (e.g. mips-linux-gnu and mips64r2) may get a
different ABI to what they got before. However, it's expected that there
are no such users on the basis that CodeGen has been asserting that the
triple is consistent with the selected ABI for several releases. API-users
that were consistent or passed '' or 'generic' as the CPU will see no
difference.
Reviewers: sdardis, rafael
Subscribers: rafael, dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21466
llvm-svn: 273557
Move most of the initializations in ARMSubtarget::initializeEnvironment to
member initializers.
Change suggested by Matthias Braun (see http://reviews.llvm.org/D21432).
llvm-svn: 273556
Summary:
When parseAnyRegister() encounters a symbol alias, it parses integers and adds
a corresponding expression to the operand list. This is clearly wrong since the
only operands that parseAnyRegister() should be accepting are registers.
It's not clear why this code was added and there are no test cases that cover
it. I think it might be leftover from when searchSymbolAlias() was more widely
used.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21377
llvm-svn: 273555
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.
We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.
Fixes PR27761.
Differential Revision: http://reviews.llvm.org/D20852
llvm-svn: 273550
dext and dins, along with their 'm' and 'u' variants are defined in mips64r2,
not mips64.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D21608
llvm-svn: 273549
This is a cleanup commit similar to r271555, but for ARM.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
Since the ARM backend seems to have quite a lot of calls to these methods, I
intend to submit 5-6 subtarget features at a time, instead of one big lump.
Differential Revision: http://reviews.llvm.org/D21432
llvm-svn: 273544