This sort of doesn't matter since the setcc type is i1, but
this previously was using the default UndefinedBooleanContent. This
makes it more consistent with R600. This enables more optimizations
which typically give up on UndefinedBooleanContent. For example,
there is already a special case target DAG combine for
setcc + sext which can be eliminated in favor of what the generic
DAG combiner can do if it assumes boolean values are sign extended.
Since -1 is an inline immediate, using it is basically free and the
backend already uses it when a boolean value is needed in a wider type.
llvm-svn: 222850
This fixes moving boolean constants into registers before operating
on them. They get permuted and shrunk down to e32 anyway later. This
is a temporary fix until the patch that removes these pseudos is
committed.
llvm-svn: 222844
Only the super register flat_scr was marked as reserved,
so in some cases with high register usage it would still
try to allocate the subregisters.
llvm-svn: 222737
This s_mov_b32 will write to a virtual register from the M0Reg
class and all the ds instructions now take an extra M0Reg explicit
argument.
This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.
llvm-svn: 222583
A register operand that has a common sub-class with its instruction's
defined register class is not always legal. For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.
This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.
llvm-svn: 222368
This partially makes up for not having address spaces
used for alias analysis in some simple cases.
This is not yet enabled by default so shouldn't change anything yet.
llvm-svn: 222286
Assuming unmodeled side effects interferes with some scheduling
opportunities.
Don't put it in the base class of DS instructions since there
are a few weird effecting, non load/store instructions there.
llvm-svn: 222285
This should expose more of the actually used VALU
instructions to the machine optimization passes.
This also should help getting i1 handling into a better state.
For not entirly understood reasons, this fixes the split-scalar-i64-add.ll
test where a 64-bit add would only partially be moved to the VALU
resulting in use of undefined VCC.
llvm-svn: 222256
This was resulting in use of a register after a kill.
For some reason this showed up as a problem in many tests
when moving the SIFixSGPRCopies pass closer to instruction
selection.
llvm-svn: 222175
This gets the correct NaN behavior based on the compare type
the hardware uses. This now passes the new piglit test I have
for this on SI.
Add stricter tests for the operand order.
llvm-svn: 222079
This is so it could potentially be used by SI. However, the current
implementation does not always produce correct results, so the
IntegerDivisionPass is being used instead.
llvm-svn: 222072
If we have spilled the value of the m0 register, then we need to restore
it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't
write to m0.
v_readlane_b32 can't write to m0, so
llvm-svn: 222036
These were directly using the old base instruction
class, and specifying the wrong register classes
for operands. The operands can be the other special
inputs besides SGPRs. The op name was also being
directly used for the asm string, so this was printed
without any operands.
llvm-svn: 221921
If a function is just an unreachable, this would hit a
"this is not a MachO target" assertion because of setting
HasSubsectionViaSymbols.
llvm-svn: 221920
e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c
This simplifies matching v_madmk_f32.
This looks somewhat surprising, but it appears to be
OK to do this. We can commute src0 and src1 in all
of these instructions, and that's all that appears
to matter.
llvm-svn: 221910
Instead, we're going to separate metadata from the Value hierarchy. See
PR21532.
This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.
llvm-svn: 221711
This matches the format produced by the AMD proprietary driver.
//==================================================================//
// Shell script for converting .ll test cases: (Pass the .ll files
you want to convert to this script as arguments).
//==================================================================//
; This was necessary on my system so that A-Z in sed would match only
; upper case. I'm not sure why.
export LC_ALL='C'
TEST_FILES="$*"
MATCHES=`grep -v Patterns SIInstructions.td | grep -o '"[A-Z0-9_]\+["e]' | grep -o '[A-Z0-9_]\+' | sort -r`
for f in $TEST_FILES; do
# Check that there are SI tests:
grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f
if [ $? -eq 0 ]; then
for match in $MATCHES; do
sed -i -e "s/\([ :]$match\)/\L\1/" $f
done
# Try to get check lines with partial instruction names
sed -i 's/\(;[ ]*SI[A-Z\\-]*: \)\([A-Z_0-9]\+\)/\1\L\2/' $f
fi
done
sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll
sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.*32.ll ../../../test/CodeGen/R600/sext-in-reg.ll
sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll
sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll
sed -i 's/\(; CHECK[-NOT]*: \)\([A-Z_0-9]\+\)/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll
//==================================================================//
// Shell script for converting .td files (run this last)
//==================================================================//
export LC_ALL='C'
sed -i -e '/Patterns/!s/\("[A-Z0-9_]\+[ "e]\)/\L\1/g' SIInstructions.td
sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td
llvm-svn: 221350
The problem is mostly that variadic output instruction
aren't handled, so it is rejected for having an inconsistent
number of operands, and then the right number of operands
isn't emitted.
llvm-svn: 221117
It appears to ignore or find ambiguous MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.
As a workaround, add an explicit way to get the MachineInstr.
llvm-svn: 221017
We need to figure out how to track ptrtoint values all the
way until result is converted back to a pointer in order
to correctly rewrite the pointer type.
llvm-svn: 220997
Every target we support has support for assembly that looks like
a = b - c
.long a
What is special about MachO is that the above combination suppresses the
production of a relocation.
With this change we avoid producing the intermediary labels when they don't
add any value.
llvm-svn: 220256
The generic code trying to use findCommutedOpIndices won't
understand that it needs to swap the modifier operands also,
so it should fail if they are set.
llvm-svn: 220064
The SelectDS1Addr1Offset complex pattern always tries to store constant
lds pointers in the offset operand and store a zero value in the addr operand.
Since the addr operand does not accept immediates, the zero value
needs to first be copied to a register.
This newly created zero value will not go through normal instruction
selection, so we need to manually insert a V_MOV_B32_e32 in the complex
pattern.
This bug was hidden by the fact that if there was another zero value
in the DAG that had not been selected yet, then the CSE done by the DAG
would use the unselected node for the addr operand rather than the one
that was just created. This would lead to the zero value being selected
and the DAG automatically inserting a V_MOV_B32_e32 instruction.
llvm-svn: 219848
This effectively reverts revert 219707. After fixing the test to work with
new function name format and renamed intrinsic.
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219710
v2: Add SI lowering
Add test
v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219705
Currently this only functions to match simple cases
where ds_read2_* / ds_write2_* instructions can be used.
In the future it might match some of the other weird
load patterns, such as direct to LDS loads.
Currently enabled only with a subtarget feature to enable
easier testing.
llvm-svn: 219533
LLVM assumes INSERT_SUBREG will always have register operands, so
we need to legalize non-register operands, like FrameIndexes, to
avoid random assertion failures.
llvm-svn: 219420
The main reason for this is that the MCAsmInfo class,
which we were previously using as the base class, sets
PrivateGlobalPrefix to "L", which causes all global
functions that start with L to be treated as local symbols.
MCAsmInfoELF sets PrivateGlobalPrefix to ".L", which is what
we want, and it is probably a good idea to use this as the
base class anyway, since we are emitting ELF binaries.
llvm-svn: 219237
Added a FIXME coment instead, we need to handle the case where the
two DS instructions being compared have different numbers of operands.
llvm-svn: 219236
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.
llvm-svn: 218692
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.
This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.
llvm-svn: 218532
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands
llvm-svn: 218531
The base implementation of commuteInstruction is used
in some cases, but it turns out this has been broken for a
long time since modifiers were inserted between the real operands.
The base implementation of commuteInstruction also fails on immediates,
which also needs to be fixed.
llvm-svn: 218530
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.
llvm-svn: 218529
This needs a test, but I'm not sure if it is currently possible and
I originally hit it due to a bug. Right now the only global address
operands have no reason to be VALU instructions, although it
theoretically could be a problem.
llvm-svn: 218528
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.
When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.
llvm-svn: 218527
No tests hit this, and I don't see any way a GlobalAddress
node would survive beyond lowering on SI. It it would, the
move should probably be inserted by selection.
llvm-svn: 218526
The previous implementation was extending the live range of SGPRs
by modifying the live intervals directly. This was causing a lot
of machine verification errors when the machine scheduler was enabled.
The new implementation adds pseudo instructions with implicit uses to
extend the live ranges of SGPRs, which works much better.
llvm-svn: 218351
Correctly handle special registers: EXEC, EXEC_LO, EXEC_HI, VCC_LO,
VCC_HI, and M0. The previous implementation would assertion fail
when passed these registers.
llvm-svn: 218349
VGPRs are spilled to LDS. This still needs more testing, but
we need to at least enable it at -O0, because the fast register
allocator spills all registers that are live at the end of blocks
and without this some future commits will break the
flat-address-space.ll test.
v2: Only calculate thread id once
v3: Move insertion of spill instructions to
SIRegisterInfo::eliminateFrameIndex()
llvm-svn: 218348
There are new register classes VCSrc_* which represent operands that
can take an SGPR, VGPR or inline constant. The VSrc_* class is now used
to represent operands that can take an SGPR, VGPR, or a 32-bit
immediate.
This allows us to have more accurate checks for legality of
immediates, since before we had no way to distinguish between operands
that supported any 32-bit immediate and operands which could only
support inline constants.
llvm-svn: 218334
This reverts commit r218254.
The global_atomics.ll test fails with asserts disabled. For some reason,
the compiler fails to produce the atomic no return variants.
llvm-svn: 218257
BypassSlowDiv is used by codegen prepare to insert a run-time
check to see if the operands to a 64-bit division are really 32-bit
values and if they are it will do 32-bit division instead.
This is not useful for R600, which has predicated control flow since
both the 32-bit and 64-bit paths will be executed in most cases. It
also increases code size which can lead to more instruction cache
misses.
llvm-svn: 218252
ISD::MUL and ISD:UMULO are the same except that UMULO sets an overflow
bit. Since we aren't using the overflow bit, we should use ISD::MUL.
llvm-svn: 218251
In r217636, the value stored in KernelInfo.Num[VS]GPRSs was changed from
the highest GPR index used to the number of gprs in order to be
consistent with the name of the variable.
The code writing the config values still assumed that the value in this
variable was the highest GPR index used, which caused the compiler to
over report the number of GPRs being used.
https://bugs.freedesktop.org/show_bug.cgi?id=84089
llvm-svn: 218150
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.
No functional change.
llvm-svn: 218004
Since read2 / write2 are emitted for 4-byte aligned 8-byte
accesses, these are seen by the scheduler.
The DAG scheduler is semi-deprecated, so just
ignore these for now.
llvm-svn: 217969
Only 1 decimal place should be printed for inline immediates.
Other constants should be hex constants.
Does not include f64 tests because folding those inline
immediates currently does not work.
llvm-svn: 217964
Instructions are now generally selected to the e64 forms originally,
and shrunk down later. Rename foldOperands to legalizeOperands,
since that's really most of what it tries to do.
llvm-svn: 217959
Add some more tests to make sure better operand
choices are still made. Leave some cases that seem
to have no reason to ever be e64 alone.
llvm-svn: 217789
Refactored the R600_LDS_1A2D class a bit to get it to actually work.
It seemed to be previously unused and broken.
We also have to disable the conversion to the noret variant for now in
R600ISelLowering because the getLDSNoRetOp method only handles 1A1D LDS ops.
Someone can feel free to modify the AMDGPU::getLDSNoRetOp method to
work for more than 1A1D variants of LDS operations. It's being left as a
future TODO for now.
Signed-off-by: Aaron Watry <awatry at gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217596
This was only present for SI before.
Cayman may still be missing, but I am unable to test that currently.
v2: Don't create atomicrmw max tests in separate file
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217589
Need to convert the 64 element offset into bytes, not just the element
size like the normal case instructions.
Noticed by inspection. This can't be hit now because
st64 instructions aren't emitted during instruction selection,
and the post-RA scheduler isn't enabled.
llvm-svn: 217560
"Unroll" is not the appropriate name for this variable. Clang already uses
the term "interleave" in pragmas and metadata for this.
Differential Revision: http://reviews.llvm.org/D5066
llvm-svn: 217528
Assert in scheduler from an inserted copy_to_regclass from
a constant.
This only seems to break sometimes when a constant initializer
address is forced into VGPRs in a non-entry block. No test
since the only case I've managed to hit only happens with a future
patch, and that case will also not be a problem once scalar instructions
are used in non-entry blocks.
llvm-svn: 217380
We must constrain the destination register class of legalized operands
to a VGPR class or else the illegal operand may be folded back into
the instruction by the register coalescer.
This fixes a bug in add.ll that will be uncovered by future commits.
llvm-svn: 217249
This fixes a crash in the OpenCV test:
ImgprocWarpResizeArea/Resize.Mat/16
There is no test case for this, because this failure depends on a
specific ordering of the loads, which could easily change.
llvm-svn: 217040
These pointers are really just offsets and they will always be
less than 16-bits. Using AssertZExt allows us to use computeKnownBits
to prove that these values are positive. We will use this information
in a later commit.
llvm-svn: 216277
isPow2DivCheap
That name doesn't specify signed or unsigned.
Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:
srl/add/sra
is not the general sequence for signed integer division by power-of-2. We need one more 'sra':
sra/srl/add/sra
That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.
This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D5010
llvm-svn: 216237
This will simplify the SGPR spilling and also allow us to use
MachineFrameInfo for calculating offsets, which should be more
reliable than our custom code.
This fixes a crash in some cases where a register would be spilled
in a branch such that the VGPR defined for spilling did not dominate
all the uses when restoring.
This fixes a crash in an ocl conformance test. The test requries
register spilling and is too big to include.
llvm-svn: 216217
Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
is only done if the add has one use. If the resulting constant
add can be folded into an addressing mode, force this to happen
for the pointer operand.
This ends up happening a lot because of how LDS objects are allocated.
Since the globals are allocated next to each other, acessing the first
element of the second object is directly indexed by a shifted pointer.
llvm-svn: 215739
The default assumes that a 16-bit signed offset is used.
LDS instruction use a 16-bit unsigned offset, so it wasn't
being used in some cases where it was assumed a negative offset
could be used.
More should be done here, but first isLegalAddressingMode needs
to gain an addressing mode argument. For now, copy most of the rest
of the default implementation with the immediate offset change.
llvm-svn: 215732
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.
llvm-svn: 215564
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
v2: drop enum keyword
use correct extension mode
don't bother computing the sign in unsinged case
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215462
v2: add tests
rename LowerSDIV24 to LowerSDIVREM24
handle the rem part in this function
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215460
This bit was left uninitialized, which was causing some random failures
of piglit tests.
NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 215396
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
llvm-svn: 214988
This partially fixes weird looking load scheduling
in memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.
llvm-svn: 214943
This currently has a noticable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.
llvm-svn: 214942
This allows accessing an SReg subregister with a normal subregister
index, instead of getting a machine verifier error.
Also be sure to include all of these subregisters in SReg_32.
This fixes inferring SGPR instead of SReg when finding a
super register class.
llvm-svn: 214901
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
This slipped in in r214467, so something like
V_MOV_B32_e32 v0, ... is now printed with 2 spaces
between the instruction name and first operand.
llvm-svn: 214660
fromulation of the node, which isn't really the desired behavior from
within the combiner or legalizer, but is necessary within ISel. I've
added a hopefully helpful comment and fixed the only two places where
this took place.
Yet another step toward the combiner and legalizer not needing to use
update listeners with virtual calls to manage the worklists behind
legalization and combining.
llvm-svn: 214574
Abs/neg folding has moved out of foldOperands and into the instruction
selection phase using complex patterns. As a consequence of this
change, we now prefer to select the 64-bit encoding for most
instructions and the modifier operands have been dropped from
integer VOP3 instructions.
llvm-svn: 214467
We were incorrectly assuming that all VOP2 instructions can read SGPRs
in Src0, but this is not true for instructions that read carry-in from
VCC.
The old logic has been replaced with new logic which checks the defined
register classes of the VOP2 instruction to determine whether or not to
legalize the operands.
llvm-svn: 214465
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
llvm-svn: 214449
neverHasSideEffects is deprecated, and hasSideEffects = 0 is already
set on the base classes of the basic ALU instruction classes. The
base classes also already set mayLoad = 0 and mayStore = 0
llvm-svn: 214283
We can treat ds_read2_* as a single offset if the offsets are adjacent.
No test since emission of read2 instructions for partially
aligned loads isn't implemented yet.
llvm-svn: 214269
Rename to allowsMisalignedMemoryAccess.
On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.
llvm-svn: 214055
SDValues, fixing the two bugs left in the regression suite.
The key for both of these was the use a single value type rather than
a VTList which caused an unintentionally single-result merge-value node.
Fix this by getting the appropriate VTList in place.
Doing this exposed that the comments in x86's code abouth how MUL_LOHI
operands are handle is wrong. The bug with the use of out-of-range
result numbers was hiding the bug about the order of operands here (as
best i can tell). There are more places where the code appears to get
this backwards still...
llvm-svn: 213931
Use ComputeNumSignBits instead of checking for i8 / i16 which only
worked when AMDIL was lying about having legal i8 / i16.
If an integer is known to fit in 24-bits, we can
do division faster with float ops.
llvm-svn: 213843
GCC believes it may be possible to not return a value from the switch:
lib/Target/R600/SIRegisterInfo.cpp:187:1: warning: control reaches end of non-void function [-Wreturn-type]
Add an unreachable label to indicate that this is not possible and still permit
switch coverage checking.
llvm-svn: 213572
There are a few more cleanups to do, but I ran into some problems
with ext loads and trunc stores, when I tried to change some of the
vector loads and stores from custom to legal, so I wasn't able to
get rid of everything.
llvm-svn: 213552
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.
This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.
llvm-svn: 213530
These instructions can only take a limited input range, and return
the constant value 1 out of range. We should do range reduction to
be able to process arbitrary values. Use a FRACT instruction after
normalization to achieve this. Also add a test for constant folding
with the lowered code with unsafe-fp-math enabled.
v2: use DAG lowering instead of intrinsic, adapt test
v3: calculate constant, fold pattern into instruction definition
v4: misc style fixes, add sin-fold testcase, cosmetics
Patch by Grigori Goronzy
llvm-svn: 213458
Unfortunately, we don't seem to have a direct truncation, but the
extension can be legally split into two operations so we should
support that.
llvm-svn: 213357
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
llvm-svn: 213248
v2: use ffbh/l if available
v3: Rebase on top of Matt's SI patches
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 213072
This helps avoid redundant instructions to unpack, and repack
the vectors. Ideally we could recognize that pattern and eliminate
it. Currently v4i8 and other small element type vectors are scalarized,
so this has the added bonus of avoiding that.
llvm-svn: 213031