Addresses are floats when a sampler is present and unsigned integers
when no sampler is present.
Therefore, only zext instructions, not sext instructions should match.
Also match integer constants that can be truncated.
Differential Revision: https://reviews.llvm.org/D118043
If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.
Differential Revision: https://reviews.llvm.org/D116042
As the codegen fix in D111754, the LOD bias needs to be converted to 16
bits. Fix this in the combine.
Differential Revision: https://reviews.llvm.org/D116038
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.
I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.
We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause.
Differential Revision: https://reviews.llvm.org/D115387
Mostly this fixes cases where !noalias or !alias.scope were passed
a scope rather than a scope list. In some cases I opted to drop
the metadata entirely instead, because it is not really relevant
to the test.
After 077bff39d4,
isDereferenceableForAllocaSize() can recurse into selects,
which is causing a problem for the new test case,
reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html
because the replacement (the select) is defined after the first use
of an alloca, so we'd end up with a verifier error.
Now, this new check is too restrictive.
We likely can handle *some* cases, by trying to sink all uses of an alloca
to after the the def.
I think we can use here same logic as for nonnull.
strlen(X) - X must be noundef => valid pointer.
for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)
Reviewed By: jdoerfert, aqjune
Differential Revision: https://reviews.llvm.org/D95122
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This commit copies existing tests at llvm/Transforms containing
'shufflevector X, undef' and replaces them with 'shufflevector X, poison'.
The new copied tests have *-inseltpoison.ll suffix at its file name
(as db7a2f347f did)
See https://reviews.llvm.org/D93793
Test files listed using
grep -R -E "^[^;]*shufflevector <.*> .*, <.*> undef" | cut -d":" -f1 | uniq
Test files copied & updated using
file_org=llvm/test/Transforms/$1
if [[ "$file_org" = *-inseltpoison.ll ]]; then
file=$file_org
else
file=${file_org%.ll}-inseltpoison.ll
if [ ! -f $file ]; then
cp $file_org $file
fi
fi
sed -i -E 's/^([^;]*)shufflevector <(.*)> (.*), <(.*)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
echo "$file : should be manually updated"
# The test is manually updated
exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
This commit copies existing tests at llvm/Transforms and replaces
'insertelement undef' in those files with 'insertelement poison'.
(see https://reviews.llvm.org/D93586)
Tests listed using this script:
grep -R -E '^[^;]*insertelement <.*> undef,' . | cut -d":" -f1 | uniq |
wc -l
Tests updated:
file_org=llvm/test/Transforms/$1
file=${file_org%.ll}-inseltpoison.ll
cp $file_org $file
sed -i -E 's/^([^;]*)insertelement <(.*)> undef/\1insertelement <\2> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
echo "$file : should be manually updated"
# I manually updated the script
exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.
Differential Revision: https://reviews.llvm.org/D90028
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.
This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
The langref already states it does, but this wasn't implemented. Also
covers inalloca and preallocated. Also helps fix a dependence on
pointer element types.
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.
This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.
Differential Revision: https://reviews.llvm.org/D84420
When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.
An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.
Differential Revision: https://reviews.llvm.org/D85887
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows to move about 3000 lines out from InstCombine to the targets.
Differential Revision: https://reviews.llvm.org/D81728
v3i16 and v3f16 currently cannot be legalized and lowered so they should
not be emitted by inst combining.
Moved the check down to still allow extracting 1 or 2 elements via the dmask.
Fixes image intrinsics being combined to return v3x16.
Differential Revision: https://reviews.llvm.org/D84223
This should probably be implied for all the speculatable ones. I think
the only ones where this plausibly doesn't apply is s_sendmsghalt and
maybe kill.
We have to assume undef could be an snan, which would need quieting so
returning qnan is safer than undef. Also consider strictfp, and don't
care if the result rounded.
This really belongs in InstructionSimplify since it doesn't introduce
new instructions. Put it in instcombine to avoid increasing the number
of passes considering target intrinsics.
I also noticed that we seem to now be interpreting strictfp attributes
on call sites, so try to handle that.
Consider any constant memory type, not just global constants. AMDGPU
kernel parameters are effectively global constants, but appear as
either reads from an intrinsic derived pointer or function argument.
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.
This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.
Use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D65088
Summary:
Add trimming of unused components of s_buffer_load.
For s_buffer_load and unformatted buffer_load also trim unused
components at the beginning of vector and update offset accordingly.
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71785
Summary:
Add trimming of unused components of s_buffer_load.
Extend trimming of *buffer_load to also include
unused components at the beginning of vectors and update offset.
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70315
Summary:
Allow for ignoring the check for a single use in SimplifyDemandedVectorElts
to be able to simplify operands if DemandedElts is known to contain
the union of elements used by all users.
It is a responsibility of a caller of SimplifyDemandedVectorElts to
supply correct DemandedElts.
Simplify a series of extractelement instructions if only a subset of
elements is used.
Reviewers: reames, arsenm, majnemer, nhaehnle
Reviewed By: nhaehnle
Subscribers: wdng, jvesely, nhaehnle, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67345
llvm-svn: 375395
Summary:
This is something of a workaround to avoid a crash later on in type
legalizer (WidenVectorResult()).
Also added some f16 tests, including a non-working v3f16 case with
a FIXME.
Reviewers: arsenm, tpr, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68865
llvm-svn: 374993
Configure TLI to say that r600/amdgpu does not have any library
functions, such that InstCombine does not do anything like turn sin/cos
into the library function @tan with sufficient fast math flags.
Differential Revision: https://reviews.llvm.org/D67406
Change-Id: I02f907d3e64832117ea9800e9f9285282856e5df
llvm-svn: 371592
Should also be marked writeonly, but I think that would require
splitting the version with done set to a separate intrinsic
Test change is only from renumbering the attribute group numbers,
which for some reason the generated check lines consider.
llvm-svn: 363560
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.
llvm-svn: 363406
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546