This reverts commit 25a97c3a43.
We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent.
If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.
This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
This reverts the revert commit 710aceb645
and includes a fix for a memsan failure.
Original message:
This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.
Summary:
This patch does the following:
1. Make InitTargetOptionsFromCodeGenFlags() accept Triple as a
parameter, because some options' default values are triple dependent.
2. DataSections is turned on by default on AIX for llc.
3. Test cases are updated accordingly because of the default behaviour change.
4. Clang Driver passes in -fdata-sections by default on AIX.
Reviewed By: MaskRay, DiggerLin
Differential Revision: https://reviews.llvm.org/D88737
m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants.
Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift
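A minimal sketch of how the new matcher might be used (the helper name and the chosen constant are illustrative, not from the patch):
```cpp
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Returns true if V is a constant (or vector splat, possibly containing undef
// lanes) equal to BitWidth - 1.
static bool isFullShiftMask(Value *V, unsigned BitWidth) {
  return match(V, m_SpecificIntAllowUndef(APInt(BitWidth, BitWidth - 1)));
}
```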
By always performing a modulo on the shift amount constants, this was causing undef amounts to be replaced with zero, meaning we were losing funnel-shift-by-splat (with undef) patterns.
Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.
In order to correctly load an all-ones FP NaN value into a floating point
register with a VGBM, the analyzed 32/64 FP bits must first be shifted left
(into element 0 of the vector register).
SystemZVectorConstantInfo has so far relied on element replication which has
bypassed the need to do this shift, but now it is clear that this must be
done in order to handle NaNs.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D89389
When given the -experimental-debug-variable-locations option (via -Xclang
or to llc), have SelectionDAG generate DBG_INSTR_REF instructions instead
of DBG_VALUE. For now, this only happens in a limited circumstance: when
the value referred to is not a PHI and is defined in the current block.
Other situations introduce interesting problems, addressed in later patches.
Practically, this patch hooks into InstrEmitter and if it can find a
defining instruction for a value, gives it an instruction number, and
points the DBG_INSTR_REF at that <instr, operand> pair.
Differential Revision: https://reviews.llvm.org/D85747
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.
Let's back this out, and re-attempt with another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.
This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.
I've kept and improved the tests, though.
Generate (at runtime) the table used to drive getSubRegFromChannel,
based on AMDGPUSubRegIdxRanges from TableGen data.
This is a step closer to it being statically generated by TableGen and
allows getSubRegFromChannel to handle all bitwidths in the meantime.
Reviewed By: rampitec, arsenm, foad
Differential Revision: https://reviews.llvm.org/D89217
Recently we started looking into sret parameters, though the issue could crop
up elsewhere. If the pointee type is opaque, we should not try to compute its
size because that leads to an assertion failure.
This relands commit 53b3873cf4. The
`ConvertUTFTest.UTF16WrappersForConvertUTF16ToUTF8String` failure detected the
first time has been fixed.
Differential Revision: https://reviews.llvm.org/D88824
This patch defines the MIR format for debug instruction references: it's an
integer trailing an instruction, marked out by "debug-instr-number", much
like how "debug-location" identifies the DebugLoc metadata of an
instruction. The instruction number is stored directly in a MachineInstr.
Actually referring to an instruction comes in a later patch, but is done
using one of these instruction numbers.
I've added a round-trip test and two verifier checks: that we don't label
meta-instructions as generating values, and that there are no duplicates.
Differential Revision: https://reviews.llvm.org/D85746
LV fails with an assertion checking that UF > 0. We already set UF to 1 if it is 0, except in the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87679
Replace m_SpecificInt with m_APIntAllowUndef to match splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.
The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack. This is wrong and we'd like
to avoid any silent ABI compatibility issues in future. For now,
I've added a fatal error when this happens until we can get a
proper fix.
Differential Revision: https://reviews.llvm.org/D89326
D85703 will need to create shallow wrappers in order to track the spmd icv. We need to make it available.
Differential Revision: https://reviews.llvm.org/D89342
-loop-extract-single is just -loop-extract on one loop.
-loop-extract depended on -break-crit-edges and -loop-simplify in the
legacy PM, but the NPM doesn't allow specifying pass dependencies like
that, so manually add those passes to the RUN lines where necessary.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89016
There's no way to know whether there's a loclist contribution to parse
if there's no loclistx encoding - and if there is one, there's no need
to walk back from the loclist_base (or, in the case of
info.dwo/loclist.dwo - starting at 0 in the contribution) to parse the
header, instead rely on the DWARF32/64 and address size in the CU
that's already available.
This would come up in split DWARF (non-split wouldn't try to read a
loclist header in the absence of a loclist_base) when one unit has
location lists and another does not (because the loclists.dwo section
would be non-empty in that case - in the case where it's empty the
parsing would silently skip).
Simplify the testing a bit, rather than needing a whole dwp, etc - by
creating a malformed loclists.dwo section (and use single file Split
DWARF) that would trip up any attempt to parse it - but no attempt
should be made.
This reverts 9b5b305023 and fixes the unwanted re-ordering when generating ThinLTO indexes.
The goal of this patch is to better balance thread utilization during ThinLTO in-process linking (in llvm-lto2 or in LLD). Before this patch, large modules would often be scheduled late during execution, taking a long time to complete, thus starving the thread pool.
We now sort modules in descending order, based on each module's bitcode size, so that larger modules are processed first. By doing so, smaller modules have a better chance to keep the thread pool active, and thus avoid starvation when the bitcode compilation is almost complete.
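Conceptually (a standalone sketch; the names and the exact comparator used in the patch may differ), the scheduling order is computed roughly like this:
```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return module indices ordered so that the largest bitcode buffers come
// first. `Sizes[I]` is assumed to hold the bitcode size of module I.
std::vector<size_t> largestFirstOrder(const std::vector<size_t> &Sizes) {
  std::vector<size_t> Order(Sizes.size());
  std::iota(Order.begin(), Order.end(), 0);
  std::stable_sort(Order.begin(), Order.end(),
                   [&](size_t A, size_t B) { return Sizes[A] > Sizes[B]; });
  return Order;
}
```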
In our case (on dual Intel Xeon Gold 6140, Windows 10 version 2004, two-stage build), this saves 15 sec when linking `clang.exe` with LLD & -flto=thin, /opt:lldltojobs=all, no ThinLTO cache, -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc.
Before patch: 100 sec
After patch: 85 sec
Inspired by the work done by David Callahan in D60495.
Differential Revision: https://reviews.llvm.org/D87966
https://reviews.llvm.org/D88865
This adds a single combine for GlobalISel to fold:
ptradd (inttoptr C1) C2
Into:
C1 + C2
Additionally, a small test for AArch64 is added.
Patch by pnappa.
llvm-cov reports a poor error message when the -arch specifier is
missing or invalid, and a binary has multiple slices. Make the error
message more specific.
(This version of the patch avoids using llvm::none_of -- the way I used
the utility caused compile errors on many bots, possibly because the
wrong overload of `none_of` was selected.)
rdar://40312677
llvm-cov reports a poor error message when the -arch specifier is
missing or invalid, and a binary has multiple slices. Make the error
message more specific.
rdar://40312677
This instruction was introduced in GFX10.3, reusing the opcode of
v_mac_legacy_f32 from GFX10.1.
Differential Revision: https://reviews.llvm.org/D89247
Split out from https://reviews.llvm.org/D66782, use `Optional<MemoryBufferRef>`
in `line_iterator` so you don't need access to a `MemoryBuffer*`. Follow up
patches in `clang/` will leverage this.
Differential Revision: https://reviews.llvm.org/D89280
As preparation for changing `LineIterator` to work with `MemoryBufferRef`:
- Add an `operator==` that uses buffer pointer identity to ensure two buffers
are equivalent.
- Split out `MemoryBufferRef.h`, to avoid polluting `LineIterator.h` includers
with everything from `MemoryBuffer.h`. This also means moving the
`MemoryBuffer` constructor to a source file.
Differential Revision: https://reviews.llvm.org/D89279
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
While promotion currently always has an AST available, it is only
relevant for invalidation purposes in LoopPromoter, so we do not
need to have it as a hard dependency.
This adds an -enable-memcpyopt-memoryssa option that currently does
nothing apart from requiring MSSA as a dependency. The tests are
split to run both with the option disabled and enabled. I went with
this rather than the separate directory DSE uses, as I found it
convenient to have a direct side-by-side comparison of differences.
Differential Revision: https://reviews.llvm.org/D89206
moveUp() moves instructions, so we should move the corresponding
memory accesses as well. We should also move the store instruction
itself: Even though we'll end up removing it later, this gives us
a correct MemoryDef to replace.
The implementation is somewhat more complicated than it should be,
because we also handle the case where P does not have a memory
access due to a degenerate AA pipeline. Hopefully, the need for this
will go away in the future, when the rest of the pass is based on
MSSA.
Differential Revision: https://reviews.llvm.org/D88778
As pointed out by @efriedma in
https://reviews.llvm.org/rGaaafe350bb65#inline-4883
of course we can't just call ptrtoint in the sign-extending case
and be done with it, because it will zero-extend.
I'm not sure what I was thinking there.
This is very much not an NFC; however, looking at the user of
BuildConstantFromSCEV() I'm not sure how to actually show that
it results in a different constant expression.
If the memcpy operands are the same (which is allowed since D86815)
then the memcpy is effectively a no-op and the partially overlapping
memset is not dead.
Differential Revision: https://reviews.llvm.org/D89192
MemCpyOpt can shorten a memset if it is later partially overwritten
by a memcpy. It checks that the destination is not read in between,
but we also need to make sure that the destination cannot be observed
via unwinding.
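A C++-level illustration of the hazard (the helper names are made up; the actual check operates on the IR):
```cpp
#include <cstring>

void mayThrow();             // opaque call that may unwind
void observe(const char *);  // imagine this runs from a destructor during unwinding

struct Guard {
  const char *P;
  ~Guard() { observe(P); }   // reads the buffer if we unwind
};

void example(char *p, const char *src) {
  Guard G{p};
  std::memset(p, 0, 16);
  mayThrow();                // if this unwinds, ~Guard can observe all 16 bytes
  std::memcpy(p, src, 8);    // so the memset must not be shortened to p+8..p+15
}
```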
Differential Revision: https://reviews.llvm.org/D89190
This patch adds support for assemble/disassemble intrinsics
for MMA.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D88739
The previous code added the scope on each iteration, so that the
same scope was represented many times in the same !noalias metadata.
That's legal, and semantically equivalent to only storing the scope
once, but it's also wasteful and may pessimize further optimization
if AATags get intersected naively, as done by the AliasSetTracker.
Implement computeKnownBitsForTargetInstr for G_AMDGPU_BUFFER_LOAD_UBYTE
and G_AMDGPU_BUFFER_LOAD_USHORT. This allows generic combines to remove
some unnecessary G_ANDs.
Differential Revision: https://reviews.llvm.org/D89316
Adds more testing in basic-assembly.s and a new test tables.s.
Adds support for yaml reading and writing of tables as well.
Differential Revision: https://reviews.llvm.org/D88815
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.
I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.
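Roughly, the intended use looks like this (a sketch; the lane values in the comments are illustrative):
```cpp
#include "llvm/IR/Constants.h"
using namespace llvm;

// Given a folded constant C and the original operand Other that carried undef
// elements, produce C with Other's undef lanes re-applied, e.g.
//   C      = <4 x i8> <7, 7, 7, 7>
//   Other  = <4 x i8> <1, undef, 1, undef>
//   result = <4 x i8> <7, undef, 7, undef>
static Constant *preserveUndefLanes(Constant *C, Constant *Other) {
  return Constant::mergeUndefsWith(C, Other);
}
```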
Differential Revision: https://reviews.llvm.org/D88687
When the first operand is a null pointer we can avoid making a G_PTR_ADD and
make a G_INTTOPTR with the offset operand.
This helps us avoid emitting an add with 0 later on for targets such as AMDGPU.
Differential Revision: https://reviews.llvm.org/D87140
A dynamic linker with lazy binding support may need to handle variant
PCS function symbols specially, so an ELF symbol table marking
STO_AARCH64_VARIANT_PCS [1] was added to address this.
Function symbols that follow the vector PCS are marked via the
.variant_pcs assembler directive, which takes a single parameter
specifying the symbol name and sets the STO_AARCH64_VARIANT_PCS st_other
flag in the object file.
[1] https://github.com/ARM-software/abi-aa/blob/master/aaelf64/aaelf64.rst#st-other-values
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D89138
Much like the ZExt/Trunc handling.
Thanks go to Alexander Richardson for nudging me towards noticing this one proactively.
The appropriate (currently crashing) test coverage has been added.
This passes the existing X86 tests, but I'm not sure if it handles all type
legalization cases it needs to.
Alternative to D89200
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D89222
This reverts commit 432e4e56d3, which reverted 542523a61a. Two issues from
the original commit have been fixed. First, MSVC does not like when std::array
is initialized with only single braces, so this commit switches to using the
more portable double braces. Second, there was a subtle endianness bug that
prevented the original commit from working correctly on big-endian machines,
which has been fixed by switching to using endianness-agnostic bit twiddling
instead of type punning.
Differential Revision: https://reviews.llvm.org/D88773
This can fix an asan failure like below.
==15856==ERROR: AddressSanitizer: use-after-poison on address ...
READ of size 8 at 0x6210001a3cb0 thread T0
#0 llvm::MachineInstr::getParent()
#1 llvm::LiveVariables::VarInfo::findKill()
#2 TwoAddressInstructionPass::rescheduleMIBelowKill()
#3 TwoAddressInstructionPass::tryInstructionTransform()
#4 TwoAddressInstructionPass::runOnMachineFunction()
We need to update the Kills if we replace instructions. The Kills
may later be accessed within the TwoAddressInstruction pass.
Differential Revision: https://reviews.llvm.org/D89092
Happened to notice some of these printing as UnknownCode while running llvm-bcanalyzer on a bc file I had.
Differential Revision: https://reviews.llvm.org/D86900
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end up there with a constant zext/sext of a pointer,
so I didn't handle those cases there until there is a test case.
Original commit message:
While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.
This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88806
This restores commit ab1b4810b5 which was
reverted in 01b9deba76, with a fix for the
issue it caused. We should use a temporary BitstreamCursor when
loading the global decl attachment records so that the abbrev ids held
in the lazy loading IndexCursor are not clobbered. Enhanced the test so
that the issue is exposed there.
Original description:
When performing ThinLTO importing, the metadata loader attempts to lazy
load, by building an index. However, module level global decl attachment
metadata was being parsed early while building the index, since the
associated (module level) global values aren't materialized on demand.
This results in the creation of forward reference temporary metadatas,
which are expensive.
Normally, these module level global values don't have much attached
metadata. However, in the case of -fwhole-program-vtables (e.g. for
whole program devirtualization), the vtables may have many attached type
metadatas. This was resulting in very slow performance when performing
ThinLTO importing with the default lazy loading.
This patch restructures the handling of these global decl attachment
records, delaying their parsing until after the lazy loading index has
been built. Then the parser can use the interface that loads from the
index, which resolves forward references immediately instead of creating
expensive temporaries.
For one ThinLTO backend that imports from modules containing huge
numbers of vtables and associated types, I measured the following
compile times for the metadata materialization during function
importing, rounded to nearest second:
No -fwhole-program-vtables:
Lazy loading on (head): 1s
Lazy loading off (head): 3s
Lazy loading on (patch): 1s
With -fwhole-program-vtables:
Lazy loading on (head): 440s
Lazy loading off (head): 4s
Lazy loading on (patch): 2s
Differential Revision: https://reviews.llvm.org/D87970
This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D84680
> While we indeed can't treat them as no-ops, i believe we can/should
> do better than just modelling them as `unknown`. `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straight-forward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806
It caused the following assert during Chromium builds:
llvm/lib/IR/Constants.cpp:1868:
static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.
See code review for a link to a reproducer.
This reverts commit 1c021c64ca.
If the known shift amount is greater than or equal to the bitwidth of the type of the value to be shifted,
the result is target dependent, so don't try to infer any bits.
This fixes a crash we've seen in one of our internal test suites.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89232
The change starts from LiveRangeMatrix and also checks that the users of the
APIs are typed accordingly.
Differential Revision: https://reviews.llvm.org/D89145
It's never null - the reason it's modeled as a pointer is because the
pass can't init it in its ctor. Passing by ref simplifies the code, too,
as the null checks were unnecessary complexity.
Differential Revision: https://reviews.llvm.org/D89171
60b852092c introduced SCEV verification to
deleteDeadLoop, but it appears this check is currently a bit over-eager
and some users of deleteDeadLoop appear to only patch up SE after
calling it (e.g. PR47753).
Remove the extra check for now. We can consider adding it back after we
tracked down the source of the inconsistency for PR47753.
Extend loadSRsrcFromVGPR to allow moving a range of instructions into
the loop. The call instruction is surrounded by copies into physical
registers which should be part of the waterfall loop.
Differential Revision: https://reviews.llvm.org/D88291
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).
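For reference, a standalone check of the scalar identity behind the matched pattern (assuming 32-bit values and 0 < x < 32):
```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint32_t a = 0x12345678u, b = 0x9abcdef0u;
  for (uint32_t x = 1; x < 32; ++x) {
    uint64_t cat = ((uint64_t)a << 32) | b;        // a:b concatenated
    uint32_t fshl = (uint32_t)((cat << x) >> 32);  // funnel shift left by x
    assert(fshl == ((a << x) | (b >> (32 - x))));  // the or(shl, lshr) form
  }
  return 0;
}
```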
Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.
Differential Revision: https://reviews.llvm.org/D88783
After rG02295e6d1a15 we no longer need to invert the shift values for fshr - this is just hidden at the moment as funnel shifts only ever match for constant values so never use the fshr "Sub on SHL" path.
Based on a discussion on D88783, if we're promoting a funnel shift to a width at least twice the size as the original type, then we can use the 'double shift' patterns (shifting the concatenated sources).
Differential Revision: https://reviews.llvm.org/D89139
VE doesn't have an instruction for copysign, so expand it. Add a
regression test also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89228
VE doesn't have fneg or frem instructions, so change them to expand. Add
regression tests also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89205
VE doesn't have a BRCOND instruction, so we need to expand it. Also add
a regression test.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89173
While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.
This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88806
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of its own operators.
I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:
1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).
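As a rough illustration of the two numbered changes above (a sketch; the helper names here are mine, not from the patch):
```cpp
#include "llvm/Support/TypeSize.h"
using namespace llvm;

// (1) Use isScalar() rather than comparing an ElementCount against 1.
static bool isSingleElement(ElementCount EC) { return EC.isScalar(); }

// (2) The removed isByteSized() is expressed via isKnownMultipleOf(8).
static bool isByteSizedType(TypeSize TS) { return TS.isKnownMultipleOf(8); }
```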
I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.
Differential Revision: https://reviews.llvm.org/D88409
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.
As we've been establishing (see D88788/D88789), if there is an int<->ptr cast,
it basically must stay as-is; we can't do much with it.
I've looked, and as far as I can tell, the biggest source of new such casts
being introduced is this transform, which, ironically,
tries to reduce the number of casts.
On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.
However, just on RawSpeed, where I know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% less `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255),
which is again an increase of 14.34% in total.
To me this does seem like a step in the right direction,
we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`,
which seems like a reasonable trade-off.
See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.
(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)
Reviewed By: nlopes, nikic
Differential Revision: https://reviews.llvm.org/D88979
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail folded loops. Reductions are
generated with the form:
x = select(mask, vecop, zero)
v = vecreduce.add(x)
c = add chain, v
Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.
Most of the code is fairly straightforward, except for the creation of
blockmasks which need to ensure they are created in dominance order. The
order they are added is altered to be after any phis, keeping the
requirements for the underlying IR.
Differential Revision: https://reviews.llvm.org/D84451
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with extra
use that shows we still convert directly to a real "sext" if
possible.
This is my first LLVM patch, so please tell me if there are any process issues.
The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by flipping the sign bit of both inputs and of the output, which turns the unsigned minimum/maximum into a signed one.
We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this needs one large move instruction. It's just that the sign-bit flipping has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However, due to the slight regression in the single-use case, this patch no longer proposes this.
Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16, which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However, independent of that, I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweigh the downsides in that specific case.
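As a standalone scalar model of the saturated-subtraction trick described above (this is not the vector lowering itself):
```cpp
#include <cassert>
#include <cstdint>

// usubsat(a, b) = a >= b ? a - b : 0, which PSUBUSW provides per 16-bit lane.
static uint16_t usubsat(uint16_t a, uint16_t b) { return a >= b ? a - b : 0; }
static uint16_t umax(uint16_t a, uint16_t b) { return b + usubsat(a, b); }
static uint16_t umin(uint16_t a, uint16_t b) { return a - usubsat(a, b); }

int main() {
  assert(umax(3, 40000) == 40000 && umin(3, 40000) == 3);
  assert(umax(65535, 1) == 65535 && umin(65535, 1) == 1);
  return 0;
}
```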
Patch By: @TomHender (Tom Hender) ActuallyaDeviloper
Differential Revision: https://reviews.llvm.org/D87236
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).
Differential Revision: https://reviews.llvm.org/D88783
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.
The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.
This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always processed differently than spills, hence it's quite confusing to put them together. It's much cleaner to separate them and process them separately.
Doing so simplifies lots of code and makes the logic clearer and easier to reason about.
Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across a suspension point (when this happens, the existing implementation will spill the PHINode, a.k.a. a stack address, to the heap!).
All these issues suggest that we need to separate spill and alloca in order to properly implement this.
This patch does not yet fix these bugs; however, it sets up the code in a better shape so that we can start fixing them in the next patch.
The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index into their final layout index.
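Roughly, the new bookkeeping might have this shape (an illustrative sketch based on the description above, not the exact definition from the patch):
```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"

namespace {
// Map each spilled definition to the users that cross a suspension point.
using SpillInfo =
    llvm::SmallMapVector<llvm::Value *, llvm::SmallVector<llvm::Instruction *, 2>, 8>;

struct FrameDataInfo {
  SpillInfo Spills;                                       // defs spilled to the frame
  llvm::SmallVector<llvm::AllocaInst *, 8> Allocas;       // handled separately from spills
  llvm::DenseMap<llvm::Value *, unsigned> FieldIndexMap;  // def -> frame field index
};
} // namespace
```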
In doing so, I also cleaned up a few things and discovered a few other bugs.
Cleanups:
1. Found out that PromiseFieldId is not used; delete it.
2. Previously, SpillInfo was a vector, which is strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contained a list of Spills that the field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.
Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer have them. This means we are dropping some debug information in the ramp function.
The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.
Differential Revision: https://reviews.llvm.org/D88872
The bextri intrinsic has an ImmArg attribute which will be converted
in SelectionDAG using TargetConstant. We previously converted this
to a plain Constant to allow X86ISD::BEXTR to call SimplifyDemandedBits
on it.
But while trying to decide if D89178 was safe, I realized that
this conversion of TargetConstant to Constant would be one case
where that would break.
So this patch adds a new opcode specifically for the immediate case.
And then teaches computeKnownBits and SimplifyDemandedBits to also
handle it, but not try to SimplifyDemandedBits on it. To make up
for that, I immediately masked the constant to 16 bits when
converting from the intrinsic node to the X86ISD node.
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
There are a number of places in RDA where we assume the block will not
be empty. This isn't necessarily true for tail predicated loops where we
have removed instructions. This attempts to make the pass more resilient
to empty blocks, not casting pointers to machine instructions where they
would be invalid.
The test contains a case that was previously failing but has recently been
hidden on trunk. It contains an empty block to begin with to show a
similar error.
Differential Revision: https://reviews.llvm.org/D88926
This patch adds support for DWARF attribute DW_AT_rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF attribute
DW_AT_rank is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89141
MemCpyOpt can hoist stores while converting load+store pairs into memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original program.
Differential Revision: https://reviews.llvm.org/D89154
Currently we allow passing pointers from deopt bundle on VReg only if
they were seen in the list of gc-live pointers passed on VRegs.
This means that for the case of empty gc-live bundle we spill deopt
bundle's pointers. This change allows lowering deopt pointers to VRegs
in case of empty gc-live bundle. In case of non-empty gc-live bundle,
behavior does not change.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D88999
This patch introduces just enough files for lib/Target/CSKY to compile.
Notably a basic CSKYTargetMachine and CSKYTargetInfo.
Differential Revision: https://reviews.llvm.org/D88466
Summary:
The current instruction sink pass uses findNearestCommonDominator of all users to find the block to sink the instruction to.
However, a user may be in a dead block, which will result in unexpected behavior.
This patch handles such cases by skipping dead blocks. This patch fixes:
https://bugs.llvm.org/show_bug.cgi?id=47415
Reviewers:
MaskRay, arsenm
Differential Revision:
https://reviews.llvm.org/D89166
The notrack prefix is a relaxation of CET policies which makes it possible to indirectly call targets which do not have an ENDBR instruction in the landing address. To emit a call with this prefix, the special attribute "nocf_check" is used. When used as a function attribute, a CallInst targeting the respective function will return true for the method "doesNoCfCheck()", no matter if it is a direct call (and such should remain like this, as the information that the to-be-called function won't perform control-flow checks is useful in other contexts). Yet, when emitting an X86ISD::NT_CALL, the respective CallInst should be verified for its indirection, allowing that the prefixed calls are only emitted in the right situations.
Update the respective testing unit to also verify for direct calls to functions with ''nocf_check'' attributes.
The bug can also be reproduced through compiling the following C code using the -fcf-protection=full flag.
int __attribute__((nocf_check)) foo(int a) {};
int main() {
foo(42);
}
Differential Revision: https://reviews.llvm.org/D87320
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.
To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.
We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)
The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.
Fixes an "infinite" compile issue my team found on an internal workoad.
Differential Revision: https://reviews.llvm.org/D89080
This reverts commit 6537004913. This is causing test failures internally, and while a few of the cases turned out to be bad user code (relying on a specific order of static initialization across translation units), some cases are less clear. Temporarily reverting for now, and Teresa is going to follow up with more details.
It is only used in weightCalcHelper, and cleared upon its finishing its
job there.
The patch further cleans up style guide discrepancies, and simplifies
CopyHint by removing duplicate 'IsPhys' information (it's what the Reg
field would report).
The selection of HVX shuffles can produce more nodes in the DAG,
which need special handling, or otherwise they would be left
unselected by the main selection code. Make the handling of such
nodes more general.
I suspect getAddressFromInstr and addFullAddress are not handling
all address cases properly, based on a report from MaskRay.
So just copy the operands directly. This should be more efficient
anyway.
The expansion code creates a copy to RBX before the real LCMPXCHG16B.
It's possible this copy uses a register that is also used by the
real LCMPXCHG16B. If we set the kill flag on the use in the copy,
then we'll fail the machine verifier on the use on the LCMPXCHG16B.
Differential Revision: https://reviews.llvm.org/D89151
Or else on optnone functions we get the following during instruction selection:
fatal error: error in backend: Cannot select: intrinsic %llvm.preserve.struct.access.index
Currently the -O0 pipeline doesn't properly run passes registered via
TargetMachine::registerPassBuilderCallbacks(), so don't add that RUN
line yet. That will be fixed after this.
Reviewed By: yonghong-song
Differential Revision: https://reviews.llvm.org/D89083
There are cases that generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way. This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.
This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83635
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that tests that -lowerswitch is run automatically to legacy PM.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D89051
Following on from D88890, this makes the newly added patterns
conditional on NoFP32Denormals. mad/mac f32 instructions always flush
denormals regardless of the MODE register setting, and I believe the
legacy variants do the same.
Differential Revision: https://reviews.llvm.org/D89123
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3
Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
%a = and %x, C2
%r = add %a, C1
=>
%a2 = add %x, C1
%r = and %a2, C2
We cannot guarantee that the replacement expression is loop-invariant in
all AddRecs in the source expression. Use a rewriter that skips
AddRecExpr for now.
Fixes PR47776.
IV widening is sometimes a strictly harmful transform (some examples
of this are shown in tests 11, 12 in widen-loop-comp.ll). One of the
reasons for this is that sometimes SCEV fails to prove some facts after
part of the guards has been widened.
Though each such case looks like a bug that can be addressed,
it seems that disabling IV widening may be profitable in some cases.
We want to have an option to do so. By default, existing behavior is
preserved and IV widening is on.
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be triggered if the multiplier is a big constant that the target can't handle with a single multiplication instruction. This is controlled by the hook `decomposeMulByConstant`.
Moreover, the conversion benefits from an ILP improvement since the instructions are independent. A case with a sequence like the following also benefits, since a shift instruction is saved.
```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```
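Worked out by hand for the constants above (0x8800 = 2^15 + 2^11, 0x8080 = 2^15 + 2^7), showing the shared (a << 15) term that saves a shift:
```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint32_t a = 12345;
  assert(a * 0x8800u == (a << 15) + (a << 11)); // 2^15 + 2^11
  assert(a * 0x8080u == (a << 15) + (a << 7));  // 2^15 + 2^7, (a << 15) is shared
  return 0;
}
```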
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88201
It is possible to get a fltSemantics of a particular Type,
but there is no way to produce a Type based on a
fltSemantics.
This adds the function Type::getFloatingPointTy, which
will return the appropriate floating point Type for a given
fltSemantics.
ConstantFP is modified to use this function instead of
implementing it itself. Also some minor refactors to use
Type::getFltSemantics instead of a hand-rolled version.
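A minimal round-trip sketch using the new API (assuming the signature takes an LLVMContext and a fltSemantics):
```cpp
#include "llvm/IR/Type.h"
using namespace llvm;

// Recover the same floating point type from its semantics.
static Type *sameFPType(Type *FPTy) {
  const fltSemantics &Sem = FPTy->getFltSemantics();
  return Type::getFloatingPointTy(FPTy->getContext(), Sem);
}
```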
Differential Revision: https://reviews.llvm.org/D87512
This patch makes the opcode_base and the standard_opcode_lengths fields
of the line table optional. When both of them are not specified,
yaml2obj emits them according to the line table's version.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D88355
In `rescheduleKillAboveMI`, the current implementation uses `SmallSet` to track the reg's defs and uses. When comparing, it uses `SmallSet.count` to find out if the reg is clobbered or used. This is not correct when subregisters are involved. This patch uses `regOverlapsSet`, already used by `rescheduleMIBelowKill`, to fix the issue.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47707.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D88716
Annoyingly vectors aren't supported by shouldChangeType(), but we have precedents for always performing this on vector types (e.g. narrowBinOp).
Differential Revision: https://reviews.llvm.org/D89067
The IRTranslator depends on the branch probability info pass when the
optimization level is different than None and it depends all the time on
the StackProtector pass.
We have to explicitly call out pass dependencies; otherwise, the pass manager
may not be able to schedule the IRTranslator.
Before this patch, we were lucky because previous passes depend on the branch
probability info pass (like the Global Variable Optimization) and the stack
protector pass is initialized in initializeCodeGen.
However, if the target has a custom pipeline without any passes like Global
Variable Optimization, the pipeline creation will fail, at least because of
the branch probability info pass dependency (it is unlikely that
initializeCodeGen is not called).
This patch adds the missing dependencies to the IRTranslator.
Differential Revision: https://reviews.llvm.org/D89063
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.
Prevents a clang static analyzer warning that we could dereference a null pointer.
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.
https://rise4fun.com/Alive/hzN
Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
%m = and i8 %x, C1
%f = xor i8 %m, C2
%r = add i8 %f, C3
=>
%r = sub i8 C2 + C3, %m
In LowerEmscriptenEHSjLj, `longjmp` used to be replaced with
`emscripten_longjmp_jmpbuf(jmp_buf*, i32)`, which will eventually be
lowered to `emscripten_longjmp(i32, i32)`. The reason we used two
different names was because they had different signatures in the IR
pass.
D88697 fixed this by only using `emscripten_longjmp(i32, i32)` and
adding a `ptrtoint` cast to its first argument, so
```
longjmp(buf, 0)
```
becomes
```
emscripten_longjmp((i32)buf, 0)
```
But this assumed all uses of `longjmp` were direct calls to it, which
was not the case. This patch handles indirect uses of `longjmp` by
replacing
```
longjmp
```
with
```
(i32(*)(jmp_buf*, i32))emscripten_longjmp
```
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D89032
This patch refactors the logic in ValueTracking.cpp so that
computeKnownBitsForMul now uses a helper function from KnownBits.
NFC
Differential Revision: https://reviews.llvm.org/D88935
This patch lets the bb_addr_map (renamed to __llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help parsers, dumpers and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D88199
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.
Prevents a clang static analyzer warning that we could dereference a null pointer.
We were checking if the ConstantSDNode was null but then immediately dereferencing it afterward - fold these both into a single check. Use the APInt::ult() helper as well.
Found by clang static analyzer.
Note that all subtargets up to GFX10.1 have v_mad_legacy_f32, but GFX8/9
lack v_mac_legacy_f32. GFX10.3 has no mad/mac f32 instructions at all.
Differential Revision: https://reviews.llvm.org/D88890
SUMMARY:
In the IBM compiler xlclang, there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.
We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to the ASM or XCOFF object file.
The option only works on AIX; for other, non-AIX OSes, using the option will report an unsupported option error.
In AIX OS:
1.1 The option -mignore-xcoff-visibility is enabled by default if neither -fvisibility=* nor -mignore-xcoff-visibility is given explicitly in the clang command.
1.2 If -fvisibility=* is given explicitly but -mignore-xcoff-visibility is not, it will generate visibility attributes.
1.3 If both -fvisibility=* and -mignore-xcoff-visibility are given explicitly in the clang command, the option "-mignore-xcoff-visibility" wins and the visibility attribute is not emitted.
The option -mignore-xcoff-visibility has no effect on visibility attributes when compiling with the -emit-llvm option to generate LLVM IR.
Reviewer: daltenty,Jason Liu
Differential Revision: https://reviews.llvm.org/D87451
Complete basic PR46895 fixes by refactoring D87452/D88402 to allow us to match non-uniform constant values.
We still don't handle non-uniform vectors that contain undef elements, but that can wait until we have a decent generic mechanism for this.
Differential Revision: https://reviews.llvm.org/D88420
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (alongside its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.
Includes fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.
Fixes : PR38815
Differential Revision: https://reviews.llvm.org/D87494
First step towards extending the existing rotation support to full funnel shift handling now that the backend legalization support has improved.
This enables us to match the shift by constant cases, which are pretty trivial to expand again if necessary.
D88420 will add non-uniform support for funnel shifts as well once its been finalized.
Differential Revision: https://reviews.llvm.org/D88834
The existing code ignores undef values, which matches m_SpecificInt_ICMP. Although m_SpecificInt_ICMP returns false for an all-undef constant, I've added test coverage at rGfe0197e194a64f9 to show that undef folding should already have dealt with that case.
ExpandUnalignedLoad/Store can sometimes produce unnecessary copies to
a temporary stack slot. We should prefer splitting vectors if possible.
Differential Revision: https://reviews.llvm.org/D88882
We currently collect the ICmp and Add from an induction variable,
marking them as dead so that vplan values are not created for them. This
extends that to include any single-use trunc from the ICmp, which allows
the Add to more readily be removed too.
This can help with costing vplan nodes, as the ICmp and Add are more
reliably removed and are not double-counted.
Differential Revision: https://reviews.llvm.org/D88873
The initial version of the patch was reverted because it missed the check that
the predicate being proved is actually guarded by this check on 1st iteration.
If it was not executed on 1st iteration (but possibly executes after that), then
it is incorrect to use reasoning about IV start to prove it.
Added the test where the miscompile was seen. Unfortunately, my attempts
to reduce it with bugpoint did not succeed; it can further be reduced when
we understand how to do it without losing the essence of the initial bug.
Relanding, assuming the miscompiles are now gone.
Differential Revision: https://reviews.llvm.org/D88208
This removes "VerifyEachPass" parameters from a lot of functions which is nice.
Don't verify after special passes or VerifierPass.
This introduces verification on loop and cgscc passes, verifying the corresponding function/module.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D88764
Currently, the bpf backend instruction selection DAG2DAG phase has
an optimization to replace loads of constant struct members
or array elements with direct values. The reason is that these
locally defined struct or array variables may have their
initial values stored in a readonly section, and the early bpf
ecosystem was not able to handle such cases.
The bpf ecosystem can now handle not only readonly sections
but also global variables. A global variable can also have
initialized data, and a global variable may or may not be constant,
i.e., global variable data can be put in the .data section or the .rodata
section. This exposed a bug in the DAG2DAG load optimization
as it did not check whether the global variable is constant
or not.
This patch fixed the bug by checking whether global variable,
representing the initial data, is constant or not and will not
do optimization if it is not a constant.
Another bug is also fixed in this patch to check whether
the load is simple (not volatile/atomic) or not. If it is
not simple, we will not do the optimization. To summarize for
globals:
- struct t var = { ... } ; // no load optimization
- const struct t var = { ... }; // load optimization is possible
- volatile const struct t var = { ... }; // no load optimization
Differential Revision: https://reviews.llvm.org/D89021
We saw the same assertion failure mentioned here
https://bugs.llvm.org/show_bug.cgi?id=42063 in our internal tests.
The failure happens in the same circumstance as D47898 and D66814 where
uniqueing of DICompositeTypes causes `Mapper::mapValue` to be called on
GlobalValues (`G`) from a not-yet-linked module (`M`). The following type-mapping for
`G` may not complete correctly (failing to unique types etc., depending on
the complexity of the types) because IRLinker::computeTypeMapping is not done for `M`
in this path.
D47898 and D66814 fixed some type-mapping issue after Mapper::mapValue
is called on `G`. However, it seems it did not handle some complex cases. I
think we should delay linking globals like `G` until its owing module is
linked. In this way, we could save unnecessary type mapping and prune
these corner cases. It is also supposed to reduce the total number of structs
ending up in the combined module.
D47898 is reverted (its test is kept) because it regresses the test case here.
D66814 could also be reverted (the `check-all` looks good). But it looks reasonable
anyway, so I thought I should keep it.
Also tested the patch with a clang self-host regular LTO/ThinLTO build; things look
good as well.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D87001
The pass is updated to handle loads through complex addressing mode,
specifically, when we have a scaled register and a scale.
It requires two API updates in TII which have been implemented for X86.
See added IR and MIR testcases.
Tests-Run: make check
Reviewed-By: reames, danstrushin
Differential Revision: https://reviews.llvm.org/D87148
We need to use LCMPXCHG16B_SAVE_RBX if RBX/EBX is being used as
the frame pointer. We previously checked for this during type
legalization, but that's too early to know for sure if the base
pointer is needed.
This patch adds a new pseudo instruction to emit from isel that
uses a virtual register for the RBX input. Then we use the custom
inserter hook to emit LCMPXCHG16B if RBX isn't needed as a base
pointer or LCMPXCHG16B_SAVE_RBX if it is.
Fixes PR42064.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D88808
Currently, AsmPrinter code is organized in a way in which the labels of address-taken blocks are emitted in the previous section, which makes the relocation incorrect.
This patch reorganizes the code to switch to the basic block section before handling address-taken blocks.
Reviewed By: snehasish, MaskRay
Differential Revision: https://reviews.llvm.org/D88517
Currently LAA uses getScalarSizeInBits to compute the size of an element
when computing the end bound of an access.
This does not work as expected for pointers to pointers, because
getScalarSizeInBits will return 0 for pointer types.
By using DataLayout to get the size of the element we can also correctly
handle pointer element types.
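The gist of the fix, as a hedged sketch (the exact DataLayout query used in the patch may differ):
```cpp
#include <cstdint>
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// getScalarSizeInBits() reports 0 for pointer element types; asking the
// DataLayout gives a meaningful size for them as well.
static uint64_t elementSizeInBytes(Type *ElemTy, const DataLayout &DL) {
  return DL.getTypeAllocSize(ElemTy);
}
```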
Note the changes to the existing test, which seems to also use the wrong
offset for the end.
Fixes PR47751.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D88953
The STRICT was causing unnecessary confusion. I think SEQ is a more accurate
name for what they actually do, and the other obvious option of "ORDERED"
has the issue of already having a meaning in FP contexts.
Differential Revision: https://reviews.llvm.org/D88791
Renaming for some Emscripten EH functions has so far been done in
wasm-emscripten-finalize tool in Binaryen. But recently we decided to
make a compilation/linking path that does not rely on
wasm-emscripten-finalize for modifications, so here we move that
functionality to LLVM.
Invoke wrappers are generated in the LowerEmscriptenEHSjLj pass, but since final
wasm types are not available in the IR pass, we need to rename them at
the end of the pipeline.
This patch also removes uses of `emscripten_longjmp_jmpbuf` in
LowerEmscriptenEHSjLj pass, replacing that with `emscripten_longjmp`.
`emscripten_longjmp_jmpbuf` is lowered to `emscripten_longjmp`, but
previously we generated calls to `emscripten_longjmp_jmpbuf` in
LowerEmscriptenEHSjLj pass because it takes `jmp_buf*` instead of `i32`.
But we were able to use `ptrtoint` to make it use `emscripten_longjmp`
directly here.
Addresses:
https://github.com/WebAssembly/binaryen/issues/3043
https://github.com/WebAssembly/binaryen/issues/3081
Companions:
https://github.com/WebAssembly/binaryen/pull/3191
https://github.com/emscripten-core/emscripten/pull/12399
Reviewed By: dschuff, tlively, sbc100
Differential Revision: https://reviews.llvm.org/D88697
(Based on D87170 by dsanders)
I recently needed to call out to an external API to emit a JSON object as part
of one that an LLVM tool was emitting. However, our JSON support didn't provide a way
to delegate part of the JSON output to that API.
Add rawValueBegin() and rawValueEnd() to maintain and check the internal state
while something else is writing to the stream. It is the user's responsibility to
ensure that the resulting JSON output is still valid.
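A rough sketch of the intended usage, assuming only the rawValueBegin()/rawValueEnd() pair described above; `externalEmitJson` is a hypothetical external emitter, and writing to the underlying stream directly between the two calls is an assumption:
```
#include "llvm/Support/JSON.h"
#include "llvm/Support/raw_ostream.h"

// Hypothetical external API that writes exactly one valid JSON value.
static void externalEmitJson(llvm::raw_ostream &OS) {
  OS << "{\"from\":\"external\"}";
}

static void emitMixed(llvm::raw_ostream &Out) {
  llvm::json::OStream J(Out);
  J.array([&] {
    J.value(1);           // normal JSON emission through the OStream
    J.rawValueBegin();    // hand the stream over to external code
    externalEmitJson(Out);
    J.rawValueEnd();      // internal state checking resumes here
  });
}
```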
Differential Revision: https://reviews.llvm.org/D88902
Add an IR phase right before the main module optimization.
It modifies the IR to restrict certain downward optimizations
in order to generate verifier-friendly code.
> prevent certain instcombine optimizations, handling both
in-block/cross-block instcombines.
> avoid speculative code motion if the variable used in the
condition is also used in later blocks.
Internally, a bpf IR builtin
result = __builtin_bpf_passthrough(seq_num, result)
is used to enforce ordering. This builtin is only used
during target independent IR optimizations and it will
be removed at the beginning of target dependent IR
optimizations.
For example, removing the following workaround,
--- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
@@ -47,7 +47,7 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx)
/* a workaround to prevent compiler from generating
* codes verifier cannot handle yet.
*/
- volatile int ret;
+ int ret;
this patch is able to generate code which passes the verifier.
To disable the optimizations, users need to run the "opt" command as below:
clang -target bpf -O2 -S -emit-llvm -Xclang -disable-llvm-passes test.c
// disable icmp serialization
opt -O2 -bpf-disable-serialize-icmp test.ll | llvm-dis > t.ll
// disable avoid-speculation
opt -O2 -bpf-disable-avoid-speculation test.ll | llvm-dis > t.ll
llc t.ll
Differential Revision: https://reviews.llvm.org/D85570
This allows overload sets containing function_ref arguments to work correctly.
Otherwise they're ambiguous, as anything "could be" converted to a function_ref.
This matches proposed std::function_ref, absl::function_ref, etc.
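A minimal sketch of the behaviour this enables; the overload set below is illustrative, not taken from LLVM:
```
#include "llvm/ADT/STLExtras.h" // llvm::function_ref

static void run(llvm::function_ref<void()> F) { F(); }
static void run(llvm::function_ref<void(int)> F) { F(42); }

static void caller() {
  // With an unconstrained converting constructor, any callable "could be"
  // converted to either function_ref and both calls would be ambiguous.
  // Constraining the constructor to compatible callables lets overload
  // resolution pick the matching signature.
  run([] {});       // selects run(function_ref<void()>)
  run([](int) {});  // selects run(function_ref<void(int)>)
}
```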
Differential Revision: https://reviews.llvm.org/D88901
The `Group` class represents a group section, and it is
named inconsistently with other sections, which all have
the "Section" suffix. This is sometimes confusing;
this patch addresses the issue.
Differential revision: https://reviews.llvm.org/D88892
In some cases, we can negate an instruction if only one of its operands
negates. Previously, we assumed that constants would have been
canonicalized to the RHS already, but that isn't guaranteed to happen
because of the InstCombine worklist visitation order,
as the added (previously-hanging) test shows.
So if we only need to negate a single operand,
we should ensure ourselves that we try constant operand first.
Do that by re-doing the complexity sorting ourselves,
when we actually care about it.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47752
Summary:
This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.
Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81172
We were already doing this for integer constants. This patch implements
the same thing for floating point constants.
Differential Revision: https://reviews.llvm.org/D88570
`LLVM-Unit :: Support/./SupportTests/ConvertUTFTest.ConvertUTF16LittleEndianToUTF8String`
`FAIL`s on Solaris/sparcv9:
In `llvm/lib/Support/ConvertUTFWrapper.cpp` (`convertUTF16ToUTF8String`)
the `SrcBytes` arg is reinterpreted/accessed as `UTF16` (`unsigned short`,
which requires 2-byte alignment on strict-alignment targets like Sparc)
without anything guaranteeing the alignment, so the access yields a
`SIGBUS`.
This patch avoids this by enforcing the required alignment in the callers.
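A minimal sketch of the caller-side idea, assuming the ArrayRef<UTF16> overload of convertUTF16ToUTF8String; the wrapper name is illustrative:
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/ConvertUTF.h"
#include <cstring>
#include <string>

static bool convertAligned(llvm::ArrayRef<char> SrcBytes, std::string &Out) {
  // Copy the bytes into a UTF16-typed buffer so the data is naturally
  // 2-byte aligned; reinterpreting SrcBytes.data() directly can SIGBUS on
  // strict-alignment targets such as SPARC.
  llvm::SmallVector<llvm::UTF16, 64> Buf(SrcBytes.size() / 2);
  std::memcpy(Buf.data(), SrcBytes.data(), Buf.size() * sizeof(llvm::UTF16));
  return llvm::convertUTF16ToUTF8String(llvm::makeArrayRef(Buf), Out);
}
```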
Tested on `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D88824
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.
In this case, when load/store uses have conflicting types,
instead of falling back to the iN, we can try to use the allocated sub-type.
As discussed, this isn't the best idea overall (we shouldn't rely on the
allocated type), but it works fine as a temporary measure.
I've measured, and at `-O3` on vanilla llvm test-suite + RawSpeed,
this results in +0.05% more bitcasts, -5.51% fewer inttoptr casts
and -1.05% fewer ptrtoint casts (at the end of the middle-end opt pipeline).
See https://bugs.llvm.org/show_bug.cgi?id=47592
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D88788
This patch fixes two issues related to relocation globals.
In LLVM, if a global, e.g. with name "g", is created and
conflicts with another global with the same name, LLVM will
rename the global, e.g., with a new name "g.2". Since a
relocation global's name has special meaning, we do not want
LLVM to change it, so internally we have logic to check
whether duplication happens or not. If it does, we just reuse
the previous global.
The first bug is related to non-btf-id relocation
(BPFAbstractMemberAccess.cpp). Commit 54d9f743c8
("BPF: move AbstractMemberAccess and PreserveDIType passes
to EP_EarlyAsPossible") changed ModulePass to FunctionPass,
i.e., handling one function at a time. But still just
one BPFAbstractMemberAccess object was created, so module-level
de-duplication was still possible. Commit 40251fee00
("[BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer
pipeline") made a change to create a BPFAbstractMemberAccess
object per function, so module-level de-duplication is no longer
possible without going through all module globals.
This patch simply makes the map which holds reloc globals
a class static, so it is available to all
BPFAbstractMemberAccess objects for different functions.
The second bug is related to btf-id relocation
(BPFPreserveDIType.cpp). Before Commit 54d9f743c8, the pass
was a ModulePass, so we had a local variable, incremented for
each instance, and it worked fine. But after Commit 54d9f743c8,
the pass became a FunctionPass. A local variable won't work
properly since different functions will start with the same
initial value. Fix the issue by changing the local count variable
to static, so it will be truly unique across the whole module
compilation.
Differential Revision: https://reviews.llvm.org/D88942
We currently get a noAlias result between a call instruction and other
load/store/call instructions if we query mayAlias.
This is not right: a call instruction is not marked mayLoadOrStore,
but it may still alter memory.
This patch fixes this wrong alias query.
Differential Revision: https://reviews.llvm.org/D87490
The old function attribute deduction pass ignores reads of constant
memory and we need to copy this behavior to replace the pass completely.
The first step is constant globals. TBAA can also describe constant
accesses, and there are other possibilities. We might want to consider
asking the alias analyses that are available, but for now this is simpler
and cheaper.
If the function is not assumed `noreturn` we should not wait for an
update to mark the call site as "may-return".
This has two kinds of consequences:
- We have fewer iterations in many tests.
- We have fewer deductions based on "known information" (since we ask
earlier, see point 1, and therefore assumed information is not "known"
yet).
The latter is an artifact that we might want to tackle properly at some
point but which is not easily fixable right now.
Report a fatal error if an IMAGE_REL_AMD64_ADDR32NB cannot be applied due to an
out-of-range target. Previously we emitted a diagnostic to llvm::errs and
continued.
Patch by Dale Martin. Thanks Dale!
This is one of many subsequent similar changes. Note that we're ok with
the parameter being typed as MCPhysReg, as MCPhysReg -> MCRegister is a
correct conversion; Register -> MCRegister assumes the former is indeed
physical, so we stop relying on the implicit conversion and use the
explicit, value-asserting asMCReg().
Differential Revision: https://reviews.llvm.org/D88862
Previously we wrote multi-byte values out as-is from host memory. Use
the `emitIntN` helpers in `MCStreamer` to produce a valid descriptor
irrespective of the host endianness.
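For illustration, a hedged sketch of the pattern; the descriptor fields shown here are made up, not the actual AMDGPU descriptor layout:
```
#include "llvm/MC/MCStreamer.h"
#include <cstdint>

static void emitDescriptorWords(llvm::MCStreamer &OS, uint32_t Version,
                                uint64_t Addr) {
  // emitIntN writes the value in the target's byte order, so the output is
  // correct regardless of host endianness, unlike streaming host memory as-is.
  OS.emitInt32(Version);
  OS.emitInt64(Addr);
}
```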
Reviewed By: arsenm, rochauha
Differential Revision: https://reviews.llvm.org/D88858
Introduce a utility function to make it more
convenient to write code that is the same on
the GFX9 and GFX10 subtargets.
Use isGFX9Plus in the AsmParser for AMDGPU.
Authored By: Joe_Nash
Differential Revision: https://reviews.llvm.org/D88908
As part of PR45974, we're getting closer to not creating 'padded' vectors on-the-fly in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain.
At the moment combineX86ShuffleChain just has to bitcast an input to the correct shuffle type, but eventually we'll need to pad them as well. So, move the bitcast into a 'CanonicalizeShuffleInput' helper for now, making the diff for future padding support a lot smaller.
The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.
I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.
This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.
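A hedged sketch of what the generic helper provides; the exact parameter list of isDereferenceableAndAlignedPointer has varied across LLVM versions, and the wrapper below is illustrative:
```
#include "llvm/Analysis/Loads.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/Alignment.h"

static bool destIsDereferenceable(const llvm::Value *Dest, llvm::Type *CpyTy,
                                  const llvm::DataLayout &DL,
                                  const llvm::Instruction *CtxI) {
  // Alignment is checked separately by the pass (see the note above), so only
  // require the minimal alignment of 1 here.
  return llvm::isDereferenceableAndAlignedPointer(Dest, CpyTy, llvm::Align(1),
                                                  DL, CtxI);
}
```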
Differential Revision: https://reviews.llvm.org/D88805
When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.
This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.
As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.
Differential Revision: https://reviews.llvm.org/D88799
PR47632
This allows MC to match `data32 ...` as one instruction instead of two (data32 without insn + insn).
The compatibility with GNU as improves: `data32 ljmp` will be matched as ljmpl.
`data32 lgdt 4(%eax)` will be matched as `lgdtl` (prefixes: 0x67 0x66, instead
of 0x66 0x67).
GNU as supports many other `data32 *w` forms as `*l`. We currently just hard code
`data32 callw` and `data32 ljmpw`. Generalizing the suffix replacement is
tricky and requires some thought about the "bwlq" appending suffix rules in MatchAndEmitATTInstruction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88772
This involves porting BPFAbstractMemberAccess and BPFPreserveDIType to
NPM, then adding them to BPFTargetMachine::registerPassBuilderCallbacks
(the NPM equivalent of adjustPassManager()).
Reviewed By: yonghong-song, asbirlea
Differential Revision: https://reviews.llvm.org/D88855
When we assume a return value is dead we might still visit return
instructions via `Attributor::checkForAllReturnedValuesAndReturnInsts(..)`.
When we do so the "returned value" is potentially simplified to `undef`
as it is the assumed "returned value". This is a problem if there was a
preexisting `noundef` attribute that will only be removed as we manifest
the `undef` return value. We should not use this combination to derive
`unreachable` though. Two test cases fixed.
In AAMemoryBehaviorFloating we used to track benign uses in a SetVector.
With this change we look through benign uses eagerly to reduce the
number of elements (=Uses) we look at during an update.
The test does not actually fail prior to this commit, but I had already written
it, so I kept it.
We know V is an IntToPtrInst or PtrToIntInst, so we know it's a CastInst - use cast<> directly.
This prevents a clang static analyzer warning that we could dereference a null pointer.
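As a small illustration of the idiom; the helper function is hypothetical:
```
#include "llvm/IR/InstrTypes.h"   // CastInst
#include "llvm/IR/Instructions.h" // IntToPtrInst, PtrToIntInst
using namespace llvm;

static CastInst *asCastInst(Value *V) {
  // V is known to be an IntToPtrInst or PtrToIntInst, both of which derive
  // from CastInst, so cast<> (which asserts) is appropriate; dyn_cast<> would
  // suggest to the static analyzer that a null result must be handled.
  return cast<CastInst>(V);
}
```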
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV.
Differential Revision: https://reviews.llvm.org/D87836
This patch makes the parser
- reject higher vector registers (>=16) in operands where they should not
be accepted.
- accept higher integers (>=16) in vector register operands.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D88888
This diff adds support for universal binaries to llvm-objcopy.
This is a recommit of 32c8435ef7 with the asan issue fixed.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D88400
The current statepoint MI format is this:
STATEPOINT
<id>, <num patch bytes >, <num call arguments>, <call target>,
[call arguments...],
<StackMaps::ConstantOp>, <calling convention>,
<StackMaps::ConstantOp>, <statepoint flags>,
<StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
<gc base/derived pairs...> <gc allocas...>
Note that GC pointers are listed in pairs <base,derived>.
This causes base pointers to appear many times (at least twice) in the
instruction, which is bad for us when VReg lowering is ON.
The problem is that machine operand tiedness is a 1-1 relation, so
it might look like this:
%vr2 = STATEPOINT ... %vr1, %vr1(tied-def0)
Since only one instance of %vr1 is tied, that may lead to incorrect
codegen (see PR46917 for more details), so we have to always spill
base pointers. This mostly defeats the new VReg lowering scheme.
This patch changes statepoint instruction format so that every
gc pointer appears only once in operand list. That way they all can
be tied. Additional set of operands is added to preserve base-derived
relation required to build stackmap.
The new statepoint has the following format:
STATEPOINT
<id>, <num patch bytes>, <num call arguments>, <call target>,
[call arguments...],
<StackMaps::ConstantOp>, <calling convention>,
<StackMaps::ConstantOp>, <statepoint flags>,
<StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
<StackMaps::ConstantOp>, <num gc pointers>, [gc pointers...],
<StackMaps::ConstantOp>, <num gc allocas>, [gc allocas...]
<StackMaps::ConstantOp>, <num entries in gc map>, [base/derived indices...]
The changes are:
- every gc pointer is listed only once, in a flat length-prefixed list;
- the alloca list is prefixed with its length too;
- following the alloca list is a length-prefixed list of base-derived
indices of pointers from the gc pointer list. Note that indices are
logical (the number of the pointer), not absolute (the index of the machine operand).
Differential Revision: https://reviews.llvm.org/D87154
uint8_t types are implicitly promoted to int, leading to an
unsigned-signed comparison.
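A tiny illustration of the underlying C++ rule, not the exact code from the patch:
```
#include <cstdint>

static bool isBelow(uint8_t Index, unsigned Limit) {
  // Index is promoted to (signed) int before the comparison, so this compares
  // a signed int with an unsigned int and trips -Wsign-compare; an explicit
  // cast such as static_cast<unsigned>(Index) < Limit avoids it.
  return Index < Limit;
}
```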
Thanks for the heads-up @uabelho.
Differential Revision: https://reviews.llvm.org/D88876
In DAGCombiner::ForwardStoreValueToDirectLoad I have fixed up some
implicit casts from TypeSize -> uint64_t and replaced calls to
getVectorNumElements() with getVectorElementCount(). There are some
simple cases of forwarding that we can definitely support for
scalable vectors, i.e. when the store and load are both scalable
vectors and have the same size. I have added tests for the new
code paths here:
CodeGen/AArch64/sve-forward-st-to-ld.ll
Differential Revision: https://reviews.llvm.org/D87098
This patch enables basic BSS section handling, and improves a couple of error
messages in the ELF section parsing code.
Patch by Christian Schafmeister. Thanks Christian!
Differential Revision: https://reviews.llvm.org/D88867
Drop `noundef` for return values that are replaced by void and make it
illegal to put `noundef` on a void value.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87306
Alignment attributes need to be dropped for non-pointer values.
This also introduces a check into the verifier to ensure you don't use
`align` on anything but a pointer. Test needed to be adjusted
accordingly.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87304
Use the context to prove that the load can be safely executed at the point where it is being hoisted.
Postpone the decision about the safety of speculative load execution until the moment we know
where we hoist the load, and check safety in that context.
Reviewers: nikic, fhahn, mkazantsev, lebedev.ri, efriedma, reames
Reviewed By: reames, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88725
This makes the NPM skip non-required passes on functions marked optnone.
If this causes a pass that should be required but has not been marked
required to be skipped, add
`static bool isRequired() { return true; }`
to the pass class. AlwaysInlinerPass is an example.
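For reference, a minimal sketch of a new-PM pass opting out of skipping; the pass name and body are illustrative:
```
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

struct MyRequiredPass : llvm::PassInfoMixin<MyRequiredPass> {
  llvm::PreservedAnalyses run(llvm::Function &F,
                              llvm::FunctionAnalysisManager &AM) {
    // ... must run even at -O0 / on optnone functions ...
    return llvm::PreservedAnalyses::all();
  }
  // Tells the pass instrumentation that this pass must not be skipped.
  static bool isRequired() { return true; }
};
```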
clang/test/CodeGen/O0-no-skipped-passes.c is useful for checking that
no passes are skipped under -O0.
The -enable-npm-optnone option will be removed once this has been stable
for long enough without issues.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D87869
Refactor exit block creation to a single call ensureEarlyExitBlock.
Add support for generating an early exit block which clears the
exec mask, but only add this instruction when required.
These changes are to facilitate adding more forms of early
termination for PS shaders in the near future.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D88775
When unbundling COPY bundles in VirtRegRewriter the start of the
bundle is not correctly referenced in the unbundling loop.
The effect of this is that unbundled instructions are sometimes
inserted out-of-order, particularly in cases where multiple
reorderings have been applied to avoid clobbering dependencies.
The resulting instruction sequence clobbers dependencies.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D88821
Register context information was already being passed into the DWARFDebugFrame code that dumps unwind information, but it wasn't being used. This change adds the ability to dump register names if a valid MC register context was passed in and it knows about the register. Updated the tests to use the newly returned register names.
Differential Revision: https://reviews.llvm.org/D88767
This and its friend X86ISD::LCMPXCHG8_SAVE_RBX_DAG are used if we need to avoid clobbering the frame pointer in EBX/RBX. EBX/RBX are only used as a frame pointer in 64-bit mode. In 64-bit mode we don't use CMPXCHG8B since we have a GR64 cmpxchg available. So we don't need special handling for LCMPXCHG8B.
Split from D88808
Differential Revision: https://reviews.llvm.org/D88853
getNode handling for ISD::SETCC calls FoldSETCC, which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter we can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.
Differential Revision: https://reviews.llvm.org/D88063
This reverts commit 20797989ea.
This patch (https://reviews.llvm.org/D69257) cannot complete a stage2
build due to the change:
```
CI->getCalledFunction()->getName().contains("longjmp")
```
There are several concrete issues here:
- The callee may not be a function, so `getCalledFunction` can assert.
- The called value may not have a name, so `getName` can assert.
- There's no distinction made between "my_longjmp_test_helper" and the
actual longjmp libcall.
At a higher level, there's a serious layering problem here. The
splitting pass makes policy decisions in a general way (e.g. based on
attributes or profile data). Special-casing certain names breaks the
layering. It subverts the work of library maintainers (who may now need
to opt-out of unexpected optimization behavior for any affected
functions) and can lead to inconsistent optimization behavior (as not
all llvm passes special-case ".*longjmp.*" in the same way).
The patch may need significant revision to address these issues.
But the immediate issue is that this crashes while compiling llvm's unit
tests in a stage2 build (due to the `getName` problem).
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious.
Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.
I think it's pretty obvious that it is an undesirable outcome,
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-op, and are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32, i32* %p
%b = inttoptr i32 %a to i8*
%c = inttoptr i32 %a to i8*
```
we likely won't be able to tell that `%b` and `%c` are the same thing.
As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without the https://bugs.llvm.org/show_bug.cgi?id=47592 at least)
And we can't recover the situation post-inlining in instcombine.
So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means, this fold isn't helpful in exposing the passes
that are otherwise unaware of these patterns it produces.
Thus, I propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so I'm not sure
how this plays out in the larger picture.
On the vanilla llvm test-suite + RawSpeed, this results in
an increase of asm instructions and final object size by ~+0.05%,
and decreases the final count of bitcasts by -4.79% (-28990),
ptrtoint casts by -15.41% (-3423),
and inttoptr casts by -25.59% (-6919, *sic*).
Overall, there are -0.04% fewer IR blocks and -0.39% fewer instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
... by using MapVector. The issue was caused by 63182c2ac0.
Also use stable_partition instead of partition to get stable results
across different STL implementations.
When retrying the "simplify with operand replaced" select
optimization without poison flags, also handle inbounds on GEPs.
Of course, this particular example would also be safe to transform
while keeping inbounds, but the underlying machinery does not
know this (yet).
Continuing from D88499, we can now model the normalization function as a
virtual member of VirtRegAuxInfo. Note that the default
(normalizeSpillWeight) is also used stand-alone in RAGreedy.
Differential Revision: https://reviews.llvm.org/D88713
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial, except for those in
DAGTypeLegalizer::SplitVecRes_MSTORE, where I've tried to ensure we create
the MachineMemOperands in a sensible way for scalable vectors.
I have added a CHECK line to the following test:
CodeGen/AArch64/sve-split-store.ll
that ensures no new warnings are added.
Differential Revision: https://reviews.llvm.org/D86928
If a CSEMIRBuilder query hits the instruction at the current insert point,
move the insert point ahead by one so that subsequent uses of the builder don't end up with
uses before defs.
This fix also shows that AMDGPU was frequently affected by this bug as well, but got away
with it because it was using a G_IMPLICIT_DEF before the use.
Differential Revision: https://reviews.llvm.org/D88605
Use m_Specific instead of m_Value followed by an equality check - we already do this for the similar folds above, it looks like an oversight in rG2b459fe7e1e where the original pattern match code looked a little different.
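A small sketch of the idiom; the surrounding match is illustrative, not the actual fold:
```
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

static bool isNotOf(Value *V, Value *Op0) {
  // Less idiomatic: match any value, then compare by hand.
  Value *X;
  if (match(V, m_Not(m_Value(X))) && X == Op0)
    return true;
  // Preferred: let the matcher enforce the identity directly.
  return match(V, m_Not(m_Specific(Op0)));
}
```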
This reverts part of a2fb5446 (actually, 2508ef01) about removing the
negated FP constant immediately if it has no uses. However, as discussed
in bug 47517, there are cases where NegX is folded into a constant from
other places while NegY is removed by that line of code and NegX is
equal to NegY. In these cases, NegX is deleted before being used and a crash
happens. So revert the code and add the necessary test case.
This reverts commit a3caf7f610.
The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
Support VRI, VRR, VRS, VRV, VRX, VSI instruction formats with the .insn
directive.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D88357
This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV.
Differential Revision: https://reviews.llvm.org/D87836
Update the code responsible for deleting VPBBs and recipes to properly
update users and release operands.
This is another preparation for D84680 & following patches towards
enabling modeling def-use chains in VPlan.
Use tablegen generic tables to get the index of image intrinsic
arguments.
Before, the computation of which image intrinsic argument is at which
index was scattered across a few places: tablegen, SDag instruction
selection, and GlobalISel. This patch changes that, so that only tablegen
contains code to compute indices and the ImageDimIntrinsicInfo table
provides this information.
Differential Revision: https://reviews.llvm.org/D86270