Use getAPIntValue() directly - this is mainly a best practice style issue to help prevent fuzz tests blowing up when a i12345 (or whatever) is generated.
Use getConstantOperandVal/getConstantOperandAPInt wrappers where possible.
llvm-svn: 371315
```
.type foo,@gnu_indirect_function
.set foo,foo_resolver
.set foo2,foo
.set foo3,foo2
```
The types of foo2 and foo3 should be STT_GNU_IFUNC, but we currently
resolve them to the type of foo_resolver. This patch fixes it.
Differential Revision: https://reviews.llvm.org/D67206
Patch by Senran Zhang
llvm-svn: 371312
Summary:
Normally TargetLowering::expandFixedPointMul would handle
SMULFIXSAT with scale zero by using an SMULO to compute the
product and determine if saturation is needed (if overflow
happened). But if SMULO isn't custom/legal it falls through
and uses the same technique, using MULHS/SMUL_LOHI, as used
for non-zero scales.
Problem was that when checking for overflow (handling saturation)
when not using MULO we did not expect to find a zero scale. So
we ended up in an assertion when doing
APInt::getLowBitsSet(VTSize, Scale - 1)
This patch fixes the problem by adding a new special case for
how saturation is computed when scale is zero.
Reviewers: RKSimon, bevinh, leonardchan, spatel
Reviewed By: RKSimon
Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67071
llvm-svn: 371309
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.
This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.
Patch by: leonardchan, bjope
Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
Reviewed By: leonardchan
Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57836
llvm-svn: 371308
Fix for https://bugs.llvm.org/show_bug.cgi?id=43230.
When creating PSHUFLW from a repeated shuffle mask, we have to apply
the checks to the repeated mask, not the original one. For the test
case from PR43230 the inspected part of the original mask is all undef.
Differential Revision: https://reviews.llvm.org/D67314
llvm-svn: 371307
This addresses the issue mentioned on D19867. When we simplify
with.overflow instructions in CVP, we leave behind extractvalue
of insertvalue sequences that LVI no longer understands. This
means that we can not simplify any instructions based on the
with.overflow anymore (until some over pass like InstCombine
cleans them up).
This patch extends LVI extractvalue handling by calling
SimplifyExtractValueInst (which doesn't do anything more than
constant folding + looking through insertvalue) and using the block
value of the simplification.
A possible alternative would be to do something similar to
SimplifyIndVars, where we instead directly try to replace
extractvalue users of the with.overflow. This would need some
additional structural changes to CVP, as it's currently not legal
to remove anything but the current instruction -- we'd have to
introduce a worklist with instructions scheduled for deletion or similar.
Differential Revision: https://reviews.llvm.org/D67035
llvm-svn: 371306
Summary:
The value operand in DW_OP_plus_uconst/DW_OP_constu value can be
large (it uses uint64_t as representation internally in LLVM).
This means that in the uint64_t to int conversions, previously done
by DwarfExpression::addMachineRegExpression, could lose information.
Also, the negation done in "-Offset" was undefined behavior in case
Offset was exactly INT_MIN.
To avoid the above problems, we now avoid transformation like
[Reg, DW_OP_plus_uconst, Offset] --> [DW_OP_breg, Offset]
and
[Reg, DW_OP_constu, Offset, DW_OP_plus] --> [DW_OP_breg, Offset]
when Offset > INT_MAX.
And we avoid to transform
[Reg, DW_OP_constu, Offset, DW_OP_minus] --> [DW_OP_breg,-Offset]
when Offset > INT_MAX+1.
The patch also adjusts DwarfCompileUnit::constructVariableDIEImpl
to make sure that "DW_OP_constu, Offset, DW_OP_minus" is used
instead of "DW_OP_plus_uconst, Offset" when creating DIExpressions
with negative frame index offsets.
Notice that this might just be the tip of the iceberg. There
are lots of fishy handling related to these constants. I think both
DIExpression::appendOffset and DIExpression::extractIfOffset may
trigger undefined behavior for certain values.
Reviewers: sdesmalen, rnk, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: jholewinski, aprantl, hiraditya, ychen, uabelho, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67263
llvm-svn: 371304
Summary:
This patch introduces initial `AAValueSimplify` which simplifies a value in a context.
example
- (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant.
- If an internal function takes the same value(constant) as an argument in the callsite, then we can replace the argument with that constant.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66967
llvm-svn: 371291
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
version after r371273.
Also fix a minor issue in r371273 that only surfaced after template
instantiation from LLVM's use of the demangler.
llvm-svn: 371274
Despite the fact that the localizer's original motivation was to fix horrendous
constant spilling at -O0, shortening live ranges still has net benefits even
with optimizations enabled.
On an -Os build of CTMark, doing this improves code size by 0.5% geomean.
There are a few regressions, bullet increasing in size by 0.5%. One example from
bullet where code size increased slightly was due to GlobalISel actually now
generating the same code as SelectionDAG. So we actually have an opportunity
in future to implement better heuristics for localization and therefore be
*better* than SDAG in some cases. In relation to other optimizations though that
one is relatively minor.
Differential Revision: https://reviews.llvm.org/D67303
llvm-svn: 371266
Add the new method `LibCallSimplifier::substituteInParent()` that calls
`LibCallSimplifier::replaceAllUsesWith()' and
`LibCallSimplifier::eraseFromParent()` back to back, simplifying the
resulting code.
llvm-svn: 371264
Summary:
There's an unspoken invariant of callbr that the list of BlockAddress
Constants in the "function args" list match the BasicBlocks in the
"other labels" list. (This invariant is being added to the LangRef in
https://reviews.llvm.org/D67196).
When modifying the any of the indirect destinations of a callbr
instruction (possible jump targets), we need to update the function
arguments if the argument is a BlockAddress whose BasicBlock refers to
the indirect destination BasicBlock being replaced. Otherwise, many
transforms that modify successors will end up violating that invariant.
A recent change to the arm64 Linux kernel exposed this bug, which
prevents the kernel from booting.
I considered maintaining a mapping from indirect destination BasicBlock
to argument operand BlockAddress, but this ends up being a one to
potentially many (though usually one) mapping. Also, the list of
arguments to a function (or more typically inline assembly) ends up
being less than 10. The implementation is significantly simpler to just
rescan the full list of arguments. Because of the one to potentially
many relationship, the full arg list must be scanned (we can't stop at
the first instance).
Thanks to the following folks that reported the issue and helped debug
it:
* Nathan Chancellor
* Will Deacon
* Andrew Murray
* Craig Topper
Link: https://bugs.llvm.org/show_bug.cgi?id=43222
Link: https://github.com/ClangBuiltLinux/linux/issues/649
Link: https://lists.infradead.org/pipermail/linux-arm-kernel/2019-September/678330.html
Reviewers: craig.topper, chandlerc
Reviewed By: craig.topper
Subscribers: void, javed.absar, kristof.beyls, hiraditya, llvm-commits, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67252
llvm-svn: 371262
We can use a MOVSX16 here then rely on FixupBWInst to change to
MOVSX32 if the upper bits are dead. With a special case to
not promote if it could be turned into CBW.
Then we can rely on X86MCInstLower to turn the MOVSX into CBW
very late if register allocation worked out.
Using MOVSX gives an opportunity to use the MOVSX as a both a
copy and a sign extend since the input and output register aren't
tied together.
Differential Revision: https://reviews.llvm.org/D67192
llvm-svn: 371243
We can rely on X86FixupBWInsts to turn these into MOVZX32. This
simplifies a follow up commit to use MOVSX for i8 sdivrem with
a late optimization to use CBW when register allocation works out.
llvm-svn: 371242
In order to keep remarks around, we need to make them tied to a string
table.
Users then can delete the parser and rely on the string table to keep
the memory of the strings alive and deduplicated.
llvm-svn: 371233
-tailcallopt requires that we perform different stack adjustments than with
sibling calls. For example, the `@caller_to0_from8` function in
test/CodeGen/AArch64/tail-call.ll requires that we adjust SP. Without
-tailcallopt, this adjustment does not happen. With it, however, it is expected.
So, to ensure that adding sibling call support doesn't break -tailcallopt,
make CallLowering always fall back on possible tail calls when -tailcallopt
is passed in.
Update test/CodeGen/AArch64/tail-call.ll with a GlobalISel line to make sure
that we don't differ from the SDAG implementation at any point.
Differential Revision: https://reviews.llvm.org/D67245
llvm-svn: 371227
This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together.
This is useful for various MVE instructions, such as vmla and others that take R registers.
Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.
Differential revision: https://reviews.llvm.org/D66295
llvm-svn: 371218
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67267
llvm-svn: 371212
Summary:
This was discovered while introducing the llvm::Align type.
The original setMinFunctionAlignment used to take alignment as log2, looking at the comment it seems like instructions are to be 2-bytes aligned and not 4-bytes aligned.
Reviewers: uweigand
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67271
llvm-svn: 371204
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67229
llvm-svn: 371200
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
llvm-svn: 371198
If a stack spill location is overwritten by another spill instruction,
any variable locations pointing at that slot should be terminated. We
cannot rely on spills always being restored to registers or variable
locations being moved by a DBG_VALUE: the register allocator is entitled
to spill a value and then forget about it when it goes out of liveness.
To address this, scan for memory writes to spill locations, even those we
don't consider to be normal "spills". isSpillInstruction and
isLocationSpill distinguish the two now. After identifying spill
overwrites, terminate the open range, and insert a $noreg DBG_VALUE for
that variable.
Differential Revision: https://reviews.llvm.org/D66941
llvm-svn: 371193
Summary:
This fixes poor scheduling in a function containing a barrier and a few
load instructions.
Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial
edge in the dependency graph from the barrier instruction to the exit
node representing live-out latency, with a latency of about 500 cycles.
Because of this it thinks the critical path through the graph also has
a latency of about 500 cycles. And because of that it does not think
that any of the load instructions are on the critical path, so it
schedules them with no regard for their (80 cycle) latency, which gives
poor results.
Reviewers: arsenm, dstuttard, tpr, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67218
llvm-svn: 371192
`struct Elf*_Shdr` has a field `sh_offset`, named `ShOffset` in
llvm::ELFYAML::Section. Rename SHOffset (e_shoff) to SHOff to prevent confusion.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D67254
llvm-svn: 371185
The MVE and LOB extensions of Armv8.1m can be combined to enable
'tail predication' which removes the need for a scalar remainder
loop after vectorization. Lane predication is performed implicitly
via a system register. The effects of predication is described in
Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points
being:
- For vector operations that perform reduction across the vector and
produce a scalar result, whether the value is accumulated or not.
- For non-load instructions, the predicate flags determine if the
destination register byte is updated with the new value or if the
previous value is preserved.
- For vector store instructions, whether the store occurs or not.
- For vector load instructions, whether the value that is loaded or
whether zeros are written to that element of the destination
register.
This patch implements a pass that takes a hardware loop, containing
masked vector instructions, and converts it something that resembles
an MVE tail predicated loop. Currently, if we had code generation,
we'd generate a loop in which the VCTP would generate the predicate
and VPST would then setup the value of VPR.PO. The loads and stores
would be placed in VPT blocks so this is not tail predication, but
normal VPT predication with the predicate based upon a element
counting induction variable. Further work needs to be done to finally
produce a true tail predicated loop.
Because only the loads and stores are predicated, in both the LLVM IR
and MIR level, we will restrict support to only lane-wise operations
(no horizontal reductions). We will perform a final check on MIR
during loop finalisation too.
Another restriction, specific to MVE, is that all the vector
instructions need operate on the same number of elements. This is
because predication is performed at the byte level and this is set
on entry to the loop, or by the VCTP instead.
Differential Revision: https://reviews.llvm.org/D65884
llvm-svn: 371179
Summary:
Fix a bug of not update the jump table and recommit it again.
In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D63972
llvm-svn: 371177
Summary: It says [[ http://www.sco.com/developers/gabi/latest/ch4.eheader.html | here ]] that if there are no program headers than e_phoff should be 0, but currently it is always set after the header. GNU's `readelf` (but not `llvm-readelf`) complains about this: `readelf: Warning: possibly corrupt ELF header - it has a non-zero program header offset, but no program headers`.
Reviewers: jhenderson, grimar, MaskRay, rupprecht
Reviewed By: jhenderson, grimar, MaskRay
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67054
llvm-svn: 371162
Passing INT64_MIN to MCInstPrinter::formatHex triggers undefined
behavior because the negation of -9223372036854775808 cannot be
represented in type 'int64_t' (aka 'long long'). This patch puts a
workaround in place to just print the hex value directly.
A possible alternative involves using a small helper functions that uses
(implementation) defined conversions to achieve the desirable value:
static int64_t helper(int64_t V) {
auto U = static_cast<uint64_t>(V);
return V < 0 ? -U : U;
}
The underlying problem is that MCInstPrinter::formatHex(int64_t) returns
a format_object<int64_t> and should really return a
format_object<uint64_t>. However, that's not possible because formatImm
needs to be able to print both as decimal (where a signed is required)
and hex (where we'd prefer to always have an unsigned).
format_object<int64_t> formatImm(int64_t Value) const {
return PrintImmHex ? formatHex(Value) : formatDec(Value);
}
Differential revision: https://reviews.llvm.org/D67236
llvm-svn: 371159
See http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html
and D60242 for the lld partition feature.
This patch:
* Teaches yaml2obj to parse the 3 section types.
* Teaches llvm-readobj/llvm-readelf to dump the 3 section types.
There is no test for SHT_LLVM_DEPENDENT_LIBRARIES in llvm-readobj. Add
it as well.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D67228
llvm-svn: 371157
The same stack is loaded for each workitem ID, and each use. Nothing
prevents you from creating multiple fixed stack objects with the same
offsets, so this was creating a load for each unique frame index,
despite them being the same offset. Re-use the same frame index so the
loads are CSEable.
llvm-svn: 371148
Summary:
Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain
which can check on undef. Msan by design reports branches on uninitialized
memory and undefs, so we have false report here.
In general msan does not like when we convert
```
// If at least one of them is true we can MSAN is ok if another is undefs
if (a || b)
return;
```
into
```
// If 'a' is undef MSAN will complain even if 'b' is true
if (a)
return;
if (b)
return;
```
Example
Before optimization we had something like this:
```
while (true) {
bool maybe_undef = doStuff();
while (true) {
char c = getChar();
if (c != 10 && c != 13)
continue
break;
}
// we know that c == 10 || c == 13 if we get here,
// so msan know that branch is not affected by maybe_undef
if (maybe_undef || c == 10 || c == 13)
continue;
return;
}
```
SimplifyBranchOnICmpChain will convert that into
```
while (true) {
bool maybe_undef = doStuff();
while (true) {
char c = getChar();
if (c != 10 && c != 13)
continue;
break;
}
// however msan will complain here:
if (maybe_undef)
continue;
// we know that c == 10 || c == 13, so either way we will get continue
switch(c) {
case 10: continue;
case 13: continue;
}
return;
}
```
Reviewers: eugenis, efriedma
Reviewed By: eugenis, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67205
llvm-svn: 371138
Approximately 30% of the time was spent in the std::vector
constructor. In one testcase this pushes the scheduler to being the
second slowest pass.
I'm not sure I understand why these vector are necessary. The default
scheduler initCandidate seems to use some pre-existing vectors for the
pressure.
llvm-svn: 371136
This patch reuses the MIR vreg renamer from the MIRCanonicalizerPass to cleanup
names of vregs in a MIR file for MIR test authors. I found it useful when
writing a regression test for a globalisel failure I encountered recently and
thought it might be useful for other folks as well.
Differential Revision: https://reviews.llvm.org/D67209
llvm-svn: 371121
Now that we look through copies, it's possible to visit registers that
have a register class constraint but not a type constraint. Avoid looking
through copies when this occurs as the SrcReg won't be able to determine
it's bit width or any known bits.
Along the same lines, if the initial query is on a register that doesn't
have a type constraint then the result is a default-constructed KnownBits,
that is, a 1-bit fully-unknown value.
llvm-svn: 371116
Recommit basic sibling call lowering (https://reviews.llvm.org/D67189)
The issue was that if you have a return type other than void, call lowering
will emit COPYs to get the return value after the call.
Disallow sibling calls other than ones that return void for now. Also
proactively disable swifterror tail calls for now, since there's a similar issue
with COPYs there.
Update call-translator-tail-call.ll to include test cases for each of these
things.
llvm-svn: 371114
The code was incorrectly counting the number of identical instructions,
and therefore tried to predicate an instruction which should not have
been predicated. This could have various effects: a compiler crash,
an assembler failure, a miscompile, or just generating an extra,
unnecessary instruction.
Instead of depending on TargetInstrInfo::removeBranch, which only
works on analyzable branches, just remove all branch instructions.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43121 and
https://bugs.llvm.org/show_bug.cgi?id=41121 .
Differential Revision: https://reviews.llvm.org/D67203
llvm-svn: 371111
As noted in PR43197, we can use test+add+cmov+sra to implement
signed division by a power of 2.
This is based off the similar version in AArch64, but I've
adjusted it to use target independent nodes where AArch64 uses
target specific CMP and CSEL nodes. I've also blocked INT_MIN
as the transform isn't valid for that.
I've limited this to i32 and i64 on 64-bit targets for now and only
when CMOV is supported. i8 and i16 need further investigation to be
sure they get promoted to i32 well.
I adjusted a few tests to enable cmov to demonstrate the new
codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode
without cmov to avoid perturbing the scenario that is being
set up there.
Differential Revision: https://reviews.llvm.org/D67087
llvm-svn: 371104
A follow-up for r329011.
This may be changed to produce @llvm.sub.with.overflow in a later patch,
but for now just make things more consistent overall.
A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`,
so since the `icmp` here no longer refers to `sub`,
reconstructing `usub.with.overflow` will be problematic,
and will likely require standalone pass (similar to DivRemPairs).
https://rise4fun.com/Alive/Zqs
Name: (A - B) u> A --> B u> A
%t0 = sub i8 %A, %B
%r = icmp ugt i8 %t0, %A
=>
%r = icmp ugt i8 %B, %A
Name: (A - B) u<= A --> B u<= A
%t0 = sub i8 %A, %B
%r = icmp ule i8 %t0, %A
=>
%r = icmp ule i8 %B, %A
Name: C u< (C - D) --> C u< D
%t0 = sub i8 %C, %D
%r = icmp ult i8 %C, %t0
=>
%r = icmp ult i8 %C, %D
Name: C u>= (C - D) --> C u>= D
%t0 = sub i8 %C, %D
%r = icmp uge i8 %C, %t0
=>
%r = icmp uge i8 %C, %D
llvm-svn: 371101
If we have:
bb5:
br i1 %arg3, label %bb6, label %bb7
bb6:
%tmp = getelementptr inbounds i32, i32* %arg1, i64 2
store i32 3, i32* %tmp, align 4
br label %bb9
bb7:
%tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2
store i32 3, i32* %tmp8, align 4
br label %bb9
bb9: ; preds = %bb4, %bb6, %bb7
...
We can't sink stores directly into bb9.
This patch creates new BB that is successor of %bb6 and %bb7
and sinks stores into that block.
SplitFooterBB is the parameter to the pass that controls
that behavior.
Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f
Differential: https://reviews.llvm.org/D66234
llvm-svn: 371089
Summary:
Avoid visiting an instruction more than once by using a map.
This is similar to https://reviews.llvm.org/rL361416.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67198
llvm-svn: 371086
A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang:
Target independent constraints:
s: An integer constant, but allowing only relocatable values
ARM specific constraints:
j: An immediate integer between 0 and 65535 (valid for MOVW)
x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3
N: An immediate integer between 0 and 31 (Thumb1 only)
O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only)
This patch adds support to Clang for the missing constraints along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but just a couple of minor corrections to checks (V8M Baseline includes MOVW so should work with 'j', 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation.
Differential Revision: https://reviews.llvm.org/D65863
Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a
llvm-svn: 371079
As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset.
Until we can properly handle this, we must bail from EltsFromConsecutiveLoads.
llvm-svn: 371078
Linkers (ld.bfd/gold/lld) place the section header table at the very
end. This allows tools to strip it, which is optional in executable/shared objects.
In addition, if we add or section, the size of the section header table
will change. Placing the section header table in the end keeps section
offsets unchanged.
yaml2obj currently places the section header table immediately after the
program header. Follow what linkers do to make offset updating easier.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D67221
llvm-svn: 371074
This attempts to just fix the creation of VPT blocks, fixing up the iterating,
which instructions are considered in the bundle, and making sure that we do not
overrun the end of the block.
Differential Revision: https://reviews.llvm.org/D67219
llvm-svn: 371064
G_FENCE comes form fence instruction. For MIPS fence is generated in
AtomicExpandPass when atomic instruction gets surrounded with fence
instruction when needed.
G_FENCE arguments don't have LLT, because of that there is no job for
legalizer and regbankselect. Instruction select G_FENCE for MIPS32.
Differential Revision: https://reviews.llvm.org/D67181
llvm-svn: 371056
Select G_INTRINSIC_W_SIDE_EFFECTS for Intrinsic::trap for MIPS32
via legalizeIntrinsic.
Differential Revision: https://reviews.llvm.org/D67180
llvm-svn: 371055
Instead of returning structure by value clang usually adds pointer
to that structure as an argument. Pointers don't require special
handling no matter the SRet flag. Remove unsuccessful exit from
lowerCall for arguments with SRet flag if they are pointers.
Differential Revision: https://reviews.llvm.org/D67179
llvm-svn: 371054
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)
At this point, it is very restricted. It does not handle
- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.
This patch adds
- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.
Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.
For testing, this patch
- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
as expected.
Differential Revision: https://reviews.llvm.org/D67189
........
This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll
llvm-svn: 371051
Handle the remaining cases also by handling asm goto in
SystemZInstrInfo::getBranchInfo().
Review: Ulrich Weigand
https://reviews.llvm.org/D67151
llvm-svn: 371048
Fixes clang static-analyzer warning.
Technically the MachineInstr *Sub might still be null if we're comparing zero (IsCmpZero == true), although this probably won't happen as SrcReg2 is probably == 0.
llvm-svn: 371047
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
-ftime-trace could break flame-graph assumptions on Windows, with an
inner scope overrunning outer scopes. This was due to the way that times
were truncated. Changed this so time_points for the flame-graph are
truncated instead of durations, preserving the relative order of event
starts and ends.
I have tried to retain the extra precision for the totals, which count
thousands or millions of events.
Added assert to check this property holds in future.
Fixes PR43043
Differential Revision: https://reviews.llvm.org/D66411
llvm-svn: 371039
After r361885, realPathFromHandle() ends up getting called on the working
directory on each Clang invocation. This unveiled that the code didn't work for
paths on network shares.
For example, if one maps the local dir c:\src\tmp to x:
net use x: \\localhost\c$\tmp
and run e.g. "clang -c foo.cc" in x:\, realPathFromHandle will get
\\?\UNC\localhost\c$\src\tmp\ back from GetFinalPathNameByHandleW, and would
strip off the initial \\?\ prefix, ending up with a path that doesn't work.
This patch makes the prefix stripping a little smarter to handle this case.
Differential revision: https://reviews.llvm.org/D67166
llvm-svn: 371035
In D62809 I accidentally added "ELFState<ELFT> &State" as the
first parameter to two methods. There is no reason for having that.
I removed this argument and also moved finalizeStrings declaration to
remove an excessive 'private:' tag.
Differential revision: https://reviews.llvm.org/D67157
llvm-svn: 371033
Fix: added missing return "return 0;"
Original commit message:
This eliminates one of the error(1) call in this lib.
It is different from the others because happens on a fields mapping stage
and can be easily fixed.
Differential revision: https://reviews.llvm.org/D67150
llvm-svn: 371030
This eliminates one of the error(1) call in this lib.
It is different from the others because happens on a fields mapping stage
and can be easily fixed.
Differential revision: https://reviews.llvm.org/D67150
llvm-svn: 371023
As DW_AT_rnglists_base points after the header and headers have
different sizes for DWARF32 and DWARF64, we have to use the format
of the CU to adjust the offset correctly in order to extract
the referenced range list table.
The patch also changes the type of RangeSectionBase because in DWARF64
it is 8-bytes long.
Differential Revision: https://reviews.llvm.org/D67098
llvm-svn: 371016
This adds support for basic sibling call lowering in AArch64. The intent here is
to only handle tail calls which do not change the ABI (hence, sibling calls.)
At this point, it is very restricted. It does not handle
- Vararg calls.
- Calls with outgoing arguments.
- Calls whose calling conventions differ from the caller's calling convention.
- Tail/sibling calls with BTI enabled.
This patch adds
- `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent
to the same function in AArch64ISelLowering.cpp (albeit with the restrictions
above.)
- `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in
AArch64ISelLowering.cpp.
- `getCallOpcode`, which is exactly what it sounds like.
Tail/sibling calls are lowered by checking if they pass target-independent tail
call positioning checks, and checking if they satisfy
`isEligibleForTailCallOptimization`. If they do, then a tail call instruction is
emitted instead of a normal call. If we have a sibling call (which is always the
case in this patch), then we do not emit any stack adjustment operations. When
we go to lower a return, we check if we've already emitted a tail call. If so,
then we skip the return lowering.
For testing, this patch
- Adds call-translator-tail-call.ll to test which tail calls we currently lower,
which ones we don't, and which ones we shouldn't.
- Updates branch-target-enforcement-indirect-calls.ll to show that we fall back
as expected.
Differential Revision: https://reviews.llvm.org/D67189
llvm-svn: 370996
Moving MIRCanonicalizerPass vreg renaming code to MIRVRegNamerUtils so that it
can be reused in another pass (ie planing to write a standalone mir-namer pass).
I'm going to write a mir-namer pass so that next time someone has to author a
test in MIR, they can use it to cleanup the naming and make it more readable by
having the numbered vregs swapped out with named vregs.
Differential Revision: https://reviews.llvm.org/D67114
llvm-svn: 370985
In mingw environments, resources are normally compiled to resource
object files directly, instead of letting the linker convert them to
COFF format.
Since some time, GCC supports the notion of a default manifest object.
When invoking the linker, GCC looks for the default manifest object
file, and if found in the expected path, it is added to linker commands.
The default manifest is one that indicates support for the latest known
versions of windows, to implicitly unlock the modern behaviours of certain
APIs.
Not all mingw/gcc distributions include this file, but e.g. in msys2,
the default manifest object is distributed in a separate package (which
can be but might not always be installed).
This means that even if user projects only use one single resource
object file, the linker can end up with two resource object files,
and thus needs to support merging them.
The default manifest has a language id of zero, and GNU ld has got
logic for dropping a manifest with a zero language id, if there's
another manifest present with a nonzero language id. If there are
multiple manifests with a nonzero language id, the merging process
errors out.
Differential Revision: https://reviews.llvm.org/D66825
llvm-svn: 370974
This patch merges the sancov module and funciton passes into one module pass.
The reason for this is because we ran into an out of memory error when
attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM
error to the destructor of SanitizerCoverage where we only call
appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely
in appendToUsedList causes the OOM, but I am able to confirm that it's calling
this function *repeatedly* that causes the OOM. (I hacked sancov a bit such that
I can still create and destroy a new sancov on every function run, but only call
appendToUsedList after all functions in the module have finished. This passes, but
when I make it such that appendToUsedList is called on every sancov destruction,
we hit OOM.)
I don't think the OOM is from just adding to the SmallSet and SmallVector inside
appendToUsedList since in either case for a given module, they'll have the same
max size. I suspect that when the existing llvm.compiler.used global is erased,
the memory behind it isn't freed. I could be wrong on this though.
This patch works around the OOM issue by just calling appendToUsedList at the
end of every module run instead of function run. The same amount of constants
still get added to llvm.compiler.used, abd we make the pass usage and logic
simpler by not having any inter-pass dependencies.
Differential Revision: https://reviews.llvm.org/D66988
llvm-svn: 370971
This patch adds the ability to encode and decode InlineInfo objects and adds test coverage. Error handling is introduced in the encoding and decoding which will be used from here on out for remaining patches.
Differential Revision: https://reviews.llvm.org/D66600
llvm-svn: 370936
Since an add instruction must produce an unused carry out, this
requires additional SGPRs. This can be avoided by keeping the entire
offset computation in SGPRs. If one SGPR is still available, this only
costs one extra mov. If none are available, the entire computation can
be done in place and reversed.
This does assume the use is a VGPR operand. This was already assumed,
and we currently only select frame indexes to VALU instructions. This
should probably be fixed at some point to handle more possible MIR.
llvm-svn: 370929
Summary:
Instead of building attributes for internal functions which we do not
update as long as we assume they are dead, we now do not create
attributes until we assume the internal function to be live. This
improves the number of required iterations, as well as the number of
required updates, in real code. On our tests, the results are mixed.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66914
llvm-svn: 370924
Summary:
We create attributes on-demand so we need to check the white list
on-demand. This also unifies the location at which we create,
initialize, and eventually invalidate new abstract attributes.
The tests show mixed results, a few more call site attributes are
determined which can cause more iterations.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66913
llvm-svn: 370922
Summary:
Before we tried to rule out non-exact definitions early but that lead to
on-demand attributes created for them anyway. As a consequence we needed
to look at the definition in the initialize of each attribute again.
This patch centralized this lookup and tightens the condition under
which we give up on non-exact definitions.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67115
llvm-svn: 370917
SROA pass processes debug info incorrecly if applied twice.
Specifically, after SROA works first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining creates new opportunities for SROA,
so it is called again. This time it does not handle correctly previously
inserted dbg.value intrinsics.
Differential Revision: https://reviews.llvm.org/D64595
llvm-svn: 370906
Apologies, due to a git SNAFU this fix (dump doesn't exist and silence unused variables) stayed in my index rather than applying to rL370893.
llvm-svn: 370894
This is the beginnings of a reimplementation of ModuloScheduleExpander. It works
by generating a single-block correct pipelined kernel and then peeling out the
prolog and epilogs.
This patch implements kernel generation as well as a validator that will
confirm the number of phis added is the same as the ModuloScheduleExpander.
Prolog and epilog peeling will come in a different patch.
Differential Revision: https://reviews.llvm.org/D67081
llvm-svn: 370893
When comparing variable locations, LiveDebugValues currently considers only
the machine location, ignoring any DIExpression applied to it. This is a
problem because that DIExpression can do pretty much anything to the machine
location, for example dereferencing it.
This patch adds DIExpressions to that comparison; now variables based on the
same register/memory-location but with different expressions will compare
differently, and be dropped if we attempt to merge them between blocks. This
reduces variable coverage-range a little, but only because we were producing
broken locations.
Differential Revision: https://reviews.llvm.org/D66942
llvm-svn: 370877
On release builds, 'MI' isn't used by anything (it's already inserted into a
block by BuildMI), while on non-release builds it's used by a LLVM_DEBUG
statement. Mark as explicitly used to avoid the warning.
llvm-svn: 370870
Summary:
While fixing the handling of some error cases, r370363 introduced new
problems -- assertion failures due to unchecked errors (my excuse is that a very
early version of that patch used Optional<T> instead of Expected).
This patch adds proper handling of parsing errors encountered when
dumping location lists from inside DWARF DIEs, and adds a bunch of
additional tests.
I reorder the arguments of the location list dumping functions to make
them consistent, and also be able to dump the two kinds of location
lists generically.
Reviewers: JDevlieghere, dblaikie, probinson
Subscribers: aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67102
llvm-svn: 370868
PT_GNU_STACK is used in an llvm-objcopy test.
I plan to use PT_GNU_RELRO in a patch to improve nested segment
processing in llvm-objcopy (PR42963).
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D67146
llvm-svn: 370857
For any unpaired muls, we accumulate them as an input to the
reduction. Check the type of the mul and perform a sext if the
existing accumlator input type is not the same.
Differential Revision: https://reviews.llvm.org/D66993
llvm-svn: 370851
Summary: Previously module pass printer pass prints the banner even when the module doesn't include any function provided with `-filter-print-funcs` option. This introduced a lot of noise, especailly with ThinLTO. This diff addresses the issue and makes the banner printed only when the module includes functions in `-filter-print-funcs` list.
Reviewers: fedor.sergeev
Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66560
llvm-svn: 370849
Similar to the issue with G_ZEXT that was fixed earlier, this is a quick
to fall back if the source type is not exactly half of the dest type.
Fixes the clang-cmake-aarch64-lld bot build.
llvm-svn: 370847
This reverts r370525 (git commit 0bb1630685)
Also reverts r370543 (git commit 185ddc08ee)
The approach I took only works for functions marked `noreturn`. In
general, a call that is not known to be noreturn may be followed by
unreachable for other reasons. For example, there could be multiple call
sites to a function that throws sometimes, and at some call sites, it is
known to always throw, so it is followed by unreachable. We need to
insert an `int3` in these cases to pacify the Windows unwinder.
I think this probably deserves its own standalone, Win64-only fixup pass
that runs after block placement. Implementing that will take some time,
so let's revert to TrapUnreachable in the mean time.
llvm-svn: 370829
Summary:
This removes all string constants for function names and compares
functions by string directly when needed. Many of these constants are
used only once or twice so the benefit of defining them separately is
not very clear, and this actually fixes a bug.
When we already have a `malloc` declaration which is an alias to
something else within the module,
```
@malloc = weak hidden alias i8* (i32), i8* (i32)* @dlmalloc
```
(this happens compiling with emscripten with `-s WASM_OBJECT_FILES=0`
because all bc files are merged before being fed into `wasm-ld` which
runs the backend optimizations as LTO)
`Module::getFunction("malloc")` in `canLongjmp` returns `nullptr`
because `Module::getFunction` dyncasts pointer into `Function`, but the
alias is a `GlobalValue` but not a `Function`. This makes `canLongjmp`
return false for `malloc` in this case, and we end up adding a lot of
longjmp handling code around malloc. This is not only a code size
increase but actually a bug because `malloc` is used in the entry block
when preparing for setjmp tables for emscripten sjlj handling, and this
makes initial setjmp preparation, which has to happen in the entry
block, move to another split block, and this interferes with SSA update
later.
This also adds two more functions, `getTempRet0` and `setTempRet0`, in
the list of not longjmp-able functions.
Fixes https://github.com/emscripten-core/emscripten/issues/8935.
Reviewers: sbc100
Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, dexonsmith, dschuff, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67129
llvm-svn: 370828
The check needs to validate a counter offset before performing pointer
arithmetic with the (potentially corrupt) offset.
Found by UBSan's pointer overflow check.
rdar://54843625
Differential Revision: https://reviews.llvm.org/D66979
llvm-svn: 370826
When I dug into this, it turns out to be *much* more involved than I'd realized and doesn't actually simplify anything.
The general purpose of the leader table is that we want to find the most-dominating definition quickly. The problem for equivalance folding is slightly different; we want to find the most dominating *value* whose definition block dominates our use quickly.
To make this change, we'd end up having to restructure the leader table (either the sorting thereof, or maybe even introducing multiple leader tables per value) and that complexity is just not worth it.
llvm-svn: 370824
Now that we have the infrastructure to support s128 types as parameters
we can expand these to libcalls.
Differential Revision: https://reviews.llvm.org/D66185
llvm-svn: 370823
On AArch64, s128 types have to be split into s64 GPRs when passed as arguments.
This change adds the generic support in call lowering for dealing with multiple
registers, for incoming and outgoing args.
Support for splitting for return types not yet implemented.
Differential Revision: https://reviews.llvm.org/D66180
llvm-svn: 370822
Add the no-capture argument attribute deduction to the Attributor
fixpoint framework.
The new string attributed "no-capture-maybe-returned" is introduced to
allow deduction of no-capture through functions that "capture" an
argument but only by "returning" it. It is only used by the Attributor
for testing.
Differential Revision: https://reviews.llvm.org/D59922
llvm-svn: 370817
Summary:
Simplify the right shift of the intermediate result (given
in four parts) by using funnel shift.
There are some impact on lit tests, but that seems to be
related to register allocation differences due to how FSHR
is expanded on X86 (giving a slightly different operand order
for the OR operations compared to the old code).
Reviewers: leonardchan, RKSimon, spatel, lebedev.ri
Reviewed By: RKSimon
Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, pzheng, bevinh, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67036
llvm-svn: 370813
These flags should simply be passed through to the target, which will do
the right thing. Add an MC/X86 test that uses these directives with the
three primary object file formats and shows that they disassemble the
same everywhere.
There is a missing test for .code32 on Windows ARM, since I'm not sure
exactly how to construct one.
Fixes PR43203
llvm-svn: 370805
This extends the existing logic for propagating constant expressions in an analogous manner for what we do across basic blocks. The core point is that we chose some order of operands, and canonicalize uses towards that one.
The heuristic used is inspired by the one used across blocks; in a follow up change, I'd plan to common them so that the cross block version uses the slightly stronger ordering herein.
As noted by the TODOs in the code, there's a good amount of room for improving the existing code and making it more powerful. Some follow up work planned.
Differential Revision: https://reviews.llvm.org/D66977
llvm-svn: 370791
This pattern, when imported at -O0 adds an extra copy via the SUBREG_TO_REG.
This is because the SUBREG_TO_REG is not eliminated. At all other opt levels,
it is eliminated.
This is a 1% geomean code size savings at -O0 on CTMark.
Differential Revision: https://reviews.llvm.org/D67027
llvm-svn: 370789
The code here seems to date back to r134705, when tablegen lowering was first
being added. I don't believe that we need to include CPSR implicit operands on
the MCInst. This now works more like other backends (like AArch64), where all
implicit registers are skipped.
This allows the AliasInst for CSEL's to match correctly, as can be seen in the
test changes.
Differential revision: https://reviews.llvm.org/D66703
llvm-svn: 370745
This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can
also be used in ISel Lowering, adding codesize values to the computed costs, to
be able to compare either approximate instruction counts or codesize costs.
It also adds a HasLowerConstantMaterializationCost, which compares the
ConstantMaterializationCost of two values, returning true if the first is
smaller either in instruction count/codesize, or falling back to the other in
the case that they are equal.
This is used in constant CSEL lowering to invert the predicate if the opposite
is easier to materialise.
Differential revision: https://reviews.llvm.org/D66701
llvm-svn: 370741
Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value.
This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used.
Code by Ranjeet Singh and Simon Tatham, with some modifications from me.
Differential revision: https://reviews.llvm.org/D66483
llvm-svn: 370739
Now the last `.section` directive in the MIPS asm file preamble
is the `.section .mdebug.abi`. If assembler code injected for example
by the LLVM `module asm` or the C ` __asm` directives do not contain
explicit switching to the `.text` section it goes to the `.mdebug.abi`
section. It might be unexpected to the user and in fact for example
breaks building some existing code like FreeBSD libc [1].
The patch forces switching to the `.text` section after emitting MIPS
assembler file preamble.
[1] https://bugs.llvm.org/show_bug.cgi?id=43119
Fix PR43119.
Differential Revision: https://reviews.llvm.org/D67014
llvm-svn: 370735
We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to
fold into MVE loads/stores. The instructions actually take a 7 bit unsigned
integer which is either added or subtracted. So something more like
isShiftedUInt<7, Shift>(abs(RHSC)).
Instead I've changes this to use the isScaledConstantInRange method, same as in
SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be
getting this correct.
Differential revision: https://reviews.llvm.org/D66997
llvm-svn: 370731
Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set.
Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g.
Fill in the "should-be-(0)" bits.
Designate the Unpredictable{} bits for both VMRS and VMSR.
Patch by Mark Murray!
Differential revision: https://reviews.llvm.org/D66938
llvm-svn: 370729
To save a 'add sp,#val' instruction by adding registers to the final pop instruction,
the first register transferred by this pop instruction need to be found.
If the function to be optimized has a non-void return value, the operand list contains
r0 (implicit) which prevents the optimization to take place.
Therefore implicit register references should be skipped in the search loop,
because this registers are never popped from the stack.
Patch by Rainer Herbertz (rOptimizer)!
Differential revision: https://reviews.llvm.org/D66730
llvm-svn: 370728
Summary:
Fold-tail currently supports reduction last-vector-value live-out's,
but has yet to support last-scalar-value live-outs, including
non-header phi's. As it relies on AllowedExit in order to detect
them and bail out we need to add the non-header PHI nodes to
AllowedExit, otherwise we end up with miscompiles.
Solves https://bugs.llvm.org/show_bug.cgi?id=43166
Reviewers: fhahn, Ayal
Reviewed By: fhahn, Ayal
Subscribers: anna, hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67074
llvm-svn: 370721
Now that we allow tail-folding, not only when we optimise for size, make
sure we do not run in this assert.
Differential revision: https://reviews.llvm.org/D66932
llvm-svn: 370711
The loop vectorizer was running in an assert when it tried to fold the tail and
had to emit runtime memory disambiguation checks.
Differential revision: https://reviews.llvm.org/D66803
llvm-svn: 370707
Emitting a schedule is really hard. There are lots of corner cases to take care of; in fact, of the 60+ SWP-specific testcases in the Hexagon backend most of those are testing codegen rather than the schedule creation itself.
One issue is that to test an emission corner case we must craft an input such that the generated schedule uses that corner case; sometimes this is very hard and convolutes testcases. Other times it is impossible but we want to test it anyway.
This patch adds a simple test pass that will consume a module containing a loop and generate pipelined code from it. We use post-instr-symbols as a way to annotate instructions with the stage and cycle that we want to schedule them at.
We also provide a flag that causes the MachinePipeliner to generate these annotations instead of actually emitting code; this allows us to generate an input testcase with:
llc < %s -stop-after=pipeliner -pipeliner-annotate-for-testing -o test.mir
And run the emission in isolation with:
llc < test.mir -run-pass=modulo-schedule-test
llvm-svn: 370705
This merges the 32-bit and 64-bit mode code to just use Custom
for both i32 and i64. We already had most of the handling in
the custom handling due to the AVX512 having legal fp_to_uint.
Just needed to add the i32->i64 promotion handling. Refactor
the fp_to_uint code in the custom handler to simplify the
number of times we check things.
Tweak cost model tables to match the default handling we were
getting due to Expand before.
llvm-svn: 370700
Use Custom lowering instead. Fall back to default expansion only
when the scalar FP type belongs in an XMM register. This improves
lowering for i32 to fp80, and also i32 to double on SSE1 only.
llvm-svn: 370699
FP128 values are passed in xmm registers so should be asssociated
with an SSE feature rather than MMX which uses a different set
of registers.
llc enables sse1 and sse2 by default with x86_64. But does not
enable mmx. Clang enables all 3 features by default.
I've tried to add command lines to test with -sse
where possible, but any test that returns a value in an xmm
register fails with a fatal error with -sse since we have no
defined ABI for that scenario.
llvm-svn: 370682
We should be using MQPR, and if we don't we can get COPYs and PHIs created for
QPR. These get folded into instructions, failing verification checks.
Differential revision: https://reviews.llvm.org/D66214
llvm-svn: 370676
Now that constrained fpto[su]i intrinsic are available,
add codegen support to the SystemZ backend.
In addition to pure back-end changes, I've also needed
to add the strict_fp_to_[su]int and any_fp_to_[su]int
pattern fragments in the obvious way.
llvm-svn: 370674
Summary:
Adds the following inline asm constraints for SVE:
- w: SVE vector register with full range, Z0 to Z31
- x: Restricted to registers Z0 to Z15 inclusive.
- y: Restricted to registers Z0 to Z7 inclusive.
This change also adds the "z" modifier to interpret a register as an SVE register.
Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened
Reviewed By: sdesmalen
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66302
llvm-svn: 370673
Fix: add a 'consumeError()' call to ObjectFile.cpp.
This error was never checked.
Original commit message:
It adds a test case for a problem fixed by D66976 <https://reviews.llvm.org/D66976>.
It was introduced by me in D66089 <https://reviews.llvm.org/D66089>.
The error reported was never consumed because of a wrong variable name used,
so it could fail when LLVM_ENABLE_ABI_BREAKING_CHECKS is used.
Differential revision: https://reviews.llvm.org/D67002
llvm-svn: 370669
The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340https://bugs.llvm.org/show_bug.cgi?id=42697
As discussed there, we could view this as a failure of IR canonicalization,
but then we would need to implement a backend fixup with target overrides
to get this right in all cases. Instead, we can just view this as a codegen
opportunity. It's not even clear for x86 exactly when we should favor
test+set; some CPUs have better theoretical throughput for the ALU ops than
bt/test.
This patch is made more complicated than I expected because there's an early
DAGCombine for 'and' that can change types of the intermediate ops via
trunc+anyext.
Differential Revision: https://reviews.llvm.org/D66687
llvm-svn: 370668
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.
Reviewers: nhaehnle, arsenm, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65813
llvm-svn: 370667
Summary:
Commit r366897 introduced the possibility to set a variable from an
expression, such as [[#VAR2:VAR1+3]]. While introducing this feature, it
introduced extra logic to allow using such a variable on the same line
later on. Unfortunately that extra logic is flawed as it relies on a
mapping from variable to expression defining it when the mapping is from
variable definition to expression. This flaw causes among other issues
PR42896.
This commit avoids the problem by forbidding all use of a variable
defined on the same line, and removes the now useless logic. Redesign
will be done in a later commit because it will require some amount of
refactoring first for the solution to be clean. One example is the need
for some sort of transaction mechanism to set a variable temporarily and
from an expression and rollback if the CHECK pattern does not match so
that diagnostics show the right variable values.
Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk
Subscribers: JonChesterfield, rogfer01, hfinkel, kristina, rnk, tra, arichardson, grimar, dblaikie, probinson, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66141
llvm-svn: 370663
bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8} --> bswap (bitcast X to i{N*8})
In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
...we have a more complicated case where SLP is making a mess of bswap. This patch won't
do anything for that currently, but we need to improve bswap recognition in instcombine,
SLP, and/or a standalone pass to avoid that problem.
This is limited using the data-layout so we don't try to do this transform with actual
vector types. The backend does not appear to have folds to convert in either direction,
so we don't want to mess up something that is actually better lowered as a shuffle.
On x86, we're trading something like this:
vmovd %edi, %xmm0
vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
vmovd %xmm0, %eax
For:
movl %edi, %eax
bswapl %eax
Differential Revision: https://reviews.llvm.org/D66965
llvm-svn: 370659
I don't see GNU dlltool supporting doing this; with only a -d option
and no -l option, GNU dlltool runs successfully but doesn't write any
output file.
Differential Revision: https://reviews.llvm.org/D65645
llvm-svn: 370655
On BtVer2 conditional SIMD stores are heavily microcoded.
The latency is directly proportional to the number of packed elements extracted
from the input vector. Also, according to micro-benchmarks, most of the
computation seems to be done in the integer unit.
Only a minority of the uOPs is executed by the FPU. The observed behaviour on
the FPU looks similar to this:
- The input MASK value is moved to the Integer Unit
-- [ a VMOVMSK-like uOP-executed on JFPU0].
- In parallel, each element of the input XMM/YMM is extracted and then sent to
the IntegerUnit through JFPU1.
As expected, a (conditional) store is executed for every extracted element.
Interestingly, a (speculative) load is executed for every extracted element too.
It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the
integer unit for every contionally stored element.
VMASKMOVDQU is a special case: the number of speculative loads is always 2
(presumably, one load per quadword). That means, extra shifts and masking is
performed on (one of) the loaded quadwords before each conditional store (that
also explains the big number of non-FP uOPs retired).
This patch replaces the existing writes for conditional SIMD stores (i.e.
WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes:
WriteFMaskedStore32 [ XMM Packed Single ]
WriteFMaskedStore32Y [ YMM Packed Single ]
WriteFMaskedStore64 [ XMM Packed Double ]
WriteFMaskedStore64Y [ YMM Packed Double ]
Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe
both RM and MR variants for conditional SIMD moves in a single tablegen
definition.
Instances of that class are then passed in input to multiclass avx_movmask_rm
when constructing MASKMOVPS/PD definitions.
Since this patch introduces new writes, I had to update all the X86 scheduling
models.
Differential Revision: https://reviews.llvm.org/D66801
llvm-svn: 370649
The missing line added by this patch ensures that only spilt variable
locations are candidates for being restored from the stack. Otherwise,
register or constant-value information can be interpreted as a spill
location, through a union.
The added regression test replicates a scenario where this occurs: the
stack load from [rsp] causes the register-location DBG_VALUE to be
"restored" to rsi, when it should be left alone. See PR43058 for details.
Un x-fail a test that was suffering from this from a previous patch.
Differential Revision: https://reviews.llvm.org/D66895
llvm-svn: 370648
This is in line with the previous changes which allowed to
override the sh_offset/sh_size and useful for writing test cases.
Differential revision: https://reviews.llvm.org/D66998
llvm-svn: 370633
Verify that the call site DWARF symbols (added during the implementation
of the debug entry values feature) are generated properly.
Differential Revision: https://reviews.llvm.org/D66865
llvm-svn: 370631
MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.
Differential Revision: https://reviews.llvm.org/D67017
llvm-svn: 370620
The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.
We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.
In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.
There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlIhttps://rise4fun.com/Alive/n1mhttps://rise4fun.com/Alive/1Vn
Differential Revision: https://reviews.llvm.org/D67021
llvm-svn: 370617
Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed.
Cleanup the in-lane shuffle mask generation to make it more obvious what's going on.
Some prep work noticed while investigating the poor shuffle code mentioned in D66004.
llvm-svn: 370613
These were never enabled correctly and are causing other problems. Taking them
out for the moment, whilst we work on the issues.
This reverts r370329.
llvm-svn: 370607
Summary:
This fixes the bugzilla id 43183 which triggerd by the following commit:
[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
llvm-svn: 370604
This is what the copies will eventually be turned into. We don't
use COPY_TO_REGCLASS for scalar_to_vector patterns. So we should
use the real instruction here too.
llvm-svn: 370601
Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.
GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96
Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert
Reviewed By: efriedma
Subscribers: hjl.tools, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65737
llvm-svn: 370593
EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.
llvm-svn: 370592
Narrowing stores when the target doesn't support the narrow version
forces the target to expand into a load-modify-store sequence, which
is highly suboptimal. The information narrowing throws away (legality
of the inverse transform) is hard to re-analyze. If the target doesn't
support a store of the narrow type, don't narrow even in pre-legalize
mode.
No test as this is DAGCombiner and depends on target bits.
llvm-svn: 370576
Use a { iN undef, i1 false } struct as the base, and only insert
the first operand, instead of using { iN undef, i1 undef } as the
base and inserting both. This is the same as what we do in InstCombine.
Differential Revision: https://reviews.llvm.org/D67034
llvm-svn: 370573
Restructured the code a little bit in preparation for adding
UMULFIXSAT. I think it will be easier to understand the code
if not interleaving the codegen for signed/unsigned/saturated
cases that much.
llvm-svn: 370569
1. zlib::compress accept &size_t but the param is an uint64_t.
2. Some systems don't have zlib installed. Don't use compression by default.
llvm-svn: 370564
cold versus function being newly added.
This is the second half of https://reviews.llvm.org/D66374.
Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.
During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.
Differential Revision: https://reviews.llvm.org/D66766
llvm-svn: 370563
Summary:
Adds clang builtins and LLVM intrinsics for these experimental
instructions. They are not implemented in engines yet, but that is ok
because the user must opt into using them by calling the builtins.
Reviewers: aheejin, dschuff
Reviewed By: aheejin
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D67020
llvm-svn: 370556
This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605.
Basically, current phi translatation translates an old value number to an new
value number for a call instruction based on the literal equality of call
expression, without verifying there is no clobber in between. This is incorrect.
To get a finegrain check, use MachineDependence analysis to do the job. However,
this is still not ideal. Although given a call instruction,
`MemoryDependenceResults::getCallDependencyFrom` returns identical call
instructions without clobber in between using MemDepResult with its DepType to
be `Def`. However, identical is too strict here and we want it to be relaxed a
little to consider phi-translation -- callee is the same, param operands can be
different. That means changing the semantic of `MemDepResult::Def` and I don't
know the potential impact.
So currently the patch is still conservative to only handle
MemDepResult::NonFuncLocal, which means the current call has no function local
clobber. If there is clobber, even if the clobber doesn't stand in between the
current call and the call with the new value, we won't do phi-translate.
Differential Revision: https://reviews.llvm.org/D67013
llvm-svn: 370547
Also improve assembler parser register validation for .seh_ directives.
This requires moving X86-specific seh directive handling into the x86
backend, which addresses some assembler FIXMEs.
Differential Revision: https://reviews.llvm.org/D66625
llvm-svn: 370533
Users have complained llvm.trap produce two ud2 instructions on Win64,
one for the trap, and one for unreachable. This change fixes that.
TrapUnreachable was added and enabled for Win64 in r206684 (April 2014)
to avoid poorly understood issues with the Windows unwinder.
There seem to be two major things in play:
- the unwinder
- C++ EH, _CxxFrameHandler3 & co
The unwinder disassembles forward from the return address to scan for
epilogues. Inserting a ud2 had the effect of stopping the unwinder, and
ensuring that it ran the EH personality function for the current frame.
However, it's not clear what the unwinder does when the return address
happens to be the last address of one function and the first address of
the next function.
The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out
what the current EH state number is. It does this by consulting the
ip2state table, which maps from PC to state number. This seems to go
wrong when the return address is the last PC of the function or catch
funclet.
I'm not sure precisely which system is involved here, but in order to
address these real or hypothetical problems, I believe it is enough to
insert int3 after a call site if it would otherwise be the last
instruction in a function or funclet. I was able to reproduce some
similar problems locally by arranging for a noreturn call to appear at
the end of a catch block immediately before an unrelated function, and I
confirmed that the problems go away when an extra trailing int3
instruction is added.
MSVC inserts int3 after every noreturn function call, but I believe it's
only necessary to do it if the call would be the last instruction. This
change inserts a pseudo instruction that expands to int3 if it is in the
last basic block of a function or funclet. I did what I could to run the
Microsoft compiler EH tests, and the ones I was able to run showed no
behavior difference before or after this change.
Differential Revision: https://reviews.llvm.org/D66980
llvm-svn: 370525
This is the first stage in refactoring the pipeliner and making it more
accessible for backends to override and control. This separates the logic and
state required to *emit* a scheudule from the logic that *computes* and
validates a schedule.
This will enable (a) new schedule emitters and (b) new modulo scheduling
implementations to coexist.
NFC.
Differential Revision: https://reviews.llvm.org/D67006
llvm-svn: 370500
Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case.
Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests
llvm-svn: 370497
gcc and icc pass these types in zmm registers in zmm registers.
This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.
Fixes PR42957
Differential Revision: https://reviews.llvm.org/D66708
llvm-svn: 370495
Summary:
MTE allows memory access to bypass tag check iff the address argument
is [SP, #imm]. This change takes advantage of this to demote uses of
tagged addresses to regular FrameIndex operands, reducing register
pressure in large functions.
MO_TAGGED target flag is used to signal that the FrameIndex operand
refers to memory that might be tagged, and needs to be handled with
care. Such operand must be lowered to [SP, #imm] directly, without a
scratch register.
The transformation pass attempts to predict when the offset will be
out of range and disable the optimization.
AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
this prediction has been wrong, but it is quite inefficient and should
be avoided.
Reviewers: pcc, vitalybuka, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66457
llvm-svn: 370490
I'm looking at unfolding broadcast loads on AVX512 which will
require refactoring this code to select broadcast opcodes instead
of regular load/stores in some cases. Merging them to avoid
further complicating their interfaces.
llvm-svn: 370484
Summary:
Instead of recomputing information for call sites we now use the
function information directly. This is always valid and once we have
call site specific information we can improve here.
This patch also bootstraps attributes that are created on-demand through
an initial update call. Information that is known will then directly be
available in the new attribute without causing an iteration delay.
The tests show how this improves the iteration count.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66781
llvm-svn: 370480
Summary:
Any pointer could have load/store users not only floating ones so we
move the manifest logic for alignment into the AAAlignImpl class.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66922
llvm-svn: 370479
Currenly we can encode the 'st_other' field of symbol using 3 fields.
'Visibility' is used to encode STV_* values.
'Other' is used to encode everything except the visibility, but it can't handle arbitrary values.
'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpfull enough.
'st_other' field is used to encode symbol visibility and platform-dependent
flags and values. Problem to encode it is that it consists of Visibility part (STV_* values)
which are enumeration values and the Other part, which is different and inconsistent.
For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16.
(Like comment in ELFDumper says: "Someones in their infinite wisdom decided to make
STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...)
And for PPC64 the Other part might actually encode any value.
This patch implements custom logic for handling the st_other and removes
'Visibility' and 'StOther' fields.
Here is an example of a new YAML style this patch allows:
- Name: foo
Other: [ 0x4 ]
- Name: bar
Other: [ STV_PROTECTED, 4 ]
- Name: zed
Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ]
Differential revision: https://reviews.llvm.org/D66886
llvm-svn: 370472
This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total.
llvm-svn: 370468
Summary:
Found a couple of places in the code where all the PHI nodes
of a MBB is updated, replacing references to one MBB by
reference to another MBB instead.
This patch simply refactors the code to use a common helper
(MachineBasicBlock::replacePhiUsesWith) for such PHI node
updates.
Reviewers: t.p.northover, arsenm, uabelho
Subscribers: wdng, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66750
llvm-svn: 370463