This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision:
https://reviews.llvm.org/D56604
llvm-svn: 351008
MIPS `jr` instruction uses a delay-slot. To escape execution of
arbitrary instruction we should either fill the delay-slot by `nop`
instruction or swap `jr` instruction and logically preceding
instruction. This fix implements the second method to generate a bit
more effective code.
llvm-svn: 351001
MIPS ABI states that every function must be called through jalr $t9. In
other words, a function expect that t9 register points to the beginning
of its code. A function uses this register to calculate offset to the
Global Offset Table and save it to the `gp` register.
```
lui $gp, %hi(_gp_disp)
addiu $gp, %lo(_gp_disp)
addu $gp, $gp, $t9
```
If `t9` and as a result `$gp` point to the wrong place the following code
loads incorrect value from GOT and passes control to invalid code.
```
lw $v0,%call16(foo)($gp)
jalr $t9
```
OrcMips32 and OrcMips64 writeResolverCode methods pass control to the
resolved address, but do not setup `$t9` before the call. The `t9` holds
value of the beginning of `resolver` code so any attempts to call
routines via GOT failed.
This change fixes the problem. The `OrcLazy/hidden-visibility.ll` test
starts to pass correctly. Before the change it fails on MIPS because the
`exitOnLazyCallThroughFailure` called from the resolver code could not
call libc routine `exit` via GOT.
Differential Revision: http://reviews.llvm.org/D56058
llvm-svn: 351000
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions too be recomputed very often. To avoid this, switch
to a SetVector.
Reapplying with a smaller number of inline elements in the
SmallSetVector, to avoid running into the SmallDenseMap issue
described in D56455.
Differential Revision: https://reviews.llvm.org/D56362
llvm-svn: 350997
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operations has been lost, and it's easy to
write incorrect patterns.
Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).
Differential Revision: https://reviews.llvm.org/D53230
llvm-svn: 350993
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>
Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.
There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.
Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem a good reason to
prefer one vs the other. Given that would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.
Differential Revision: https://reviews.llvm.org/D56264
llvm-svn: 350992
We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics. And was really a bug in our lowering of them.
llvm-svn: 350991
We still use i8 for the load/store type. So we need to convert to/from i16 to around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877
llvm-svn: 350985
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.
This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.
The undef behavior follows what InstSimplify does for the general cas
e of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.
Differential Revision: https://reviews.llvm.org/D55950
llvm-svn: 350971
Teach x86 assembly operand parsing to distinguish between assembler
variable assigned to named registers and those assigned to immediate
values.
Reviewers: rnk, nickdesaulniers, void
Subscribers: hiraditya, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D56287
llvm-svn: 350966
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.
Differential Revision: https://reviews.llvm.org/D56574
llvm-svn: 350939
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a
switch (via a phi node). The patch unfolds select under this condition.
A testcase is provided.
llvm-svn: 350931
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.
There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.
llvm-svn: 350928
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.
Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
This fixes some of the regressions from D350800.
llvm-svn: 350918
Summary:
We now use __stack_pointer global and global.get/global.set instruction.
This fixes the checking routine for stack_pointer writes accordingly.
This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitraily, and
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attribute, this test fails without this patch.
Reviewers: dschuff, sunfish
Subscribers: jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D56094
llvm-svn: 350906
* Teach AsmParser to recognize @rn in distination operand as 0(rn).
* Do not allow Disassembler decoding instructions that have size more
than a number of input bytes.
* Fix UB in MSP430MCCodeEmitter.
Patch by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D56547
llvm-svn: 350903
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56560
llvm-svn: 350901
Sanity will fail for this, since we're exploring getting a clobber
further than the sanity check expects.
Ideally we need to teach the sanity check to differentiate between the
two walkers based on the SkipSelf bool in the query.
llvm-svn: 350895
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below, to what we are trying to fix, by always ensuring a updated BlockRPONumber.
== Backstory ==
I was investigating an ICE (segfault accessing a DenseMap item). This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).
After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`. Our test case that triggered this warning and periodic crash is rather involved. But the problematic line looks to be:
GVN.cpp: Line 2197
```
if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```
To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.
My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above). In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.
1) There is no sequence point for the `>=` operation.
2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).
3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) . A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid. The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.
The assembly generated from the Window's host compiler does show the result of the first access to the map via `operator[]` produces a pointer to an unsigned int. And that pointer is being stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid.
Reviewers: t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D55974
llvm-svn: 350880
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.
Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.
Reviewers: sanjoy, davide, gberry, george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D40375
llvm-svn: 350879
This extends to combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created.
This should help some of the regressions from D56387
Differential Revision: https://reviews.llvm.org/D56421
llvm-svn: 350875
Despite what the comment says, FCMP_UNE would be an OR not an AND. In the lowering code the first branch created still goes to the original destination. The second branch was exchanged to go to where the subsequent unconditional branch went. This is different than what we do for FCMP_OEQ where both branches that we create go to the original unconditional branch.
As far as I can tell, I think this means we don't need to exchange the branch target with the unconditional branch for FCMP_UNE at all.
Differential Revision: https://reviews.llvm.org/D56309
llvm-svn: 350873
This commit fixes the dwordx3/southern-islands failures that were found
in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not
generating the dwordx3 variants of load/store instructions that were
added to the ISA after southern islands.
Differential Revision: https://reviews.llvm.org/D56434
llvm-svn: 350838
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).
I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.
llvm-svn: 350835
This further improves compatibility with GNU as, allowing input such as the
following to be assembled:
.equ CONST, 0x123456
li a0, CONST
addi a0, a0, %lo(CONST)
.equ CONST, 1
slli a0, a0, CONST
Note that we don't have perfect compatibility with gas, as it will avoid
emitting a relocation in this case:
addi a0, a0, %lo(CONST2)
.equ CONST2, 0x123456
Thanks to Shiva Chen for suggesting a better way to approach this during review.
Differential Revision: https://reviews.llvm.org/D52298
llvm-svn: 350831
When we use the partial-matching function on a 128-bit chunk, we must
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.
This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350830
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the
existing (old) horizontal op matching function because it does not model the way
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op).
A potential follow-up change for that is discussed in the bug report - see also D56490.
This generally duplicates a lot of the existing matching code, but we can't just remove
that without introducing regressions, so the existing code is renamed and used less often.
Follow-ups may try to reduce that overlap.
Differential Revision: https://reviews.llvm.org/D56450
llvm-svn: 350826
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.
Reviewers: t.p.northover, SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56296
llvm-svn: 350825
Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It
should have been a 64 bit quantity instead. That means, ResourceUnitMask was
always implicitly truncated to a 32 bit quantity.
This issue has been found by inspection. Surprisingly, that bug was latent, and
it never negatively affected any existing upstream targets.
This patch fixes the wrong definition of ResourceUnitMask, and adds a bunch of
extra debug prints to help debugging potential issues related to invalid
processor resource masks.
llvm-svn: 350820
Allow to specify loop-unrolling with optional parameters explicitly
spelled out in -passes pipeline specification.
Introducing somewhat generic way of specifying parameters parsing via
FUNCTION_PASS_PARAMETRIZED pass registration.
Syntax of parametrized unroll pass name is as follows:
'unroll<' parameter-list '>'
Where parameter-list is ';'-separate list of parameter names and optlevel
optlevel: 'O[0-3]'
parameter: { 'partial' | 'peeling' | 'runtime' | 'upperbound' }
negated: 'no-' parameter
Example:
-passes=loop(unroll<O3;runtime;no-upperbound>)
this invokes LoopUnrollPass configured with OptLevel=3,
Runtime, no UpperBound, everything else by default.
llvm-svn: 350808
Add t2TEQrr to the map of instructions with can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.
Differential Revision: https://reviews.llvm.org/D56255
llvm-svn: 350801
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.
Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since its a little scary.
The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.
Fixes PR39741
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56460
llvm-svn: 350800
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.
Reviewers: rnk, mstorsjo, efriedma
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56037
llvm-svn: 350798
This is a second attempt at r350778, which was reverted in
r350789. The only change is that the unimplemented-simd128 feature has
been renamed simd128-unimplemented, since naming it
unimplemented-simd128 somehow made the simd128 feature flag enable the
unimplemented-simd128 feature on Windows.
llvm-svn: 350791
Found while trying to figure out why my second version of D56421 worked better than the first version. We weren't deleting the vselect in a timely fashion and that caused SimplfyDemandedBit to see an additional user.
The new version doesn't have this problem so this fix isn't needed there, but seemed like the right thing to do.
llvm-svn: 350781
Summary:
This replaces the old ad-hoc -wasm-enable-unimplemented-simd
flag. Also makes the new unimplemented-simd128 feature imply the
simd128 feature.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton
Differential Revision: https://reviews.llvm.org/D56501
llvm-svn: 350778
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]". The expansion
was missing the conversion to unsigned char.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .
Differential Revision: https://reviews.llvm.org/D55947
llvm-svn: 350775
When a null terminator is required and the file size is a multiple of the system page size, MemoryBuffer will prefer pread() over mmap(), which can result in excessive memory usage.
Patch by Mike Hommey!
Differential Revision: https://reviews.llvm.org/D56475
llvm-svn: 350774
Summary:
Looks like many passes print its pass description as a debug message at
the start of each pass, so added that to (mostly newly added) other
passes as well.
Reviewers: dschuff
Subscribers: jgravelle-google, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56142
llvm-svn: 350771
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).
Reviewers: davidxl
Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D56464
llvm-svn: 350755
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.
However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.
Differential Revision: https://reviews.llvm.org/D56486
llvm-svn: 350753
Summary: All a non-default title for the debugging this debugging aide
Reviewers: twoh, Kader, modocache
Reviewed By: twoh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56499
llvm-svn: 350749
Summary:
For some reason the backend assumed that the condition mask would be
the first argument to the LLVM intrinsic, but everywhere else the
condition mask is the third argument.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56412
llvm-svn: 350746
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.
Differential Revision: https://reviews.llvm.org/D55929
llvm-svn: 350729
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now.
Test is fully reworked, added comments. Other fixes:
1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
Differential revision: https://reviews.llvm.org/D55444
llvm-svn: 350721
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang
Differential Revision: https://reviews.llvm.org/D56484
llvm-svn: 350702
Summary:
This fixes PR39710. In that case we emitted a location list looking like
this:
.Ldebug_loc0:
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_begin0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # DW_OP_reg5
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_end0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # super-register DW_OP_reg5
.quad 0
.quad 0
As seen, the first entry's beginning and ending addresses evalute to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.
To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:
"A location list entry (but not a base address selection or end of list
entry) whose beginning and ending addresses are equal has no effect
because the size of the range covered by such an entry is zero."
Reviewers: davide, aprantl, dblaikie
Reviewed By: aprantl
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55919
llvm-svn: 350698
Current strategy of dropping `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal strictures: `OrderedInstructions`
that is needed to be invalidated whenever the contents changes, and the map
with first special instructions in block. This second map does not need an
update if we add/remove a non-special instuction because it cannot
affect the contents of this map.
This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.
Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma
llvm-svn: 350694
When the result type is v2i64/v2f64 and the index element size is i32, the index vector has two unused elements making the type v4i32. The mask VT should match the number of memory accesses that will be made.
This is consistent with the isel patterns used for the target independent gather/scatter intrinsic.
llvm-svn: 350687
Bad machine code: Illegal virtual register for instruction
function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc
Fix assert about missing match between fcmp instruction and register class.
We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened.
add -verifymachineinstrs option into related test cases to enable the verify pass.
Differential Revision: https://reviews.llvm.org/D55686
llvm-svn: 350685
This removes check for single use from general ShrinkDemandedConstant
to the BE because of the AArch64 regression after D56289/rL350475.
After several hours of experiments I did not come up with a testcase
failing on any other targets if check is not performed.
Moreover, direct call to ShrinkDemandedConstant is not really needed
and superceed by SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D56406
llvm-svn: 350684
Currently it's possible for following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.
During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.
Fixes most of the testcases in bugs 39542 and 39602
llvm-svn: 350678
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.
Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.
Fixes bug 39602
llvm-svn: 350676
This is matching the equivalent of the DAG expansion,
so it should never end up with worse perf than the
original code even if the target doesn't have a rotate
instruction.
llvm-svn: 350672
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.
Differential Revision: https://reviews.llvm.org/D56327
llvm-svn: 350671
Summary:
StoreResults pass does not optimize store instructions anymore because
store instructions don't return results values anymore. Now this pass is
used solely for memory intrinsics, so update the pass name accordingly
and fix outdated pass descriptions as well.
This patch does not change any meaningful behavior, but not marked as
NFC because it changes a comment check line in a test case.
Reviewers: dschuff
Subscribers: mgorny, sbc100, jgravelle-google, sunfiish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56093
llvm-svn: 350669
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions. This patch
makes llvm-undname properly demangle them.
Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769
llvm-svn: 350656
A straightforward port of tsan to the new PM, following the same path
as D55647.
Differential Revision: https://reviews.llvm.org/D56433
llvm-svn: 350647
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).
Reviewers: reames, mzolotukhin, mkazantsev, hfinkel
Reviewed by: brzycki, kuhar
Subscribers: zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D56284
llvm-svn: 350640
Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section")
assigned relocation type R_BPF_NONE if the fixup type
is FK_Data_4 and the symbol is temporary.
The reason is we use FK_Data_4 as a fixup type
for insn offsets in .BTF.ext section.
Just checking whether the symbol is temporary is not enough.
For example, .debug_info may reference some strings whose
fixup is FK_Data_4 with a temporary symbol as well.
To truely reflect the case for .BTF.ext section,
this patch further checks that the section associateed with the symbol
must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section.
This fixed the above-mentioned problem.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 350637
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.
Reviewers: MatzeB, rtereshin, dsanders
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D55946
llvm-svn: 350630
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.
Fixes PR40131.
Differential Revision: https://reviews.llvm.org/D56266
llvm-svn: 350626
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.
Differential Revision: https://reviews.llvm.org/D56386
llvm-svn: 350624
For stack frames on the size of a register in x86, a code size optimization
emits "push rax/eax" instead of "sub" for stack allocation. For example:
foo:
.cfi_startproc
BB#0:
pushq %rax
Ltmp0:
.cfi_def_cfa_offset 16
...
.cfi_endproc
However, we are falling back to DWARF in this case because we cannot
encode %rax as a saved register.
This requirement is wrong, since we don't care about the contents of
%rax, it is the equivalent of a sub.
In order to specify that we care about the contents of %rax, we would
need a .cfi_offset %rax, <offset>.
It's also overzealous in the case where there are pushes for callee saved
registers followed by a "push rax/eax" instead of "sub", in which case we should
also be able to encode the callee saved regs and everything else using compact
unwind.
Patch authored by Bruno Cardoso Lopes.
Differential Revision: https://reviews.llvm.org/D13793
llvm-svn: 350623
This reverts commit rL350497
reported remaining issues seem to be unrelated to modules or this change.
more info: https://reviews.llvm.org/D56084
llvm-svn: 350621
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.
Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.
llvm-svn: 350620
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable use to remove the unnecessary sxth.
Differential Revision: https://reviews.llvm.org/D55992
llvm-svn: 350613
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.
For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.
This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.
Some of these issues could probably avoided with making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.
For example, an i1 and where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.
llvm-svn: 350611
Summary: Add a utility function for creating a basic block without a parent function. A useful operation for compilers that need to synthesize and conditionally insert code without having to bother with appending and immediately unlinking a block.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56279
llvm-svn: 350608
Summary: Fix an old outstanding problem with the int cast builder binding always assuming the cast is signed by introducing a new LLVMBuildIntCast2 operation and deprecating the old prototype.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56280
llvm-svn: 350607
Summary:
FixIrreducibleControlFlow and LateEHPrepare both possibly modify CFG and
create new registers. There seems to be no reason these passes go after
register-related optimization passes (PrepareForLiveIntervals,
OptimizeLiveIntervals, StoreResults, RegStackify, and RegColoring), and
this also possibly create new optimization opportunities. I think we
should put all current and future optimization passes before RegStackify
(and related passes) unless there's a reason not to.
Reviewers: kripken
Subscribers: dschuff, sbc100, sunfish, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D56356
llvm-svn: 350596
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.
I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).
Also legalize brcond for AMDGPU.
llvm-svn: 350595
Summary:
We don't need to explicitly use `NI` anymore because we now don't use
`let` statements within the definitions.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56376
llvm-svn: 350594
merging a src register in ToBeUpdated set.
This is to fix PR40061 related with https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, live interval of source pseudo register
in rematerialized copy may be saved in ToBeUpdated set and its update may be
postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains live interval larger than necessary.
Because %t3 is not in toBeUpdated set, its live interval is not updated after
register coalescing and it will break some assumption in regalloc.
The patch requires the live interval of destination register in a merge to be
updated if the source register is in ToBeUpdated.
Differential revision: https://reviews.llvm.org/D55867
llvm-svn: 350586
MSVC has a nesting limit of around 110-130. An if/else if/else if counts against this next level. The autoupgrade code consists a long chain of these checking matches against strings.
This commit moves some code to a helper function to move out a large if/else chain that was inside of one of the blocks into a separate function. There are more of these we could move or we could change some to lookup tables.
I've also merged together a few similar blocks in the outer chain. This should buy us some margin for a little bit.
llvm-svn: 350564
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Differential Revision: https://reviews.llvm.org/D56087
llvm-svn: 350560
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.
Differential revision: https://reviews.llvm.org/D56098
llvm-svn: 350553
Summary:
The option enables loop transformations to hoist accesses that do not
have clobbers in the loop. If the clobber queries skips the starting
access, the result may be outside the loop instead of the header Phi.
Adding the walker that uses this option in a separate patch.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D55944
llvm-svn: 350551
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions too be recomputed very often. To avoid this, switch
to a SetVector.
Differential Revision: https://reviews.llvm.org/D56362
llvm-svn: 350547
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.
Reviewers: arsenm, tstellar
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D56331
llvm-svn: 350532
`CallSite`.
With this change, the remaining `CallSite` usages are just for
implementing the wrapper type itself.
This does update the C API but leaves the names of that API alone and
only updates their implementation.
Differential Revision: https://reviews.llvm.org/D56184
llvm-svn: 350509
update client code.
Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.
Differential Revision: https://reviews.llvm.org/D56183
llvm-svn: 350508
`CallSite` wrapper.
Mostly mechanical, but I've tried to tidy up code where it made sense to
do so.
Differential Revision: https://reviews.llvm.org/D56143
llvm-svn: 350507