Summary:
This has been superseded by "[AMDGPU]: PHI Elimination hooks added for custom COPY insertion."
This reverts the code changes from commit 53f967f2bd
but keeps the test case.
Reviewers: hliao, arsenm, tpr, dstuttard
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68769
llvm-svn: 374347
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68250
llvm-svn: 374340
The FileCheck utility is enhanced to support a `--ignore-case`
option. This is useful in cases where the output of Unix tools
differs in case (e.g. case not specified by Posix).
Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D68146
llvm-svn: 374339
This is just small refactoring to minimize changes in upcoming patch.
In the next path I'm going to introduce changes into heuristic for vectorization of "tiny trip count" loops.
Patch by Evgeniy Brevnov <evgueni.brevnov@gmail.com>
Reviewers: hsaito, Ayal, fhahn, reames
Reviewed By: hsaito
Differential Revision: https://reviews.llvm.org/D67690
llvm-svn: 374338
Summary:
The implementation is fairly straight-forward and uses the same patterns
as the existing streams. The yaml form does not attempt to preserve the
data in the "gaps" that can be created by setting a larger-than-required
header or entry size in the stream header, because the existing consumer
(lldb) does not make use of the information in the gap in any way, and
attempting to preserve that would make the implementation more
complicated.
Reviewers: amccarth, jhenderson, clayborg
Subscribers: llvm-commits, lldb-commits, markmentovai, zturner, JosephTremoulet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68645
llvm-svn: 374337
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
intrinsics.
Differential Revision: https://reviews.llvm.org/D68566
llvm-svn: 374336
EXPENSIVE_CHECKS build was failing on new test.
This is fixed by marking $ra register as undef.
Test now has -verify-machineinstrs to check for operand flags.
llvm-svn: 374320
Currently, the heuristics the if-conversion pass uses for diamond if-conversion
are based on execution time, with no consideration for code size. This adds a
new set of heuristics to be used when optimising for code size.
This is mostly target-independent, because the if-conversion pass can
see the code size of the instructions which it is removing. For thumb,
there are a few passes (insertion of IT instructions, selection of
narrow branches, and selection of CBZ instructions) which are run after
if conversion and affect these heuristics, so I've added target hooks to
better predict the code-size effect of a proposed if-conversion.
Differential revision: https://reviews.llvm.org/D67350
llvm-svn: 374301
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.
llvm-svn: 374284
Summary:
`null` in the default address space (=AS 0) cannot be captured nor can
it alias anything. We make this clear now as it can be important for
callbacks and other cases later on. In addition, this patch improves the
debug output for noalias deduction.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68624
llvm-svn: 374280
Original Patch broke for compilations w/ gcc and exposed asan fail.
This reland repairs those bugs.
Differential Revision: https://reviews.llvm.org/D67529
llvm-svn: 374277
Summary:
Visual Studio doesn't like it while stepping. It kicks you out of the
source view of the file being stepped through and tries to fall back to
the disassembly view.
Fixes PR43530
The fix is incomplete, because it's possible to have a basic block with
no source locations at all. In this case, we don't emit a .cv_loc, but
that will result in wrong stepping behavior in the debugger if the
layout predecessor of the location-less BB has an unrelated source
location. We could try harder to find a valid location that dominates or
post-dominates the current BB, but in general it's a dataflow problem,
and one still might not exist. I left a FIXME about this.
As an alternative, we might want to consider having the middle-end check
if its emitting codeview and get it to stop using line zero.
Reviewers: akhuang
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68747
llvm-svn: 374267
As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86.
As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them.
This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions.
Differential Revision: https://reviews.llvm.org/D68419
llvm-svn: 374261
In a future patch, this will help cleanup m0 handling.
The register coalescer handles copies from a register that
materializes an immediate, but doesn't handle move immediates
itself. The virtual register uses will often be allocated to the same
register, so there end up being no real copy.
llvm-svn: 374257
This was ignoring the register bank of the input pointer, and
isUniformMMO seems overly aggressive.
This will now conservatively assume a VGPR in cases where the incoming
bank hasn't been determined yet (i.e. is from a loop phi).
llvm-svn: 374255
If original instruction did not have source modifiers they were
not added to the new DPP instruction as well, even if needed.
Differential Revision: https://reviews.llvm.org/D68729
llvm-svn: 374241
Summary:
This is necessary and sufficient to get simple cases of multiple
return working with multivalue enabled. More complex cases will
require block and loop signatures to be generalized to potentially be
type indices as well.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68684
llvm-svn: 374235
in ExtBinary format
Currently for Text, Binary and ExtBinary format profiles, when we compile a
module with samplefdo, even if there is no function showing up in the profile,
we have to load all the function profiles from the profile input. That is a
waste of compile time.
CompactBinary format profile has already had the support of loading function
profiles on demand. In this patch, we add the support to load profile on
demand for ExtBinary format. It will work no matter the sections in ExtBinary
format profile are compressed or not. Experiment shows it reduces the time to
compile a server benchmark by 30%.
When profile remapping and loading function profiles on demand are both used,
extra work needs to be done so that the loading on demand process will take
the name remapping into consideration. It will be addressed in a follow-up
patch.
Differential Revision: https://reviews.llvm.org/D68601
llvm-svn: 374233
Add own version of the mathematical constants from the upcoming C++20 `std::numbers`.
Differential revision: https://reviews.llvm.org/D68257
llvm-svn: 374207
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
MCSubtarget implementation
Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ. AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation. The custom TTI implementations
continue to exist for the other targets with this change. They are not moved
over to subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to the system
model defined by the target. With this change, the default MCSubtargetInfo
implementation essentially returns the defaults TargetTransformInfoImplBase used
to return. Existing users of TTI defaults will hit the defaults now in
MCSubtargetInfo. Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.
Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 374205
(specifying an underlying type for the enum might also be suitable - but
this seems better/as good, since there's a clear expectation this can
contain values other than the actual enumerators of this enum)
llvm-svn: 374196
Summary:
This clang builtin and corresponding LLVM intrinsic are necessary to
expose the exact semantics of the underlying WebAssembly instruction
to users. LLVM produces a poison value if the dynamic swizzle indices
are greater than the vector size, but the WebAssembly instruction sets
the corresponding output lane to zero. Users who depend on this
behavior can safely use this builtin.
Depends on D68527.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68531
llvm-svn: 374189
Summary:
Adds the new v8x16.swizzle SIMD instruction as specified at
https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices.
In addition to adding swizzles as a candidate lowering in
LowerBUILD_VECTOR, also rewrites and simplifies the lowering to
minimize the number of replace_lanes necessary rather than trying to
minimize code size. This leads to more uses of v128.const instead of
splats, which is expected to increase performance.
The new code will be easier to tune once V8 implements all the vector
construction operations, and it will also be easier to add new
candidate instructions in the future if necessary.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68527
llvm-svn: 374188
We failed to account for the target register width (max vector factor)
when vectorizing starting from GEPs. This causes vectorization to
proceed to obviously illegal widths as in:
https://bugs.llvm.org/show_bug.cgi?id=43578
For x86, this also means that SLP can produce rogue AVX or AVX512
code even when the user specifies a narrower vector width.
The AArch64 test in ext-trunc.ll appears to be better using the
narrower width. I'm not exactly sure what getelementptr.ll is trying
to do, but it's testing with "-slp-threshold=-18", so I'm not worried
about those diffs. The x86 test is an over-reduction from SPEC h264;
this patch appears to restore the perf loss caused by SLP when using
-march=haswell.
Differential Revision: https://reviews.llvm.org/D68667
llvm-svn: 374183
stack
This patch makes sure that if we tag some memory, we untag that memory before
the function returns/throws via any exit, reachable from the tag operation. For
that we place the untag operation either at:
a) the lifetime end call for the alloca, if that call post-dominates the
lifetime start call (where the tag operation is placed), or it (the
lifetime end call) dominates all reachable exits, otherwise
b) at the reachable exits
Differential Revision: https://reviews.llvm.org/D68469
llvm-svn: 374182
Summary:
The rule for the moveAllAfterMergeBlocks API si for all instructions
from `From` to have been moved to `To`, while keeping the CFG edges (and
block terminators) unchanged.
Update all the callsites for moveAllAfterMergeBlocks to follow this.
Pending follow-up: since the same behavior is needed everytime, merge
all callsites into one. The common denominator may be the call to
`MergeBlockIntoPredecessor`.
Resolves PR43569.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68659
llvm-svn: 374177
When optimising for size and SCEV runtime checks need to be emitted to check
overflow behaviour, the loop vectorizer can run in this assert:
LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize()
&& "Cannot SCEV check stride or overflow when opt
We should not generate predicates while optimising for size because
code will be generated for predicates such as these SCEV overflow runtime
checks.
This should fix PR43371.
Differential Revision: https://reviews.llvm.org/D68082
llvm-svn: 374166
The `expandLoadImmReal` handles four different and almost non-overlapping
cases: loading a "single" float immediate into a GPR, loading a "single"
float immediate into a FPR, and the same couple for a "double" float
immediate.
It's better to move each `else if` branch into separate methods.
llvm-svn: 374164
David added the JamCRC implementation in r246590. More recently, Eugene
added a CRC-32 implementation in r357901, which falls back to zlib's
crc32 function if present.
These checksums are essentially the same, so having multiple
implementations seems unnecessary. This replaces the CRC-32
implementation with the simpler one from JamCRC, and implements the
JamCRC interface in terms of CRC-32 since this means it can use zlib's
implementation when available, saving a few bytes and potentially making
it faster.
JamCRC took an ArrayRef<char> argument, and CRC-32 took a StringRef.
This patch changes it to ArrayRef<uint8_t> which I think is the best
choice, and simplifies a few of the callers nicely.
Differential revision: https://reviews.llvm.org/D68570
llvm-svn: 374148
Summary:
While working with DagInit's, it's often the case that you expect the
operator to be a reference to a def. This patch adds a wrapper for this
common case to reduce the amount of boilerplate callers need to duplicate
repeatedly.
getOperatorAsDef() returns the record if the DagInit has an operator that is
a DefInit. Otherwise, it prints a fatal error.
There's only a few pre-existing examples in LLVM at the moment and I've
left a few instances of the code this simplifies as they had more specific
error messages than the generic one this produces. I'm going to be using
this a fair bit in my subsequent patches.
Reviewers: bogner, volkan, nhaehnle
Reviewed By: nhaehnle
Subscribers: nhaehnle, hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68424
llvm-svn: 374101
There were 2 problems here. First, these patterns were duplicated to
handle the inverted shift operands instead of using the commuted
PatFrags.
Second, the point of the zext folding patterns don't apply to the
non-0ing high subtargets. They should be skipped instead of inserting
the extension. The zeroing high code would be emitted when necessary
anyway. This was also emitting unnecessary zexts in cases where the
high bits were undefined.
llvm-svn: 374092
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"
This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.
The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.
llvm-svn: 374091
Factor out CodeExtractor's analysis of allocas (for shrinkwrapping
purposes), and allow the analysis to be reused.
This resolves a quadratic compile-time bug observed when compiling
AMDGPUDisassembler.cpp.o.
Pre-patch (Release + LTO clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
176.5278 ( 57.8%) 0.4915 ( 18.5%) 177.0192 ( 57.4%) 177.4112 ( 57.3%) Hot Cold Splitting
```
Post-patch (ReleaseAsserts clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.4051 ( 3.3%) 0.0079 ( 0.3%) 1.4129 ( 3.2%) 1.4129 ( 3.2%) Hot Cold Splitting
```
Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary
pre- vs. post-patch.
An alternate approach is to hide CodeExtractorAnalysisCache from clients
of CodeExtractor, and to recompute the analysis from scratch inside of
CodeExtractor::extractCodeRegion(). This eliminates some redundant work
in the shrinkwrapping legality check. However, some clients continue to
exhibit O(n^2) compile time behavior as computing the analysis is O(n).
rdar://55912966
Differential Revision: https://reviews.llvm.org/D68616
llvm-svn: 374089
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores. It also limits
the SILoadStoreOptimizer from merging more instructions.
This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65097
llvm-svn: 374087
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 374085
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.
Differential Revision: https://reviews.llvm.org/D68607
llvm-svn: 374083
Summary:
When searching for local expression tree created by stackified
registers, for 'block' placement, we start the search from the previous
instruction of a BB's terminator. But in 'try''s case, we should start
from the previous instruction of a call that can throw, or a EH_LABEL
that precedes the call, because the return values of the call's previous
instructions can be stackified and consumed by the throwing call.
For example,
```
i32.call @foo
call @bar ; may throw
br $label0
```
In this case, if we start the search from the previous instruction of
the terminator (`br` here), we end up stopping at `call @bar` and place
a 'try' between `i32.call @foo` and `call @bar`, because `call @bar`
does not have a return value so it is not a local expression tree of
`br`.
But in this case, unlike when placing 'block's, we should start the
search from `call @bar`, because the return value of `i32.call @foo` is
stackified and used by `call @bar`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68619
llvm-svn: 374073
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.
Reviewers: aprantl, vsk, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D66955
llvm-svn: 374068
Summary:
In D65186 and related patches, MustBeExecutedContextExplorer is introduced. This enables us to traverse instructions guaranteed to execute from function entry. If we can know the argument is used as `dereferenceable` or `nonnull` in these instructions, we can mark `dereferenceable` or `nonnull` in the argument definition:
1. Memory instruction (similar to D64258)
Trace memory instruction pointer operand. Currently, only inbounds GEPs are traced.
```
define i64* @f(i64* %a) {
entry:
%add.ptr = getelementptr inbounds i64, i64* %a, i64 1
; (because of inbounds GEP we can know that %a is at least dereferenceable(16))
store i64 1, i64* %add.ptr, align 8
ret i64* %add.ptr ; dereferenceable 8 (because above instruction stores into it)
}
```
2. Propagation from callsite (similar to D27855)
If `deref` or `nonnull` are known in call site parameter attributes we can also say that argument also that attribute.
```
declare void @use3(i8* %x, i8* %y, i8* %z);
declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);
define void @parent1(i8* %a, i8* %b, i8* %c) {
call void @use3nonnull(i8* %b, i8* %c, i8* %a)
; Above instruction is always executed so we can say that@parent1(i8* nonnnull %a, i8* nonnull %b, i8* nonnull %c)
call void @use3(i8* %c, i8* %a, i8* %b)
ret void
}
```
Reviewers: jdoerfert, sstefan1, spatel, reames
Reviewed By: jdoerfert
Subscribers: xbolva00, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65402
llvm-svn: 374063
Summary: This patch introduces a generic way to compose two structured deductions. This will be used for composing generic deduction with `MustBeExecutedExplorer` and other existing generic deduction.
Reviewers: jdoerfert, sstefan1
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66645
llvm-svn: 374060
Summary:
This format introduces new features and platforms
The motivation for this format is to support more than 1 platform since previous versions only supported additional architectures and 1 platform,
for example ios + ios-simulator and macCatalyst.
Reviewers: ributzka, steven_wu
Reviewed By: ributzka
Subscribers: mgorny, hiraditya, mgrang, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67529
llvm-svn: 374058
When -pg option is present than a call to _mcount is inserted into every
function. However since the proper ABI was not followed then the generated
gmon.out did not give proper results. By inserting needed instructions
before every _mcount we can fix this.
Differential Revision: https://reviews.llvm.org/D68390
llvm-svn: 374055
Summary:
This patch adds the definitions of the constants and structures
necessary to interpret the MemoryInfoList minidump stream, as well as
the object::MinidumpFile interface to access the stream.
While the code is fairly simple, there is one important deviation from
the other minidump streams, which is worth calling out explicitly.
Unlike other "List" streams, the size of the records inside
MemoryInfoList stream is not known statically. Instead it is described
in the stream header. This makes it impossible to return
ArrayRef<MemoryInfo> from the accessor method, as it is done with other
streams. Instead, I create an iterator class, which can be parameterized
by the runtime size of the structure, and return
iterator_range<iterator> instead.
Reviewers: amccarth, jhenderson, clayborg
Subscribers: JosephTremoulet, zturner, markmentovai, lldb-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68210
llvm-svn: 374051
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:
> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
> %mul = fmul fast <8 x half> %c, %b
> %sub = fsub fast <8 x half> %mul, %a
> ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.
This patch generates an fmla and negates the value of the operand2 of the fsub.
Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.
Tested on aarch64-linux with make check-all.
Differential Revision: https://reviews.llvm.org/D67990
llvm-svn: 374044
* Adds a TypeSize struct to represent the known minimum size of a type
along with a flag to indicate that the runtime size is a integer multiple
of that size
* Converts existing size query functions from Type.h and DataLayout.h to
return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
to uint64_t) so that most existing code 'just works' as if the return
values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
supported instructions used with scalable vectors can be constructed
in IR.
Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen
Reviewed By: rovka, sdesmalen
Differential Revision: https://reviews.llvm.org/D53137
llvm-svn: 374042
Summary:
When getValueInMiddleOfBlock happens to be called for a basic block
that has no incoming value at all, an IMPLICIT_DEF is inserted in that
block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at
the top of its basic block or it will likely not reach the use that
the caller intends to insert.
Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204
Reviewers: arsenm, rampitec
Subscribers: jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68183
llvm-svn: 374040
Before this patch, loads and stores were only tracked by their corresponding
queues in the LSUnit from dispatch until execute stage. In practice we should be
more conservative and assume that memory opcodes leave their queues at
retirement stage.
Basically, loads should leave the load queue only when they have completed and
delivered their data. We conservatively assume that a load is completed when it
is retired. Stores should be tracked by the store queue from dispatch until
retirement. In practice, stores can only leave the store queue if their data can
be written to the data cache.
This is mostly a mechanical change. With this patch, the retire stage notifies
the LSUnit when a memory instruction is retired. That would triggers the release
of LDQ/STQ entries. The only visible change is in memory tests for the bdver2
model. That is because bdver2 is the only model that defines the load/store
queue size.
This patch partially addresses PR39830.
Differential Revision: https://reviews.llvm.org/D68266
llvm-svn: 374034
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.
Reviewers: aprantl, vsk, t.p.northover
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D66953
llvm-svn: 374033
Summary: LoopRotate is a loop pass and SE should always be available.
Reviewers: anemet, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D68573
llvm-svn: 374026
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.
In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.
This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
implemented fine-tuning for when to produce vcmpe (i.e. not do it for
equality comparisons). The complexity introduced by those patches
isn't needed anymore if we just always produce vcmp instead. Maybe
these patches need to be reintroduced again once support is needed to
map potential LLVM-IR constrained floating point compare intrinsics to
the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
of these test, the intent of what is tested for isn't related to
whether the vcmp should produce an Invalid Operation exception or not.
Fixes PR43374.
Differential Revision: https://reviews.llvm.org/D68463
llvm-svn: 374025
Several LLVM tools write text files/streams without using OF_Text.
This can cause problems on platforms which distinguish between
text and binary output. This PR adds the OF_Text flag for the
following tools:
- llvm-dis
- llvm-dwarfdump
- llvm-mca
- llvm-mc (assembler files only)
- opt (assembler files only)
- RemarkStreamer (used e.g. by opt)
Reviewers: rnk, vivekvpandya, Bigcheese, andreadb
Differential Revision: https://reviews.llvm.org/D67696
llvm-svn: 374024
Summary:
Implement support for hexadecimal escape sequences to match how GNU 'as'
handles them. I.e., read all hexadecimal characters and truncate to the
lower 16 bits.
Reviewers: nickdesaulniers, jcai19
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68598
llvm-svn: 374018
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.
So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.
For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.
It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374017
The initialization logic has become part of the Attributor but the
patches that introduced these calls here were in development when the
transition happened.
We also now clean up (undefine) the macros used to create attributes.
llvm-svn: 373987
Local linkage is internal or private, and private is a specialization of
internal, so either is fine for all our "local linkage" queries.
llvm-svn: 373986
Summary:
When we iterate over uses of functions and expect them to be call sites,
we now use abstract call sites to allow callback calls.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, hfinkel, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67871
llvm-svn: 373985
Gather instructions can use i32 or i64 elements for indices. If
the index is zero extended from a type smaller than i32 to i64, we
can shrink the extend to just extend to i32.
llvm-svn: 373982
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.
Patch by Dwight Guth <dwight.guth@runtimeverification.com>!
Reviewed By: lebedev.ri, paquette, rnk
Differential Revision: https://reviews.llvm.org/D67855
llvm-svn: 373976
Summary:
There was a bug when computing the number of unwind destination
mismatches in CFGStackify. When there are many mismatched calls that
share the same (original) destination BB, they have to be counted
separately.
This also fixes a typo and runs `fixUnwindMismatches` only when the wasm
exception handling is enabled. This is to prevent unnecessary
computations and does not change behavior.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68552
llvm-svn: 373975
Summary:
Previously, `WebAssembly::mayThrow()` assumed all inputs are global
addresses. But when intrinsics, such as `memcpy`, `memmove`, or `memset`
are lowered to external symbols in instruction selection and later
emitted as library calls. And these functions don't throw.
This patch adds handling to those memory intrinsics to `mayThrow`
function. But while most of libcalls don't throw, we can't guarantee all
of them don't throw, so currently we conservatively return true for all
other external symbols.
I think a better way to solve this problem is to embed 'nounwind' info
in `TargetLowering::CallLoweringInfo`, so that we can access the info
from the backend. This will also enable transferring 'nounwind'
properties of LLVM IR instructions. Currently we don't transfer that
info and we can only access properties of callee functions, if the
callees are within the module. Other targets don't need this info in the
backend because they do all the processing before isel, but it will help
us because that info will reduce code size increase in fixing unwind
destination mismatches in CFGStackify.
But for now we return false for these memory intrinsics and true for all
other libcalls conservatively.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68553
llvm-svn: 373967
This can come up in Bit Stream abstractions.
The pattern looks big/scary, but it can't be simplified any further.
It only is so simple because a number of my preparatory folds had
happened already (shift amount reassociation / shift amount
reassociation in bit test, sign bit test detection).
Highlights:
* There are two main flavors: https://rise4fun.com/Alive/zWi
The difference is add vs. sub, and left-shift of -1 vs. 1
* Since we only change the shift opcode,
we can preserve the exact-ness: https://rise4fun.com/Alive/4u4
* There can be truncation after high-bit-extraction:
https://rise4fun.com/Alive/slHc1 (the main pattern i'm after!)
Which means that we need to ignore zext of shift amounts and of NBits.
* The sign-extending magic can be extended itself (in add pattern
via sext, in sub pattern via zext. not the other way around!)
https://rise4fun.com/Alive/NhG
(or those sext/zext can be sinked into `select`!)
Which again means we should pay attention when matching NBits.
* We can have both truncation of extraction and widening of magic:
https://rise4fun.com/Alive/XTw
In other words, i don't believe we need to have any checks on
bitwidths of any of these constructs.
This is worsened in general by the fact that we may have `sext` instead
of `zext` for shift amounts, and we don't yet canonicalize to `zext`,
although we should. I have not done anything about that here.
Also, we really should have something to weed out `sub` like these,
by folding them into `add` variant.
https://bugs.llvm.org/show_bug.cgi?id=42389
llvm-svn: 373964
True, no test coverage is being added here. But those non-canonical
predicates that are already handled here already have no test coverage
as far as i can tell. I tried to add tests for them, but all the patterns
already get handled elsewhere.
llvm-svn: 373962
Summary:
Currently, we pre-check whether we need to produce a mask or not.
This involves some rather magical constants.
I'd like to extend this fold to also handle the situation
when there's also a `trunc` before outer shift.
That will require another set of magical constants.
It's ugly.
Instead, we can just compute the mask, and check
whether mask is a pass-through (all-ones) or not.
This way we don't need to have any magical numbers.
This change is NFC other than the fact that we now compute
the mask and then check if we need (and can!) apply it.
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68470
llvm-svn: 373961
Summary:
When we do `ConstantExpr::getZExt()`, that "extends" `undef` to `0`,
which means that for patterns a/b we'd assume that we must not produce
any bits for that channel, while in reality we simply didn't care
about that channel - i.e. we don't need to mask it.
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68239
llvm-svn: 373960
Start manually writing a table to get the subreg index. TableGen
should probably generate this, but I'm not sure what it looks like in
the arbitrary case where subregisters are allowed to not fully cover
the super-registers.
llvm-svn: 373947
At minimum handle the s64 insert type, which are emitted in real cases
during legalization.
We really need TableGen to emit something to emit something like the
inverse of composeSubRegIndices do determine the subreg index to use.
llvm-svn: 373938
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.
https://reviews.llvm.org/D68439
llvm-svn: 373935
r369697 changed the behavior of stripPointerCasts to no longer include
aliases. However, the code in CGDeclCXX.cpp's createAtExitStub counted
on the looking through aliases to properly set the calling convention of
a call.
The result of the change was that the calling convention mismatch of the
call would be replaced with a llvm.trap, causing a runtime crash.
Differential Revision: https://reviews.llvm.org/D68584
llvm-svn: 373929
After changing the remark serialization, we now pass StringRefs to the
serializer. We should use StringRef for StringBlockVal, to avoid
creating temporary objects, which then cause StringBlockVal.Value to
point to invalid memory.
Reviewers: thegameg, anemet
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D68571
llvm-svn: 373923
Previously ExtBinary profile format only supports compression using zlib for
profile symbol list. In this patch, we extend the compression support to any
section. User can select some or all of the sections to compress. In an
experiment, for a 45M profile in ExtBinary format, compressing name table
reduced its size to 24M, and compressing all the sections reduced its size
to 11M.
Differential Revision: https://reviews.llvm.org/D68253
llvm-svn: 373914
This ensures that frame-based unwinding will continue to work when
calling a noreturn function; there is not much use having the caller's
frame pointer saved if you don't also have the caller's program counter.
Patch by James Clarke.
Differential Revision: https://reviews.llvm.org/D68542
llvm-svn: 373907
J/JAL/JALX/JALS are absolute branches, but stay within the current
256 MB-aligned region, so we must include the high bits of the
instruction address when calculating the branch target.
Patch by James Clarke.
Differential Revision: https://reviews.llvm.org/D68548
llvm-svn: 373906
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
llvm-svn: 373900
It broke MC/AsmParser/directive_ascii.s on all bots:
Assertion failed: (Index < Length && "Invalid index!"), function operator[],
file ../../llvm/include/llvm/ADT/StringRef.h, line 243.
llvm-svn: 373898
Summary:
Implement support for hexadecimal escape sequences to match how GNU 'as'
handles them. I.e., read all hexadecimal characters and truncate to the
lower 16 bits.
Reviewers: nickdesaulniers
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68483
llvm-svn: 373888
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.
llvm-svn: 373882
Move the erasing and iterator updating inside to match the
other slow LEA function.
I've adapted code from optTwoAddrLEA and basically rebuilt the
implementation here. We do lose the kill flags now just like
optTwoAddrLEA. This runs late enough in the pipeline that
shouldn't really be a problem.
llvm-svn: 373877
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.
This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.
Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.
Fixes PR43217
Differential Revision: https://reviews.llvm.org/D68544
llvm-svn: 373871
This is patch aims to group together the `CRNotPat` multi class instantiations
within the `PPCInstrInfo.td` file.
Integer instantiations of the multi class are grouped together into a section,
and the floating point patterns are separated into its own section.
Differential Revision: https://reviews.llvm.org/D67975
llvm-svn: 373869
Replaces setTargetShuffleZeroElements with getTargetShuffleAndZeroables which reports the Zeroable elements but doesn't merge them into the decoded target shuffle mask (the merging has been moved up into getTargetShuffleInputs until we can get rid of it entirely).
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373867
Summary:
The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions.
I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68428
llvm-svn: 373864
Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare.
Reviewers: RKSimon, spatel, efriedma
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68359
llvm-svn: 373863
Summary: The assertion in getLoopGuardBranch can be a 'return nullptr'
under if condition.
Authored By: DTharun
Reviewer: Whitney, fhahn
Reviewed By: Whitney, fhahn
Subscribers: fhahn, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66084
llvm-svn: 373857
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68250
llvm-svn: 373850
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
This allows us to remove createTargetShuffleMask.
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373846
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.
This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so its just the original SETCC ops that gets extended.
Differential Revision: https://reviews.llvm.org/D68226
llvm-svn: 373834
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708https://bugs.llvm.org/show_bug.cgi?id=43146
We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).
Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.
In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:
movbe rax, qword ptr [rdi]
or:
mov rax, qword ptr [rdi]
Not some (half) vector monstrosity as we currently do using SLP:
vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
movzx eax, byte ptr [rdi]
movzx ecx, byte ptr [rdi + 5]
shl rcx, 40
movzx edx, byte ptr [rdi + 6]
shl rdx, 48
or rdx, rcx
movzx ecx, byte ptr [rdi + 7]
shl rcx, 56
or rcx, rdx
or rcx, rax
vextracti128 xmm1, ymm0, 1
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
vpor xmm0, xmm0, xmm1
vmovq rax, xmm0
or rax, rcx
vzeroupper
ret
Differential Revision: https://reviews.llvm.org/D67841
llvm-svn: 373833
Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.
llvm-svn: 373832
The motivation is to reuse the key value parsing logic here to
parse instance specific pass options within the context of MLIR.
The primary functionality exposed is the "," splitting for
arrays and the logic for properly handling duplicate definitions
of a single flag.
Patch by: Parker Schuh <parkers@google.com>
Differential Revision: https://reviews.llvm.org/D68294
llvm-svn: 373815
This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.
Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well.
Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant".
llvm-svn: 373814
This reverts r371177 (git commit f879c68755)
It caused PR43566 by removing empty, address-taken MachineBasicBlocks.
Such blocks may have references from blockaddress or other operands, and
need more consideration to be removed.
See the PR for a test case to use when relanding.
llvm-svn: 373805
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
https://rise4fun.com/Alive/kPg
llvm-svn: 373802
Initially (D65380) i believed that if we have rightshift-trunc-rightshift,
we can't do any folding. But as it usually happens, i was wrong.
https://rise4fun.com/Alive/GEwhttps://rise4fun.com/Alive/gN2O
In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence, of two right shifts separated by trunc.
And "just" so that happens, we apparently can fold the pattern
if the total shift amount is either 0, or it's equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit. Which is exactly what is wanted there.
llvm-svn: 373801
Outlining from noreturn functions doesn't do the correct thing right now. The
outliner should respect that the caller is marked noreturn. In the event that
we have a noreturn function, and the outlined code is in tail position, the
outliner will not see that the outlined function should be tail called. As a
result, you end up with a regular call containing a return.
Fixing this requires that we check that all candidates live inside noreturn
functions. So, for the sake of correctness, don't outline from noreturn
functions right now.
Add machine-outliner-noreturn.mir to test this.
llvm-svn: 373791
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node. Make sure this assumption actually holds.
Fixes a "Node emitted out of order - early" assertion on the testcase.
This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.
Differential Revision: https://reviews.llvm.org/D68068
llvm-svn: 373782
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.
This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encodings tricks.
Removes over 10K bytes from the isel table.
Differential Revision: https://reviews.llvm.org/D68446
llvm-svn: 373766
We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC
Differential Revision: https://reviews.llvm.org/D68432
llvm-svn: 373765
This is a trivial point fix. Terminator instructions aren't scheduled, so
we shouldn't expect to be able to remap them.
This doesn't affect Hexagon and PPC because their terminators are always
hardware loop backbranches that have no register operands.
llvm-svn: 373762
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.
llvm-svn: 373738
Rather than having a mixture of location-state shared between DBG_VALUEs
and VarLoc objects in LiveDebugValues, this patch makes VarLoc the
master record of variable locations. The refactoring means that the
transfer of locations from one place to another is always a performed by
an operation on an existing VarLoc, that produces another transferred
VarLoc. DBG_VALUEs are only created at the end of LiveDebugValues, once
all locations are known. As a plus, there is now only one method where
DBG_VALUEs can be created.
The test case added covers a circumstance that is now impossible to
express in LiveDebugValues: if an already-indirect DBG_VALUE is spilt,
previously it would have been restored-from-spill as a direct DBG_VALUE.
We now don't lose this information along the way, as VarLocs always
refer back to the "original" non-transfer DBG_VALUE, and we can always
work out whether a location was "originally" indirect.
Differential Revision: https://reviews.llvm.org/D67398
llvm-svn: 373727
When transfering variable locations from one place to another,
LiveDebugValues immediately creates a DBG_VALUE representing that
transfer. This causes trouble if the variable location should
subsequently be invalidated by a loop back-edge, such as in the added
test case: the transfer DBG_VALUE from a now-invalid location is used
as proof that the variable location is correct. This is effectively a
self-fulfilling prophesy.
To avoid this, defer the insertion of transfer DBG_VALUEs until after
analysis has completed. Some of those transfers are still sketchy, but
we don't propagate them into other blocks now.
Differential Revision: https://reviews.llvm.org/D67393
llvm-svn: 373720
Summary:
This patch fixes a potential aliasing problem in InstClassEnum,
where local values were mixed with machine opcodes.
Introducing InstSubclass will keep them separate and help extending
InstClassEnum with other instruction types (e.g. MIMG) in the future.
This patch also makes getSubRegIdxs() more concise.
Reviewers: nhaehnle, arsenm, tstellar
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68384
llvm-svn: 373699
In the Atom model the symbols, content and relocations of a relocatable object
file are represented as a graph of atoms, where each Atom represents a
contiguous block of content with a single name (or no name at all if the
content is anonymous), and where edges between Atoms represent relocations.
If more than one symbol is associated with a contiguous block of content then
the content is broken into multiple atoms and layout constraints (represented by
edges) are introduced to ensure that the content remains effectively contiguous.
These layout constraints must be kept in mind when examining the content
associated with a symbol (it may be spread over multiple atoms) or when applying
certain relocation types (e.g. MachO subtractors).
This patch replaces the Atom model in JITLink with a blocks-and-symbols model.
The blocks-and-symbols model represents relocatable object files as bipartite
graphs, with one set of nodes representing contiguous content (Blocks) and
another representing named or anonymous locations (Symbols) within a Block.
Relocations are represented as edges from Blocks to Symbols. This scheme
removes layout constraints (simplifying handling of MachO alt-entry symbols,
and hopefully ELF sections at some point in the future) and simplifies some
relocation logic.
llvm-svn: 373689
We would like to split the SP adjustment to reduce the instructions in
prologue and epilogue as the following case. In this way, the offset of
the callee saved register could fit in a single store.
add sp,sp,-2032
sw ra,2028(sp)
sw s0,2024(sp)
sw s1,2020(sp)
sw s3,2012(sp)
sw s4,2008(sp)
add sp,sp,-64
Differential Revision: https://reviews.llvm.org/D68011
llvm-svn: 373688
Without this we can encounter link errors or incorrect behaviour
at runtime as a result of the wrong function being referenced.
Differential Revision: https://reviews.llvm.org/D67945
llvm-svn: 373678
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.
In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.
Differential Revision: https://reviews.llvm.org/D68397
llvm-svn: 373666
Move the write-if-changed logic behind a flag and don't pass it
with the MSVC generator. msbuild doesn't have a restat optimization,
so not doing write-if-change there doesn't have a cost, and it
should fix whatever causes PR43385.
llvm-svn: 373664
Summary:
This is follow up patch of https://reviews.llvm.org/D67595.
Adjust naming and the Commutable operands for additional patterns
to make it easier to read.
The testcase update also show that we can save some unecessary fmr as
well.
Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai
Reviewed By: #powerpc, nemanjai
Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68112
llvm-svn: 373652
This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.
With VLX we can use two VTRUNCs, unpckldq, and a insert_subvector.
Diffrential Revision: https://reviews.llvm.org/D68374
llvm-svn: 373645
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
This only leaves one user of createTargetShuffleMask which we can hopefully get rid of in a similar manner.
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373641
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.
llvm-svn: 373638
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.
llvm-svn: 373637
Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions with one list per unique
address.
In the optimization phase instead of scanning through the basic block for mergeable
instructions, we now iterate over the lists generated by the pre-pass.
The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block. This will help to reduce the time this pass
spends re-optimizing instructions.
In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.
This restructuring will also make it possible to implement further solutions in
this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early which will avoid the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65961
llvm-svn: 373630
The Hexagon code assumes there's no existing terminator when inserting its
trip count condition check.
This causes swp-stages5.ll to break. The generated code looks good to me,
it is likely a permutation. I have disabled the new codegen path to keep
everything green and will investigate along with the other 3-4 tests
that have different codegen.
Fixes expensive-checks build.
llvm-svn: 373629
During studying support for bitfield, I found an issue for
an example like the one in test offset-reloc-middle-chain.ll.
struct t1 { int c; };
struct s1 { struct t1 b; };
struct r1 { struct s1 a; };
#define _(x) __builtin_preserve_access_index(x)
void test1(void *p1, void *p2, void *p3);
void test(struct r1 *arg) {
struct s1 *ps = _(&arg->a);
struct t1 *pt = _(&arg->a.b);
int *pi = _(&arg->a.b.c);
test1(ps, pt, pi);
}
The IR looks like:
%0 = llvm.preserve.struct.access(base, ...)
%1 = llvm.preserve.struct.access(%0, ...)
%2 = llvm.preserve.struct.access(%1, ...)
using %0, %1 and %2
In this case, we need to generate three relocatiions
corresponding to chains: (%0), (%0, %1) and (%0, %1, %2).
After collecting all the chains, the current implementation
process each chain (in a map) with code generation sequentially.
For example, after (%0) is processed, the code may look like:
%0 = base + special_global_variable
// llvm.preserve.struct.access(base, ...) is delisted
// from the instruction stream.
%1 = llvm.preserve.struct.access(%0, ...)
%2 = llvm.preserve.struct.access(%1, ...)
using %0, %1 and %2
When processing chain (%0, %1), the current implementation
tries to visit intrinsic llvm.preserve.struct.access(base, ...)
to get some of its properties and this caused segfault.
This patch fixed the issue by remembering all necessary
information (kind, metadata, access_index, base) during
analysis phase, so in code generation phase there is
no need to examine the intrinsic call instructions.
This also simplifies the code.
Differential Revision: https://reviews.llvm.org/D68389
llvm-svn: 373621
These old aliases were renamed, but are still used by some projects (eg newlib).
Differential Revision: https://reviews.llvm.org/D68392
llvm-svn: 373618
It allows using "Size" with or without "Content" in YAML descriptions of
SHT_LLVM_ADDRSIG sections.
Differential revision: https://reviews.llvm.org/D68334
llvm-svn: 373610
Summary: This PR creates a utility class called ValueProfileCollector that tells PGOInstrumentationGen and PGOInstrumentationUse what to value-profile and where to attach the profile metadata. It then refactors logic scattered in PGOInstrumentation.cpp into two plugins that plug into the ValueProfileCollector.
Authored By: Wael Yehia <wyehia@ca.ibm.com>
Reviewer: davidxl, tejohnson, xur
Reviewed By: davidxl, tejohnson, xur
Subscribers: llvm-commits
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D67920
Patch By Wael Yehia <wyehia@ca.ibm.com>
llvm-svn: 373601