Patchpoint instructions have operands which is actually zero cost
(or the same as register) to use the value from the stack.
In terms of statistic it makes same to separate them.
Move from computation instructions related to stack spill/reload to
number of stack slot referenced.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100016
Statepoint instruction has a deopt section which is actually live-through the call.
Currently this is handled by special post pass after RA - fixup-statepoint-caller-saved.
This change teaches Greedy RA that if segment of live interval is ended with statepoint
instruction and its reg is used in deopt bundle then this live interval interferes regmask of this statepoint
and as a result caller-saved register cannot be assigned to this live interval.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100296
M68kAsmParser uses `llvm::getTheM68kTarget` from M68kInfo, therefore we
should put M68kInfo as its direct dependency. Otherwise the build will
fail when building LLVM libraries as shared objects (building LLVM
libraries statically won't have this problem though).
This is a follow up of D99010. We didn't consider the live range of shape registers when hoist ldtilecfg. There maybe risks, e.g. we happen to insert it to an invalid range of some registers and get unexpected error.
This patch fixes this problem by storing the value to corresponding stack place of ldtilecfg after all its definition immediately.
This patch also fix a problem in previous code: If we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for other ldtilecfg.
There're still some optimization points left. E.g. eliminate unused mov instructions, break the def-use dependency before RA etc.
Reviewed By: LuoYuanke, xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D99966
The VSX tablegen file has some rather eggregious uses of
COPY_TO_REGCLASS even in situations where it needs to use
SUBREG_TO_REG. While this produces correct code, it often doesn't
allow the register coalescer to coalesce copies and the resulting
code ends up being suboptimal. This patch just changes over
patterns that should use SUBREG_TO_REG.
This fixes the resolution of Rec10.Zero in ListSlices.td.
As part of this, correct the definition of complete for ListInit such that
it's complete iff all the elements in the list are complete rather than
always being complete regardless of the elements. This is the reason
Rec10.TwoFive from ListSlices.td previously resolved despite being
incomplete like Rec10.Zero was
Depends on D100247
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D100253
- Add support for HLASM style integers. These are the decimal integers [0-9].
- HLASM does not support the additional prefixed integers like, `0b`, `0x`, octal integers and Masm style integers.
- To achieve this, a field `LexHLASMStyleIntegers` (similar to the `LexMasmStyleIntegers` field) is introduced in `MCAsmLexer.h` as well as a corresponding setter.
Note: This field could also go into MCAsmInfo.h. I used the previous precedent set by the `LexMasmIntegers` field.
Depends on https://reviews.llvm.org/D99286
Reviewed By: epastor
Differential Revision: https://reviews.llvm.org/D99374
Currently, for any extern variable, if it doesn't have
section attribution, it will be put into a default ".extern"
btf DataSec. The initial design is to put every extern
variable in a DataSec so libbpf can use it.
But later on, libbpf actually requires extern variables
to put into special sections, e.g., ".kconfig", ".ksyms", etc.
so they can be used properly based on section name.
Andrii mentioned since ".extern" variables are
not actually used, it makes sense to remove it from
the compiler so libbpf does not need to deal with it,
esp. for static linking. The BTF for these extern variables
is still generated.
With this patch, I tested kernel selftests/bpf and all tests
passed. Indeed, removing ".extern" DataSec seems having no
impact.
Differential Revision: https://reviews.llvm.org/D100392
I've run into some cases where a large fraction of compile-time is
spent invalidating SCEV. One of the causes is forgetLoop(), which
walks all values that are def-use reachable from the loop header
phis. When invalidating a topmost loop, that might be close to all
values in a function. Additionally, it's fairly common for there to
not actually be anything to invalidate, but we'll still be performing
this walk again and again.
My first thought was that we don't need to continue walking the uses
if the current value doesn't have a SCEV expression. However, this
isn't quite right, because SCEV construction can skip over values
(e.g. for a chain of adds, we might only create a SCEV expression
for the final value).
What this patch does instead is to only walk the (full) def-use chain
of loop phis that have a SCEV expression. If there's no expression
for a phi, then we also don't have any dependent expressions to
invalidate.
Differential Revision: https://reviews.llvm.org/D100264
We already allow the comparison of the upper bits of 'IsAllOf' (allbits) patterns, but we can safely compare the known zero bits for 'IsAnyOf' (zerobits) patterns as well.
This fixes an issues where we are comparing a type wide than the number of vector elements, which avoids a regression mentioned in rGbaadbe04bf75.
- In the SystemZAsmParser, there will be a few queries to the type of dialect it is (AD_ATT, AD_HLASM) in future patches.
- It would be nice to have two small helper functions `isParsingATT()` and `isParsingHLASM()`
- Putting this as a separate smaller patch allows us to remove its definitions from other dependent patches.
Reviewed By: uweigand, abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D99891
For a global weak symbol defined as below:
char g __attribute__((weak)) = 2;
LLVM generates an allocated global with WeakAnyLinkage,
for which BPF backend generates proper BTF info.
For the above example, if a modifier "const" is added like
const char g __attribute__((weak)) = 2;
LLVM generates an allocated global with WeakODRLinkage,
for which BPF backend didn't generate any BTF as it
didn't handle WeakODRLinkage.
This patch addes support for WeakODRLinkage and proper
BTF info can be generated for weak symbol defined with
"const" modifier.
Differential Revision: https://reviews.llvm.org/D100362
- Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute.
- However, AsmLexer also supports a few additional comment syntaxes, in addition to what's specified as a CommentString attribute. This includes regular C-style block comments (/* ... */), regular C-style line comments (// .... ) and #. While I'm not sure as to why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference)
For example:
Consider a target which sets the CommentString attribute to '*'.
The following strings are all lexed as comments.
```
"# abc" -> comment
"// abc" -> comment
"/* abc */ -> comment
"* abc" -> comment
```
- In HLASM however, only "*" is accepted as a comment string, and nothing else.
- To achieve this, an additional attribute (`AllowAdditionalComments`) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#".
Depends on https://reviews.llvm.org/D99277
Reviewed By: abhina.sreeskantharajan, myiwanch
Differential Revision: https://reviews.llvm.org/D99286
The IR stack protector pass must insert stack checks before the call instead of
between it and the return.
Similarly, SDAG one should recognize that ADJCALLFRAME instructions could be
part of the terminal sequence of a tail call. In this case because such call
frames cannot be nested in LLVM the stack protection code must skip over the
whole sequence (or risk clobbering argument registers).
This patch adds attributes corresponding to
implicits to functions/kernels if
1. it has an indirect call OR
2. it's address is taken.
Once such attributes are set, rest of the codegen would work
out-of-box for indirect calls. This patch eliminates
the potential overhead -fixed-abi imposes even though indirect functions
calls are not used.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D99347
As a side-effect of the change to default HoistCommonInsts to false
early in the pipeline, we fail to convert conditional branch & phis to
selects early on, which prevents vectorization for loops that contain
conditional branches that effectively are selects (or if the loop gets
vectorized, it will get vectorized very inefficiently).
This patch updates SimplifyCFG to perform hoisting if the only
instruction in both BBs is an equal branch. In this case, the only
additional instructions are selects for phis, which should be cheap.
Even though we perform hoisting, the benefits of this kind of hoisting
should by far outweigh the negatives.
For example, the loop in the code below will not get vectorized on
AArch64 with the current default, but will with the patch. This is a
fundamental pattern we should definitely vectorize. Besides that, I
think the select variants should be easier to use for reasoning across
other passes as well.
https://clang.godbolt.org/z/sbjd8Wshx
```
double clamp(double v) {
if (v < 0.0)
return 0.0;
if (v > 6.0)
return 6.0;
return v;
}
void loop(double* X, double *Y) {
for (unsigned i = 0; i < 20000; i++) {
X[i] = clamp(Y[i]);
}
}
```
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D100329
This is a work-in-progress implementation of an assembler for M68k.
Outstanding work:
- Updating existing tests assembly syntax
- Writing new tests for the assembler (and disassembler)
I've left those until there's consensus that this approach is okay (I hope that's okay!).
Questions I'm aware of:
- Should this use Motorola or gas syntax? (At the moment it uses Motorola syntax.)
- The disassembler produces a table at runtime for disassembly generated from the code beads. Is this okay? (This is less than ideal but as I mentioned in my llvm-dev post, it's quite complicated to write a table-gen parser for code beads.)
Depends on D98519
Depends on D98532
Depends on D98534
Depends on D98535
Depends on D98536
Differential Revision: https://reviews.llvm.org/D98537
Prep work for adding intrinsics in the future.
Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
We should consider the feeder user number when we do reverse memory
operation transformation. Otherwise, we may get negative impact.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D100166
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
Jump threading can replace select then unconditional branch with
conditional branch, but when doing so loses debug info.
This destructive transform is eventually leading to a failed Verifier
run during full LTO builds of the Linux kernel with CFI and KCOV
enabled, as reported in PR39531.
ModuleSanitizerCoveragePass will insert calls to
__sanitizer_cov_trace_pc, and sometimes split critical edges,
using whatever debug info may or may not exist for the branch for
the added libcall. Since we can inline calls to
__sanitizer_cov_trace_pc due to LTO, this can lead to the error
observed in PR39531 when the debug info isn't propagated to
the libcall, because of prior destructive transforms that failed to
retain debug info.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100137
Turning on -fstrict-vtable-pointers in Chrome caused an extra global
initializer. Turns out that a llvm.strip.invariant.group intrinsic was
causing GlobalOpt to fail to step through some simple code.
We can treat *.invariant.group uses as simply their operand.
Value::stripPointerCastsForAliasAnalysis() does exactly this. This
should be safe because the Evaluator does not skip memory accesses due
to invariants or alias analysis.
However, we don't want to leak that we've stripped arbitrary pointer
casts to users of Evaluator, so we bail out if we evaluate a function to
any constant, since we may have looked through *.invariant.group calls
and aliasing pointers cannot be arbitrarily substituted.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D98843
Instruction::getDebugLoc can return an invalid DebugLoc. For such cases
where metadata was accidentally removed from the libcall insertion
point, simply insert a DILocation with line 0 scoped to the caller. When
we can inline the libcall, such as during LTO, then we won't fail a
Verifier check that all calls to functions with debug metadata
themselves must have debug metadata.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100158
Currently the ARM backend only accpets constant expressions as the
immediate operand in load and store instructions. This allows the
result of symbolic expressions to be used in memory instructions. For
example,
0:
.space 2048
strb r2, [r0, #(.-0b)]
would be assembled into the following instructions.
strb r2, [r0, #2048]
This only adds support to ldr, ldrb, str, and strb in arm mode to
address the build failure of Linux kernel for now, but should facilitate
adding support to similar instructions in the future if the need arises.
Link:
https://github.com/ClangBuiltLinux/linux/issues/1329
Reviewed By: peter.smith, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D98916
Retry of 330619a3a6 that includes a clang test update.
Original commit message:
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898 <https://reviews.llvm.org/D98898>.
In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.
We make the same change to the old pass manager to keep things synchronized.
Differential Revision: https://reviews.llvm.org/D100213
-filter-print-funcs -print-changed was crashing after the filter func
was removed by a pass with
Assertion failed: After.find("*** IR Dump") == 0 && "Unexpected banner format."
We weren't printing the banner because when we have -filter-print-funcs,
we print each function separately, letting the print function filter out
unwanted functions.
Reviewed By: jamieschmeiser
Differential Revision: https://reviews.llvm.org/D100237
SROA can handle invariant group intrinsics, let the inliner know that
for better heuristics when the intrinsics are present.
This fixes size issues in a couple files when turning on
-fstrict-vtable-pointers in Chrome.
Reviewed By: rnk, mtrofin
Differential Revision: https://reviews.llvm.org/D100249
This patch removes all uses of `std::iterator`, which was deprecated in C++17.
While this isn't currently an issue while compiling LLVM, it's useful for those using LLVM as a library.
For some reason there're a few places that were seemingly able to use `std` functions unqualified, which no longer works after this patch. I've updated those places, but I'm not really sure why it worked in the first place.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D67586
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.
This work was left as a TODO from D94168.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100096
D24453 enabled libcalls simplication for ARM PCS. This may cause
caller/callee calling conventions mismatch in some situations such as
LTO. This patch makes instcombine aware that the compatible calling
conventions differences are benign (not emitting undef idom).
Differential Revision: https://reviews.llvm.org/D99773
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898.
In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.
We make the same change to the old pass manager to keep things synchronized.
Differential Revision: https://reviews.llvm.org/D100213
Add a number of intrinsics which natively lower to MVE operations to the
lane interleaving pass, allowing it to efficiently interleave the lanes
of chucks of operations containing these intrinsics.
Differential Revision: https://reviews.llvm.org/D97293
Fixes the issues noted in PR48768, where the and/or/xor instruction had been promoted to avoid i8/i16 partial-dependencies, but the test against zero had not.
We can almost certainly relax this fold to work for any truncation, although it breaks a number of existing folds (notable movmsk folds which tend to rely on the truncate to determine the demanded bits/elts in the source vector).
There is a reverse combine in TargetLowering.SimplifySetCC so we must wait until after legalization before attempting this.
The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop.
This patch try to calculate the nearest point where all shapes of AMX registers are reachable.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D99010
FP16 to FP32 converts can be handled in MVE lane interleaving, much like
the sext/zext lowering we do. This expands the pass with fpext and
fptrunc handling, and basic fp operations allowing more efficient
lowering of fp vectors.
Differential Revision: https://reviews.llvm.org/D97292
The patch makes two updates to the arm-block-placement pass:
- Handle arbitrarily nested loops
- Extends the search (for t2WhileLoopStartLR) to the predecessor of the
preHeader.
Differential Revision: https://reviews.llvm.org/D99649
This reverts commit cca9b5985c.
Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir:
*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function: indexed_2s
- basic block: %bb.0 entry (0x640fee8)
Virtual register %7 is used after the block.
*** Bad machine code: Virtual register defs don't dominate all uses. ***
- function: indexed_2s
- v. register: %7
LLVM ERROR: Found 2 machine code errors.
This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner.
FMUL_indexed is normally selected during instruction selection, but it
does not work in cases when VDUP and VMUL are in different basic
blocks.
Differential Revision: https://reviews.llvm.org/D99662
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.
This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
epilog is a subset of exec in the prolog.
Differential Revision: https://reviews.llvm.org/D96869
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.
The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?
If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC
If we were not able to scavenge an SGPR, we do the same operations, but
everytime the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)
Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know if the VGPR we use is
live before or not, otherwise the machine verifier complains.
Differential Revision: https://reviews.llvm.org/D96336
This patch adds the memory operands for indexed loads so
that certain optimizations can take place.
Differential Revision: https://reviews.llvm.org/D100215/
Change-Id: I539fcf046ca4ad1e7df1d893f57d751419d8364d
We saw some big compiling time impact after enabling the debug entry value
feature for X86 platform(D73534). Compiling time goes from 900s->1600s with
our testcase. It is caused by allocating/freeing the memory busily.
'using FwdRegWorklist = MapVector<unsigned, SmallVector<FwdRegParamInfo, 2>>;'
The value for this map is vector, and we miss the reference when access the
element. The same happens for `auto CalleesMap = MF->getCallSitesInfo();` which is a DenseMap.
Reviewed by: djtodoro, flychen50
Differential Revision: https://reviews.llvm.org/D100162
Say we have
%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%2,%a)
The optimization will try to reassociate the later one so that we can rewrite it to %3=min(%1, %c) and remove %2.
But if %2 has another uses outside of %3 then we can't remove %2 and end up with:
%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%1, %c)
This doesn't harm by itself except it is not profitable and changes IR for no good reason.
What is bad it triggers next iteration which finds out that optimization is applicable to %2 and %3 and generates:
%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%1,%c)
%4=min(%2,%a)
and so on...
The solution is to prevent optimization in the first place if intermediate result (%2) has side uses and
known to be not removed.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D100170
XSCMPUQP is not available for pre-P9 subtargets. This patch will lower
them into libcall for correct behavior on power7/power8.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92083
First, we don't need vector-ness for the predecessor lists.
Secondly, like elsewhere, do insertions before deletions.
Lastly, the check that we actually need to insert an edge,
that it doesn't exist already, is backwards. Instead of
looking at successors of every single 'PredOfBB',
just always look at predecessors of the 'Succ'.
The result is always the same, but we avoid *really* inefficient code.
While, indeed, we may end up pushing less updates that we'd reserve space
for, self-dominating updates aren't often enough for that to matter.
But this should matter for normal updates.
Improve AVX512 mask inversion, rG38c799bce801 exposed some missing opportunities to move scalar not() back onto the boolvector types for folding with setcc etc.
In the final SIMD spec, there is only a single v128.any_true instruction, rather
than one for each lane interpretation because the semantics do not depend on the
lane interpretation.
Differential Revision: https://reviews.llvm.org/D100241
Followup to D100177, handle an similar (demorgan inverse style) case from PR47797 as well
The AVX512 test cases could be further improved if we folded not(iX bitcast(vXi1)) -> (iX bitcast(not(vXi1)))
Alive2: https://alive2.llvm.org/ce/z/AnA_-W
The first source has the same EEW as the destination and the other
source is a scalar so the overlap constraints don't apply to
the unmasked version.
For the masked version we have a constraint that the destination
can't be V0 so that covers the only overlap issue there.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D100217
It breaks up the function pass manager in the codegen pipeline.
With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.
Add a check that we only have one function pass manager in the codegen
pipeline.
This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c:
"[AsmPrinter] Delete dead takeDeletedSymbsForFunction()".
This was not NFC as initially thought. By coalescing two function
psas managers, this exposed the reverted code as necessary.
addr-label.ll was crashing due to an emitted blockaddress's block being
removed but the label not emitted.
Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99707
Check the cache before calling isLoopSimplifyForm(). Otherwise we'd
always perform the check for the innermost loop and only skip it
for dominating loops.
This patch fixed the following issues along side with some refactoring:
1. Fix bugs where StringRef for context string out live the underlying std::string. We now keep string table in profile generator to hold std::strings. We also do the same for bracketed context strings in profile writer.
2. Make sure profile output strictly follow (total sample, name) order. Previously, there's inconsistency between ProfileMap's key and FunctionSamples's name, leading to inconsistent ordering. This is now fixed by introducing context profile canonicalization. Assertions are also added to make sure ProfileMap's key and FunctionSamples's name are always consistent.
3. Enhanced error handling for profile writing to make sure we bubble up errors properly for both llvm-profgen and llvm-profdata when string table is not populated correctly for extended binary profile.
4. Keep all internal context representation bracket free. This avoids creating new strings for context trimming, merging and preinline. getNameWithContext API is now simplied accordingly.
5. Factor out the code for context trimming and merging into SampleContextTrimmer in SampleProf.cpp. This enables llvm-profdata to use the trimmer when merging profiles. Changes in llvm-profgen will be in separate patch.
Differential Revision: https://reviews.llvm.org/D100090
The default is likely wrong.
Out of all the callees, only a single one needs to pass-in false (JumpThread),
everything else either already passes true, or should pass true.
Until the default is flipped, at least make it harder to unintentionally
add new callees with UseBlockValue=false.
"Does the predicate hold between two ranges?"
Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
"Does the predicate hold between two ranges?"
Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".
Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D96805
Update llvm::sys::fs::mapped_file_region to have a move constructor and
a move assignment operator, allowing it to be used as an Optional. Also,
update FileOutputBuffer's OnDiskBuffer to take advantage of this,
avoiding an extra allocation from the unique_ptr.
A nice follow-up would be to make the mapped_file_region constructor
private and replace its use with a factory function, such as
mapped_file_region::create(), that returns an Expected (or ErrorOr). I
don't plan on doing that immediately, but I might swing back later.
No functionality change, besides the saved allocation in OnDiskBuffer.
Differential Revision: https://reviews.llvm.org/D100159
This adds support for swapping comparison operands when it may introduce new
folding opportunities.
This is roughly the same as the code added to AArch64ISelLowering in
162435e7b5.
For an example of a testcase which exercises this, see
llvm/test/CodeGen/AArch64/swap-compare-operands.ll
(Godbolt for that testcase: https://godbolt.org/z/43WEMb)
The idea behind this is that sometimes, we may be able to fold away, say, a
shift or extend in a compare by swapping its operands.
e.g. in the case of this compare:
```
lsl x8, x0, #1
cmp x8, x1
cset w0, lt
```
The following is equivalent:
```
cmp x1, x0, lsl #1
cset w0, gt
```
Most of the code here is just a reimplementation of what already exists in
AArch64ISelLowering.
(See `getCmpOperandFoldingProfit` and `getAArch64Cmp` for the equivalent code.)
Note that most of the AND code in the testcase doesn't actually fold. It seems
like we're missing selection support for that sort of fold right now, since SDAG
happily folds these away (e.g testSwapCmpWithShiftedZeroExtend8_32 in the
original .ll testcase)
Differential Revision: https://reviews.llvm.org/D89422
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.
Differential Revision: https://reviews.llvm.org/D100189
When inserting a new def and renaming of uses is asked, always compute
IDF and do the renaming for the blocks with Phis in that IDF.
Resolves PR49859.
Differential Revision: https://reviews.llvm.org/D100163
Regiser types for xxsplti32dx for two td file patterns was incorrect.
Fixed the two types and added a test case that was reduced from a larger
failing test.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D100223
When lowering a BUILD_VECTOR SDNode, we choose among various possible vector
creation instructions in an attempt to minimize the total number of instructions
used. We previously considered using swizzles, consts, and splats, and this
patch adds shuffles as well. A common pattern that now lowers to shuffles is
when two 64-bit vectors are concatenated. Previously, concatenations generally
lowered to sequences of extract_lane and replace_lane instructions when they
could have been a single shuffle.
Differential Revision: https://reviews.llvm.org/D100018
I recently forgot a comma in a defm argument list and tablegen just
failed with exit code 1 without printing an error message. I believe
this issue was introduced in a9fc44c557.
This change prints the following instead:
.../clang/include/clang/Driver/Options.td:569:3: error: Expected comma before next argument
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D100178
There are four new PowerPC instructions that are introduced in
Power 10. They are hashst, hashchk, hashstp, hashchkp.
These instructions will be used for ROP Protection.
This patch adds the four instructions.
Reviewed By: nemanjai, amyk, #powerpc
Differential Revision: https://reviews.llvm.org/D99375
This patch updates the linkage name in the DISubprogram of coro-split
functions, which is particularly important for Swift, where the
funclets have a special name mangling. This patch does not affect C++
coroutines, since the DW_AT_specification is expected to hold the
(original) linkage name. I believe this is mostly due to limitations
in AsmPrinter, so we might be able to relax this restriction in the
future.
Differential Revision: https://reviews.llvm.org/D99693
I've initially just enabled this for BMI which has the ANDN instruction for i32/i64 - the i16/i8 cases give an idea of what'd we get when we enable it in all cases (I'll do this as a later commit).
Additionally, the i16/i8 cases could be freely promoted to i32 (as the args are already zeroext) and we could then make use of ANDN + the free cmp0 there as well - this has come up in PR48768 and PR49028 so I'm going to look at this soon.
https://alive2.llvm.org/ce/z/QVWHP_https://alive2.llvm.org/ce/z/pLngT-
Vector cases do not appear to benefit from this as we end up with having to generate the zero vector as well - this is one of the reasons I didn't try to tie this into hasAndNot/hasAndNotCompare.
Differential Revision: https://reviews.llvm.org/D100177
As suggested in the review thread for 5094e12 and seen in the
motivating example from https://llvm.org/PR49885, it's not
clear if we have a way to create the optimal code without
this heuristic.
In LazyValueInfoImpl::isNonNullAtEndOfBlock we populate a set of
pointers, known to be non-null at the end of a block (e.g. because we
did a load through them). We then infer that any pointer, based on an
element of this set is non-null as well ("based" here meaning a
non-null pointer is the underlying object). This is incorrect, even if
the base pointer was non-null, the value of a GEP, that lacks the
inbounds` attribute, may be null.
This issue appeared as miscompilation of the following test case:
int puts(const char *);
typedef struct iter {
int *val;
} iter_t;
static long distance(iter_t first, iter_t last) {
long r = 0;
for (; first.val != last.val; first.val++)
++r;
return r;
}
int main() {
int arr[2] = {0};
iter_t i, j;
i.val = arr;
j.val = arr + 1;
if (distance(i, j) >= 2)
puts("failed");
else
puts("passed");
}
This fixes PR49662.
Differential Revision: https://reviews.llvm.org/D99642
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.
Differential Revision: https://reviews.llvm.org/D100117
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.
Fixed version of D99587, which does not rely on the address space.
Differential Revision: https://reviews.llvm.org/D99743
Add an ability to store `Offset` between partially aliased location. Use this
storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98718
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98027
These cases were failing before, but with cryptic asserts.
Add asserts in the RegScavenger that fail earlier with better
messages. NFC
Differential Revision: https://reviews.llvm.org/D100109
New SDTypeProfile can be reused for other word operation patterns without explicit i64 type in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100097
Previously loading the vtable used in calling a virtual method in a loop
was not hoisted out of the loop. This fixes that.
canSinkOrHoistInst() itself doesn't check that the load operands are
loop invariant, callers also check that separately.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99784
meetBDVState looks pretty difficult to read and follow.
This is purely NFC but doing several things:
1) Combine meet and meetBDVState
2) Move the function to be a member of BDVState
3) Make BDVState be a mutable object
4) Convert switch to sequence of ifs
5) Adds comments.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99064