When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call. This is explicitly required in LangRef.
At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.
This patch shuffles some functions around so that some blocks of code can
be reused. In particular,
* Move the determination of "which blocks are in scope" to its own
function, as it's non-trivial to solve. Delete the "InScopeBlocks"
collection too, which nothing reads from.
* Split transfer emission (i.e., installing DBG_VALUEs into blocks) into
its own function.
* Name some useful types.
* Rename "ScopeToBlocks" to "ScopeToAssignBlocks", as that's what the
collection contains, blocks where assignments happen.
Differential Revision: https://reviews.llvm.org/D118454
ValueIDNum is supposed to be a value type that boils down to a uint64_t,
that has some bitfields for convenience. If we use the default operator=,
we end up with each bit field being individually assigned, which is
un-necessarily slow.
Implement the assignment operator by just copying the uint64_t value of
the object. This is quicker, and matches how the comparison operators
work already. Doing so is 0.1% faster on the compile-time-tracker.
Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca()
when we call this function for a scalable alloca instruction, caused
by the implicit conversion of TySize to uint64_t.
This patch changes TySize to a TypeSize as returned by getTypeAllocSize()
and ensures the allocation size is multiplied by vscale for scalable vectors.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D118372
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.
This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
If we only assign a variable value a single time, we can take a short-cut
when computing its location: the variable value is only valid up to the
dominance frontier of where the assignemnt happens. Past that point, there
are other predecessors from where the variable has no value, meaning the
variable has no location past that point.
This patch recognises this scenario, and avoids expensive SSA computation,
to improve compile-time performance.
Differential Revision: https://reviews.llvm.org/D117877
If AllocationOrder has less than 32 elements, we were treating the extra
positions as if they were valid. This was detected by a subsequent
assert. The fix also tightens the asserts.
Both IDFCalculatorBase and its accompanying DominatorTreeBase only supports pointer nodes. The template argument is the block type itself and any uses of GraphTraits is therefore done via a pointer to the node type.
However, the ChildrenGetterTy type of IDFCalculatorBase has a use on just the node type instead of a pointer to the node type. Various parts of the monorepo has worked around this issue by providing specializations of GraphTraits for the node type directly, or not been affected by using specializations instead of the generic case. These are unnecessary however and instead the generic code should be fixed instead.
An example from within Tree is eg. A use of IDFCalculatorBase in InstrRefBasedImpl.cpp. It basically instantiates a IDFCalculatorBase<MachineBasicBlock, false> but due to the bug above then goes on to specialize GraphTraits<MachineBasicBlock> although GraphTraits<MachineBasicBlock*> exists (and should be used instead).
Similar dead code exists in clang which defines redundant GraphTraits to work around this bug.
This patch fixes both the original issue and removes the dead code that was used to work around the issue.
Differential Revision: https://reviews.llvm.org/D118386
Close#52781: for LTO, the inline asm diagnostic uses `<inline asm>` as the file
name (lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp) and it is unclear which
module has the issue.
With this patch, we will see the module name (say `asm.o`) before `<inline asm>` with ThinLTO.
```
% clang -flto=thin -c asm.c && myld.lld asm.o -e f
ld.lld: error: asm.o <inline asm>:1:2: invalid instruction mnemonic 'invalid'
invalid
^~~~~~~
```
For regular LTO, unfortunately the original module name is lost and we only get
ld-temp.o.
Reviewed By: #lld-macho, ychen, Jez Ng
Differential Revision: https://reviews.llvm.org/D118434
If we have a vector FP division with a splatted divisor, use
getVectorMinNumElements when scaling the num of uses by splat factor.
For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's
above the fdiv threshold (3) when scaling num uses by splat factor, but the
codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x
double> case (splat + vector fdiv).
If the combine could be converted into a scalar FP division by
scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated
on the isExtractVecEltCheap TLI function which is implemented for x86 but not
AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses
by splat if the division can be converted into scalar op.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D118343
Shiny new DBG_PHI instruction usually have physical registers as operands
-- however, the machine verifier checks to see whether they're live, and
occasionally this fails. There's a filter for DBG_VALUE instructions to not
get verified in this way: expand it to exempt all debug instructions from
liveness checking, which means DBG_PHIs get treated like DBG_VALUEs.
This also future proofs against us adding new debug instructions.
Differential Revision: https://reviews.llvm.org/D117891
On the level of the generated object files, both symbols (both
original and alias) are generally indistinguishable - both are
regular defined symbols. But previously, only the original
function had the COFF ComplexType set to IMAGE_SYM_DTYPE_FUNCTION,
while the symbol created via an alias had the type set to
IMAGE_SYM_DTYPE_NULL.
This matches what GCC does, which emits directives for setting the
COFF symbol type for this kind of alias symbol too.
This makes a difference when GNU ld.bfd exports symbols without
dllexport directives or a def file - it seems to decide between
function or data exports based on the COFF symbol type. This means
that functions created via aliases, like some C++ constructors,
are exported as data symbols (missing the thunk for calling without
dllimport).
The hasnt been an issue when doing the same with LLD, as LLD decides
between function or data export based on the flags of the section
that the symbol points at.
This should fix the root cause of
https://github.com/msys2/MINGW-packages/issues/10547.
Differential Revision: https://reviews.llvm.org/D118328
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage).
In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests.
I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants.
Differential Revision: https://reviews.llvm.org/D118264
This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.
This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118058
A shift-left > 63 triggers a UBSAN failure. This patch kicks the can
down the road (to the consumer) by emitting a more compact
representation of the shift computation in DWARF expressions.
Relanding (I accidentally pushed an earlier version of the patch previously).
Differential Revision: https://reviews.llvm.org/D118183
The physical register in the asm has the wrong type for the declared
IR. It seems to work in the DAG by extracting the 4 elements that are
defined in the IR from the register, but that isn't handled here. This
doesn't seem to be a well tested path since other mismatched cases are
crashing the DAG asm handling.
A shift-left > 63 triggers a UBSAN failure. This patch kicks the can
down the road (to the consumer) by emitting a more compact
representation of the shift computation in DWARF expressions.
Differential Revision: https://reviews.llvm.org/D118183
DIStringType is used to encode the debug info of a character object
in Fortran. A Fortran deferred-length character object is typically
implemented as a pair of the following two pieces of info: An address
of the raw storage of the characters, and the length of the object.
The stringLocationExp field contains the DIExpression to get to the
raw storage.
This patch also enables the emission of DW_AT_data_location attribute
in a DW_TAG_string_type debug info entry based on stringLocationExp
in DIStringType.
A test is also added to ensure that the bitcode reader is backward
compatible with the old DIStringType format.
Differential Revision: https://reviews.llvm.org/D117586
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
The loop below the changed line assumes that the element
width of the target constant is the same as the element
width of the loaded value, but that is not always true.
We could try harder to do some kind of min/max calc even
if the sizes don't match, but that can be another patch
if needed. This fixes#53401 (miscompile) and does not
change the motivating cases added when this analysis
was introduced:
ad298f86b7
Currently not (xor_one_use) pattern is always selected to S_XNOR irrelative od the node divergence.
This relies on further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32)
on those targets which have no V_XNOR.
Current change enables the patterns which explicitly select the not (xor_one_use) to appropriate form.
We assume that xor (not) is already turned into the not (xor) by the combiner.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116270
This is the unsigned variant of D111976, where we convert a clamped
fptoui to a fptoui.sat. Because we are unsigned, the condition this time
is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes.
This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.
Differential Revision: https://reviews.llvm.org/D114964
When evicting interference, it causes an asseertion error
since LiveIntervals::intervalIsInOneMBB assumes that input
is not empty.
This patch fixed bug mentioned in D118020.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D118124
A shift-left > 63 triggers a UBSAN failure. This patch kicks the can
down the road (to the consumer) by emitting a more compact
representation of the shift computation in DWARF expressions.
Differential Revision: https://reviews.llvm.org/D118183
During fast-isel calling 'markFunctionEnd' in the base class will call
tidyLandingPads. This can cause an issue where we have determined that
we need ehinfo and emitted a traceback table with the bits set to
indicate that we will be emitting the ehinfo, but the tidying deletes
all landing pads. In this case we end up emitting a reference to
__ehinfo.N symbol, but not emitting a definition to said symbol and the
resulting file fails to assemble.
Differential Revision: https://reviews.llvm.org/D117040
An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.
Differential Revision: https://reviews.llvm.org/D116930
This patch adds widening support for ISD::VP_MERGE, which widens
identically to VP_SELECT and similarly to other select-like nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118030
This patch adds splitting support for ISD::VP_MERGE, which splits
identically to VP_SELECT and similarly to other select-like nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118032
Split these nodes in a similar way as their masked versions.
Reviewed By: frasercrmck, craig.topper
Differential Revision: https://reviews.llvm.org/D117760
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
DwarfCompileUnit::getOrCreateSourceID() is often called many times
in sequence with the same DIFile. This is currently very expensive,
because it involves creating a string from directory and file name
and looking it up in a string map. This patch remembers the last
DIFile and its ID and directly returns that.
This gives a geomean -1.3% compile-time improvement on CTMark O0-g.
Differential Revision: https://reviews.llvm.org/D118041
This matches the actual runtime function more closely.
I considered also renaming both RetainRV/UnsafeClaimRV to end with
"ARV", for AutoreleasedReturnValue, but there's less potential
for confusion there.