Since 67220c2ad7 empty SPSSequence<char>s deserialize to default-constructed
ArrayRef<char>s, which have a null data field. We need to check for this to
avoid memcpy'ing from a nullptr.
This should fix the bot failure in
https://lab.llvm.org/buildbot/#/builders/85/builds/9323
We were quite conservative when it came to PHI node handling to avoid
recursive reasoning. Now we check more direct if we have seen a PHI
already or not. This allows non-recursive PHI chains to be handled.
This also exposed a bug as we did only model the effect of one loop
traversal. `phi_no_store_3` has been adapted to show how we would have
used `undef` instead of `1` before. With this patch we don't replace
it at all, which is expected as we do not argue about loop iterations
(or alignments).
If we only have exact accesses we should never require the bit-pattern
to be uniform (in this case 0). Only a non-exact access should force us
to require only 0 values.
Add basic support for the MemProf metadata (!memprof and !callsite)
which was initially described in "RFC: IR metadata format for MemProf"
(https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165).
The bulk of the patch is verification support, along with some tests.
There are a couple of changes to the format described in the original
RFC:
Initial measurements suggested that a tree format for the stack ids in
the contexts would be more efficient, but subsequent evaluation with
large applications showed that in fact the cost of the additional
metadata nodes required by this deduplication scheme overwhelmed the
benefit from sharing stack id nodes. Therefore, the implementation here
and in follow on patches utilizes a simpler scheme of lists of stack id
integers in the memprof profile contexts and callsite metadata. The
follow on matching patch employs context trimming optimizations to
reduce the cost.
Secondly, instead of verbosely listing all profiled fields in each
profiled context (memory info block or MIB), and deferring the
interpretation of the profile data, the profile data is evaluated and
converted into string tags specifying the behavior (e.g. "cold") during
profile matching. This reduces the verbosity of the profile metadata,
and allows additional context trimming optimizations. As a result, the
named metadata schema description is also no longer needed.
Differential Revision: https://reviews.llvm.org/D128141
Disable the code responsible for preparing object libraries
for the driver and filling LLVM_DRIVER_* properties
if LLVM_TOOL_LLVM_DRIVER_BUILD is disabled. These properties are
consumed only by tools/llvm-driver, and so they are not used at all
if LLVM_TOOL_LLVM_DRIVER_BUILD is not enabled. At the same time,
the related code breaks standalone clang builds against LLVM built
with LLVM_LINK_LLVM_DYLIB.
Differential Revision: https://reviews.llvm.org/D130158
Noticed via inspection; to my knowledge, impossible to hit today. In theory, we could have a fixed stride check be analyzed, then a scalable one. With the old code, the scalable one would be silently dropped, and the runtime guard would go ahead with only the fixed one. This would be a miscompile.
If we are right shifting a multiply by a negated power of 2 where
the power of 2 is the same as the shift amount, we can replace with
a negate followed by an And.
New tests have not been committed yet but the patch shows the diffs.
Let me know if you want any changes or additional tests.
Differential Revision: https://reviews.llvm.org/D130103
VOPC DPP should not be formed when the row_mask and bank_mask are not
0xf (full) because the resulting VOP DPP would have different semantics
than the MOV DPP followed by VOP. Existing checks in GCNDPPCombine cover
this case but for different reasons, so assert the property for
future-proofing.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130101
Due to the late expansion of the compare exchange sequences, there's
scope for improving codegen by folding the branches into the cmpxchg
loop (avoiding a branch-to-branch).
This patch restores a call to has_value to make it clear that we are
checking the presence of an optional value, not the underlying value.
This patch partially reverts d08f34b592.
Differential Revision: https://reviews.llvm.org/D129453
Put AllocationFn check before I->willReturn can allow CodeGenPrepare to remove useless malloc instruction
Differential Revision: https://reviews.llvm.org/D130126
This change improves ctags generation for tablegen files.
For the following example
```
class A;
class A {
int a;
}
```
Previously, tags were generated only for a forward declaration of class 'A'.
This patch allows generating tags for the forward declarations
and further definition of class 'A'.
Reviewed By: barannikov88
Original patch by: rusyaev-roman (Roman Rusyaev)
Some adjustments by: nhaehnle (Nicolai Hähnle)
Differential Revision: https://reviews.llvm.org/D129935
Change a couple of RUN lines to not depend on the presence or position
of the IR code sinking pass in the codegen pipeline, since it does not
belong in there anyway.
An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block.
Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose.
Differential Revision: https://reviews.llvm.org/D130106
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.
If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.
v2:
- just check CI->doesNotThrow
v3 (resubmit after revert at 3443788087):
- update Clang tests
Differential Revision: https://reviews.llvm.org/D129860
In fact, in unreached code we can say that every fact is true. So do not waste time trying to
do something smarter.
Formally it's not an NFC because it may change query results in unreached code, but they
won't have any impact on execution.
Hypothetical CT boost expected but not measured in practice.
Differential Revision: https://reviews.llvm.org/D129878
The n_type field in the symbol table entry has two interpretations in XCOFF32, and a single interpretation in XCOFF64.
The new interpretation is used in XCOFF32 if the value of the o_vstamp field in the auxiliary header is 2.
In XCOFF64 and the new XCOFF32 interpretation, the n_type field is used for the symbol type and visibility.
The patch writes the aux header with an o_vstamp field value of 2 when the visibility is specified in XCOFF32 to make the new XCOFF32 interpretation used.
Reviewed By: DiggerLin, jhenderson
Differential Revision: https://reviews.llvm.org/D128148
Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well.
As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.
Given a patch like D129506, using instructions not valid for the current
target feature set becomes an error. This fixes an issue in
ARMExpandPseudo::ExpandCMP_SWAP where Thumb2 compares were used in
Thumb1Only code, such as thumbv8m.baseline targets.
Differential Revision: https://reviews.llvm.org/D129695
This patch introduces some initial def-use verification. This catches
cases like the one fixed by D129436.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D129717
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.
This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur.
Differential Revision: https://reviews.llvm.org/D129641
There is at least one Clang test (clang/test/CodeGen/arm_acle.c) which
has functions guarded by #if's that cause those functions to be compiled
only for a subset of RUN lines.
This results in a case where one RUN line has a body for the function
and another doesn't. Treat this case as a conflict for any prefixes that
the two RUN lines have in common.
This change exposed a bug where functions with '$' in the name weren't
properly recognized in ARM assembly (despite there being a test case
that was supposed to catch the problem!). This bug is fixed as well.
Differential Revision: https://reviews.llvm.org/D130089
By default if SVE is enabled we want the select instruction used for
reductions to be inside the loop, rather than outside. This makes it
possible for the backend to fold the select into the operation to
produce a single predicated add, fadd, etc.
Differential Revision: https://reviews.llvm.org/D129763
Add promotion and expansion of integer operands for
experimental_vp_strided SelectionDAG nodes; the expansion is actually
just a truncation of the stride operand.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D123112