Summary:
The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in
cases where the edge is a critical. SplitEdge uses SplitCriticalEdge assuming it
can always split all critical edges, which is an incorrect assumption.
The three cases where the function SplitCriticalEdge will return a nullptr is:
1. DestBB is an exception block
2. Options.IgnoreUnreachableDests is set to true and
isa(DestBB->getFirstNonPHIOrDbgOrLifetime()) is not equal to a nullptr
3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true)
and it cannot be maintained for a loop due to indirect branches
For each of these situations they are handled in the following way:
1. Modified the function ehAwareSplitEdge originally from
llvm/lib/Transforms/Coroutines/CoroFrame.cpp to handle the cases when the DestBB
is an exception block. This function is called directly in SplitEdge.
SplitEdge does not call SplitCriticalEdge in this case
2. Options.IgnoreUnreachableDests is set to false by default, so this situation
does not apply.
3. Return a nullptr in this situation since the SplitCriticalEdge also returned
nullptr. Nothing we can do in this case.
Reviewed By: asbirlea
Differential Revision:https://reviews.llvm.org/D94619
Follow up to a6d2a8d6f5. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
Follow up to a6d2a8d6f5. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:
%phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
%load = load <8 x float>
%reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)
This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D98435
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.
Solution:
This patch adds two new flags
- OF_CRLF which indicates that CRLF translation is used.
- OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.
Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.
So this is the behaviour per platform with my patch:
z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode
Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return
The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
if (Flags & OF_CRLF)
CrtOpenFlags |= _O_TEXT;
```
These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99426
When converting a switch with two cases and a default into a
select, also handle the denegerate case where two cases have the
same value.
Generate this case directly as
%or = or i1 %cmp1, %cmp2
%res = select i1 %or, i32 %val, i32 %default
rather than
%sel1 = select i1 %cmp1, i32 %val, i32 %default
%res = select i1 %cmp2, i32 %val, i32 %sel1
as InstCombine is going to canonicalize to the former anyway.
This commit adjusts the order of two swappable if statements to
make code cleaner.
Reviewed By: lattner, nikic
Differential Revision: https://reviews.llvm.org/D99648
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
Using $ breaks demangling of the symbols. For example,
$ c++filt _Z3foov\$123
_Z3foov$123
This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.
Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:
$ c++filt _Z3foov.123
foo() [clone .123]
This is already done in some places; try to do it everywhere.
Differential revision: https://reviews.llvm.org/D97484
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
This is a small patch to make FoldBranchToCommonDest poison-safe by default.
After fc3f0c9c, only two syntactic changes are needed to fix unit tests.
This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99452
This *only* changes the cases where we *really* don't care
about the iteration order of the underlying contained,
namely when we will use the values from it to form DTU updates.
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
`FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`)
for bonus instruction duplication. And currently it calculates the cost
as-if it will actually duplicate into each predecessor.
But ignoring the budget, it won't always duplicate into each predecessor,
there are some correctness and profitability checks.
So when calculating the cost, we should first check into which blocks
will we *actually* duplicate, and only then use that block count
to do budgeting.
We clone bonus instructions to the end of the predecessor block,
and then use `SSAUpdater::RewriteUseAfterInsertions()`.
But that only deals with the cases where the use-to-be-rewritten
are either in different block from the def, or come after the def.
But in some loop cases, the external use may be in the beginning of
predecessor block, before the newly cloned bonus instruction.
`SSAUpdater::RewriteUseAfterInsertions()` does not deal with that.
Notably, the external use can't happen to be both in the same block
and *after* the newly-cloned instruction, because of the fold preconditions.
To properly handle these cases, when the use is in the same block,
we should instead use `SSAUpdater::RewriteUse()`.
TBN, they do the same thing for PHI users.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49510
Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689
2nd try (original: 27ae17a6b0) with fix/test for crash. We must make
sure that TTI is available before trying to use it because it is not
required (might be another bug).
Original commit message:
This is one step towards solving:
https://llvm.org/PR49336
In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.
Differential Revision: https://reviews.llvm.org/D98898
- Give unwieldy repeated expression a name
- Use a ranged `for` basic block iterator
Reviewed by: nikic, dexonsmith
Differential Revisision: https://reviews.llvm.org/D98957
Hoist early return for decl-only clones to before DIFinder
calculation.
Also fix an out of date assert message after invariants changed in
22a52dfddc.
Reviewed by: nikic, dexonsmith
Differential Revisision: https://reviews.llvm.org/D98957
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
This reverts commit 27ae17a6b0.
There are bot failures that end with:
#4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
#5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8)
#6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c)
...but not sure how to trigger that yet.
This is one step towards solving:
https://llvm.org/PR49336
In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.
Differential Revision: https://reviews.llvm.org/D98898
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.
Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.
Differential Revision: https://reviews.llvm.org/D98030
When eliminating comparisons, we can use common dominator of
all its users as context. This gives better results when ICMP is not
computed right before the branch that uses it.
Differential Revision: https://reviews.llvm.org/D98924
Reviewed By: lebedev.ri
Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed.
(See D91250, D48541)
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D91661
We can prove more predicates when we have a context when eliminating ICmp.
As first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use a common dominator of all its users.
Need some refactoring before that.
Observed ~0.5% negative compile time impact.
Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
This reverts commit 01ac6d1587.
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427.
This is a patch to add nonnull and align to assume's operand bundle
only if noundef exists.
Since nonnull and align in fn attr have poison semantics, they should be
paired with noundef or noundef-implying attributes to be immediate UB.
Reviewed By: jdoerfert, Tyker
Differential Revision: https://reviews.llvm.org/D98228
The test is reduced from a C source example in:
https://llvm.org/PR49541
It's possible that the test could be reduced further or
the predicate generalized further, but it seems to require
a few ingredients (including the "late" SimplifyCFG options
on the RUN line) to fall into the infinite-loop trap.
For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.
Differential Revision: https://reviews.llvm.org/D98187