CHUNK_ALLOCATED. CHUNK_QUARANTINE are only states
which make AsanChunk useful for GetAsanChunk callers.
In either case member of AsanChunk are not useful.
Fix few cases which didn't expect nullptr. Most of the callers are already
expects nullptr.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D87135
This implements support for isKnownNonZero, computeKnownBits when freeze is involved.
```
br (x != 0), BB1, BB2
BB1:
y = freeze x
```
In the above program, we can say that y is non-zero. The reason is as follows:
(1) If x was poison, `br (x != 0)` raised UB
(2) If x was fully undef, the branch again raised UB
(3) If x was non-zero partially undef, say `undef | 1`, `freeze x` will return a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, it is trivial
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D75808
Previously, DwarfFDECache::findFDE used 0 as a special value meaning
"search the entire cache, including dynamically-registered FDEs".
Switch this special value to -1, which doesn't make sense as a DSO
base.
Fixes PR47335.
Reviewed By: compnerd, #libunwind
Differential Revision: https://reviews.llvm.org/D86748
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.
Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.
Fixes: SWDEV-247595
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D87158
We were missing support for the G_ADD_LOW + ADRP folding optimization in the
manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD.
As a result, we were missing cases like this:
```
@foo = external hidden global i32*
define void @baz(i32* %0) {
store i32* %0, i32** @foo
ret void
}
```
https://godbolt.org/z/16r7ad
This functionality already existed in the addressing mode functions for the
importer. So, this patch makes the manual selection code use
`selectAddrModeIndexed` rather than duplicating work.
This is a 0.2% geomean code size improvement for CTMark at -O3.
There is one code size increase (0.1% on lencod) which is likely because
`selectAddrModeIndexed` doesn't look through constants.
Differential Revision: https://reviews.llvm.org/D87397
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.
Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.
Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87386
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.
This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.
Differential Revision: https://reviews.llvm.org/D87415
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.
This is enabled only with optimizations enabled like SelectionDAG.
Differential Revision: https://reviews.llvm.org/D86824
This is a port of the functionality from SelectionDAG, which tries to find
a tree of conditions from compares that are then combined using OR or AND,
before using that result as the input to a branch. Instead of naively
lowering the code as is, this change converts that into a sequence of
conditional branches on the sub-expressions of the tree.
Like SelectionDAG, we re-use the case block codegen functionality from
the switch lowering utils, which causes us to generate some different code.
The result of which I've tried to mitigate in earlier combine patches.
Differential Revision: https://reviews.llvm.org/D86665
This combine previously tried to take sequences like:
%cond = G_ICMP pred, a, b
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.
I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.
Differential Revision: https://reviews.llvm.org/D86664
I was having a lot of trouble parsing the messages. In particular, the
messages like:
```
<stdin>:3:8: error: 'scf.if' op along control flow edge from Region #0 to scf.if source #1 type '!npcomprt.tensor' should match input #1 type 'tensor<?xindex>'
```
In particular, one thing that kept catching me was parsing the "to scf.if
source #1 type" as one thing, but really it is
"to parent results: source type #1".
Differential Revision: https://reviews.llvm.org/D87334
The entry block is split at the first instruction where `shouldKeepInEntry`
returns false. The created basic block has a br jumping to the original entry
block. The new basic block causes the function label line and the other entry
block lines to be covered by different basic blocks, which can affect line
counts with special control flows (fork/exec in the entry block requires
heuristics in llvm-cov gcov to get consistent line counts).
int main() { // BB0
return 0; // BB2 (due to entry block splitting)
}
// BB1 is the exit block (since gcov 4.8)
This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and
inserts an edge from the synthetic entry block to the original entry block. We
can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The
number of basic blocks does not change, but the emitted .gcno files will be
smaller because we can save one GCOV_TAG_LINES tag.
// BB0 is the synthetic entry block with a single edge to BB2
int main() { // BB2
return 0; // BB2
}
// BB1 is the exit block (since gcov 4.8)
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.
Differential Revision: https://reviews.llvm.org/D85091
Extract a function for turning `eLaunchFlavorDefault` into a concreate `eLaunchFlavor` value.
This new function encapsulates the few compile time variables involved, and also prevents clang unused code diagnostics.
Differential Revision: https://reviews.llvm.org/D87327
The benchmarks expect to be built in C++17 or newer, but this
isn't always how CMake configures the C++ dialect. Instead
we need to explicitly set the CXX_STANDARD target property.
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist. Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.
This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
Summary:
This patch starts adding support for adding information dumps to libomptarget
and rtl plugins. The information printing is controlled by the
LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this
patch is to provide the user with additional information about the device
during kernel execution and providing the user with information dumps in the
case of failure. This patch added the ability to dump the pointer mapping table
as well as printing the number of blocks and threads in the cuda RTL.
Reviewers: jdoerfort gkistanova ye-luo
Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo
Tags: #OpenMP
Differential Revision: https://reviews.llvm.org/D87165
This commit specifies reduction dimensions for ConvOps. This prevents
running reduction loops in parallel and enables easier detection of kernel dimensions
which we will need later on.
Differential Revision: https://reviews.llvm.org/D87288
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220