Add iterator for ReductionNode traversal and use range to indicate the region we would like to keep. Refactor the interaction between Pass/Tester/ReductionNode.
Now it'll be easier to add new traversal type and OpReducer
Reviewed By: jpienaar, rriddle
Differential Revision: https://reviews.llvm.org/D99713
It's okay if the step is zero, we'll just stay at the same non-zero
value in that case. The valuable part of this is that the step
doesn't even need to be a constant anymore.
This is a service function generally useful for selection
of a FI in an SADDR. NFC for now, needed for future patch.
Differential Revision: https://reviews.llvm.org/D100406
Debug the input path for READ statements with END= labels;
don't emit errors when the program can handle them.
BeginReadingRecord() member functions have been made
"bool" for more convenient handling of error cases,
and some code in IoErrorHandler has been cleaned up.
Differential Revision: https://reviews.llvm.org/D100421
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.
For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.
Differential Revision: https://reviews.llvm.org/D100411
Some of Microsoft's unit tests in D70631 fail because libc++'s
implementation of std::chars_format isn't a proper bitmask type. Adding
the required functions to make std::chars_format a proper bitmask type.
Implements parts of P0067: Elementary string conversions
Differential Revision: https://reviews.llvm.org/D97115
Multiple lines importing from the same URL can be merged:
import {X} from 'a';
import {Y} from 'a';
Merge to:
import {X, Y} from 'a';
This change implements this merge operation. It takes care not to merge in
various corner case situations (default imports, star imports).
Differential Revision: https://reviews.llvm.org/D100466
Consider the following set of files:
a.cc:
#include "a.h"
a.h:
#ifndef A_H
#define A_H
#include "b.h"
#include "c.h" // This gets "skipped".
#endif
b.h:
#ifndef B_H
#define B_H
#include "c.h"
#endif
c.h:
#ifndef C_H
#define C_H
void c();
#endif
And the output of the -H option:
$ clang -c -H a.cc
. ./a.h
.. ./b.h
... ./c.h
Note that the include of c.h in a.h is not shown in the output (GCC does the
same). This is because of the include guard optimization: clang knows c.h is
covered by an include guard which is already defined, so when it sees the
include in a.h, it skips it. The same would have happened if #pragma once were
used instead of include guards.
However, a.h *does* include c.h, and it may be useful to show that in the -H
output. This patch adds a flag for doing that.
Differential revision: https://reviews.llvm.org/D100480
This transformation is fundamentally broken when it comes to dominance,
it just happened to work when the source of the memcpy can be moved into
the place of the alloca. The bug shows up a lot more often since
077bff39d4 allows the source to be a
switch.
It would be possible to check dominance of the source and all its
operands, but that seems very heavy for instcombine.
Rename the name of "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to
`amdgpu-enable-lower-module-lds` as later is consistent and reads better.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100441
In the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)), check if both X/Z AND Y/W have at least one merge-able shuffle in which case the total number of shuffle should still fall.
Helps with instruction count regressions we saw while fixing PR48823
Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.
The existing BTI placement pass avoids inserting "BTI c" when the
function has local linkage and is only directly called. However,
even in this case, there is a (small) chance that the linker later
adds a hunk with an indirect call to the function, e.g. if the
function is placed in a separate section and moved far away from
its callers. Make sure to add BTI for these functions too.
Differential Revision: https://reviews.llvm.org/D99417
This refactors SCCP and creates a SCCPSolver interface and class so that it can
be used by other passes and transformations. We will use this in D93838, which
adds a function specialisation pass.
This is based on an early version by Vinay Madhusudan.
Differential Revision: https://reviews.llvm.org/D93762
ICC permits this, and after some extensive testing it looks like we can
support this with very little trouble. We intentionally don't choose to
do this with attribute-target (despite it likely working as well!)
because GCC does not support that, and introducing said
incompatibility doesn't seem worth it.
Stepping through callstacks in the example from D99759 reveals
this potential compile-time improvement.
The savings come from avoiding ValueTracking's computing known
bits if we have already dealt with special-case patterns.
Further improvements in this direction seem possible.
This makes a degenerate test based on PR49785 about 40x faster
(25 sec -> 0.6 sec), but it does not address the larger question
of how to limit computeKnownBitsFromAssume(). Ie, the original
test there is still infinite-time for all practical purposes.
Differential Revision: https://reviews.llvm.org/D100408
Otherwise it reuses the same register for storing the stack slot
offset if the stack slot offset is big.
Differential Revision: https://reviews.llvm.org/D100461
The start value can't be null for something to be a non-zero
recurrence, so hoist that common check out of the switch.
Subsequent checks may be incomplete or over-specified as noted in:
D100408
These were added in 37935405ef,
but they fail on macOS (and on Windows with MSYS based tools, before
relanding D98859). Remove the tests that exercise "not not echo", as
the primary thing to test is the plain echo patterns above.
Aggregate types over 16 bytes are passed by reference.
Contrary to the x86_64 ABI, smaller structs with an odd (non power
of two) are padded and passed in registers.
Differential Revision: https://reviews.llvm.org/D100374
By checking for cpu and toolchain features ahead
of time we don't need the custom return codes.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D97684
After 077bff39d4,
isDereferenceableForAllocaSize() can recurse into selects,
which is causing a problem for the new test case,
reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html
because the replacement (the select) is defined after the first use
of an alloca, so we'd end up with a verifier error.
Now, this new check is too restrictive.
We likely can handle *some* cases, by trying to sink all uses of an alloca
to after the the def.
Extension to rG74f98391a7a4, we can also include any of the upper (known zero) bits in the comparison in the shuffle removal fold, just as long as we demand all the elements of the movmsk source vector.