Use BFI if it is available and BPI otherwise.
This is a promised follow-up after D89541.
Reviewers: ebrevnov, mkazantsev
Reviewed By: ebrevnov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89773
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.
There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.
Fixes part of PR47393
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87115
Seems users have enough different uses of the symbolizer where they
might have unknown binaries and offsets such that "best effort" behavior
is all that's expected of llvm-symbolizer - so even erroring on unknown
executables and out of bounds offsets might not be suitable.
This reverts commit 1de0199748.
This reverts commit a7b209a6d4.
This reverts commit 338dd138ea.
Prior to this patch, computeKnownBits would only try to deduce trailing zeros
bits for getelementptrs. This patch adds the logic to treat geps as a series
of add * scaling factor.
Thanks to this patch, using a gep or performing an address computation
directly "by hand" (ptrtoint followed by adds and mul followed by inttoptr)
offers the same computeKnownBits information.
Previously, the "by hand" approach would have given more information.
This is related to https://llvm.org/PR47241.
Differential Revision: https://reviews.llvm.org/D86364
PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics
didn't properly preserve CFG even though it claimed to. The legacy pass
says it doesn't. Match the legacy pass's preserved analyses.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D89175
For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.
1. Fixed liveness issue with implicit kills.
2. Fixed potential problem with an indirect mov.
Fixes: SWDEV-256848
Differential Revision: https://reviews.llvm.org/D89599
We use the Real vs Pseudo instruction abstraction for other
types of instructions to facilitate changes in opcode
between gpu generations.
This patch introduces that abstraction to SOPC and SOPP.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D89738
Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138
Fixes being overly conservative with the register counts in called
functions. This should try to do a conservative range merge, but for
now just clone.
Also fix not being able to functionally run the pass standalone.
As mentioned post-commit in D85749, the 'substituteDebugValuesForInst'
method added in c521e44def would be better off with a limit on the
number of operands to substitute. This handles the common case of
"substitute the first operand between these two differing instructions",
or possibly up to N first operands.
For historical reasons, the R6 register is a callee-saved argument
register. This means that if it is used to pass an argument to a function
that does not clobber it, it is live throughout the function.
This patch makes sure that in this special case any kill flags of it are
removed.
Review: Ulrich Weigand, Eli Friedman
Differential Revision: https://reviews.llvm.org/D89451
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.
Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose
Differential Revision: https://reviews.llvm.org/D88494
Updates an optimization that relies on boolean contents being either 0
or 1 to properly check for this before triggering.
The following:
(X & 8) != 0 --> (X & 8) >> 3
Produces unexpected results when a boolean 'true' value is represented
by negative one.
Patch by Erik Hogeman.
Differential Revision: https://reviews.llvm.org/D89390
We were previously relying upon the TypeSize comparison operators to
obtain the maximum size of two types, however use of such operators is
being deprecated in favour of making the caller aware that it could
be dealing with scalable vector types. I have changed the code to assert
that the two types have the same scalable property and thus we can
simply take the maximum of the known minimum sizes instead.
Differential Revision: https://reviews.llvm.org/D88563
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to wider type will not be an AddRec. In contraty, `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.
This patch introduces an alternative way to handling implications of different
type widths. If we can prove that wider type values actually fit in the narrow type,
we truncate them and prove the implication in narrow type.
Differential Revision: https://reviews.llvm.org/D89548
Reviewed By: fhahn
This reverts commit a10a64e7e3.
It broke polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll
The difference suggests that this may be a serious issue.
It appears for Swift there was confusing errors when trying to parse APINotes, when libAPINotes and libInterfaceStub are linked, they both export symbol
`__ZN4llvm4yaml7yamlizeINS_12VersionTupleEEENSt3__19enable_ifIXsr16has_ScalarTraitsIT_EE5valueEvE4typeERNS0_2IOERS5_bRNS0_12EmptyContextE`, and discovered
same symbol defined within llvm-ifs.
This consolidates the boilerplate into YAMLTraits and defers the specific validation in reading the whole input.
fixes: rdar://problem/70450563
Reviewed By: phosek, dblaikie
Differential Revision: https://reviews.llvm.org/D89764
Comparing 32-bit `ptrdiff_t` against 32-bit `unsigned` results in
`-Wsign-compare` warnings for both GCC and Clang.
The warning for the cases in question appear to identify an issue
where the `ptrdiff_t` value would be mutated via conversion to an
unsigned type.
The warning is resolved by using the usual arithmetic conversions to
safely preserve the value of the `unsigned` operand while trying to
convert to a signed type. Host platforms where `unsigned` has the same
width as `unsigned long long` will need to make a different change, but
using an explicit cast has disadvantages that can be avoided for now.
Reviewed By: dantrushin
Differential Revision: https://reviews.llvm.org/D89612
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.
Depends on D89753.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D89754
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.
Reviewed By: rampitec, dmgreen
Differential Revision: https://reviews.llvm.org/D89753
Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.
In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.
This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.
Differential Revision: https://reviews.llvm.org/D89618
This patch teaches BasicBlock::print to construct an instance of
SlotTracker with the containing function.
Without this patch, we dump:
*** IR Dump After LoopInstSimplifyPass ***
; Preheader:
br label %1
; Loop:
<badref>: ; preds = %1, %0
br label %1
Note "<badref>" above. This happens because BasicBlock::print calls:
SlotTracker SlotTable(this->getModule());
Note that this constructor does not add the contents of functions to
the slot table. That is, basic blocks are left unnumbered.
This patch fixes the problem by switching to:
SlotTracker SlotTable(this->getParent());
which does add the contents of the Module and the function,
this->getParent(), to the slot table.
Differential Revision: https://reviews.llvm.org/D89567
This is to simplify icmp instructions in the form like:
%cmp = icmp eq i32 (i8*, i8*)* bitcast (i32 (i32**, i32**)* @f32 to i32
%(i8*, i8*)), bitcast (i32 (i64**, i64**) @f64 to i32 (i8*, i8*)*)
Here @f32 and @f64 are two functions.
Differential Revision: https://reviews.llvm.org/D87850
When generating the use-list order, also consider value uses that are
operands which are wrapped in metadata; e.g. llvm.dbg.value operands.
This fixes PR36778. The test case is based on the reproducer from that
report.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D53758
Avoid having to instantiate and compile a subset of the dominator tree logic
separately for each node type. More importantly, this allows generic
algorithms to be built on top of dominator trees without writing them as
templates -- such algorithms can now use opaque CfgBlockRef and
CfgInterface instead.
A type-erased implementation of dominator trees could be written in
terms of CfgInterface as well, but doing so would change the current
trade-off: it would slightly reduce code size at the cost of a slight
runtime overhead.
This patch does not change the trade-off, as it only does type-erasure
where basic blocks can be treated in a fully opaque way, i.e. it only
moves methods that don't require iteration over CFG successors and
predecessors.
v5:
- rename generic_{begin,end,children} back without the generic_ prefix
and refer explictly to base class methods in NewGVN, which wants to
mutate the order of dominator tree node children directly
v6:
- style change: iDom -> idom; it's arguable whether this is really
invalid, since it is actually standard camelCase, but clang-tidy
complains about it so... *shrug*
- rename {to,from}Generic -> {wrap,unwrap}Ref
Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672
Differential Revision: https://reviews.llvm.org/D83089