Currently there is only a member version of isEquality(),
which requires an actual [IF]CmpInst to be avaliable,
which isn't always possible, and is inconsistent with
the general pattern here.
I wanted to use it in a new patch, but it wasn't there..
This patch changes the intrinsics cost model to assume that by default
target intrinsics are cheap. This didn't seem to be the case for all
intrinsics, and is potentially an MVE problem due to our scalarization
overheads. Cheap seems to be a good default in general though.
Differential Revision: https://reviews.llvm.org/D90597
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This adds support for scalable vector types in the C API and in
llvm-c-test, and also adds a test to ensure that llvm-c-test can properly
roundtrip operations involving scalable vectors.
While creating this diff, I discovered that the C API cannot properly roundtrip
_constant expressions_ involving shufflevector / scalable vectors, but that
seems to be a separate enough issue that I plan to address it in a future diff
(unless reviewers feel it should be addressed here).
Differential Revision: https://reviews.llvm.org/D89816
The support of a few debug info attributes specifically for Fortran
arrays have been added to LLVM recently, but there's no way to take
advantage of them through DIBuilder. This patch extends
DIBuilder::createArrayType to enable the settings of those attributes.
Patch by Chih-Ping Chen!
Differential Review: https://reviews.llvm.org/D90323
This is needed to support fortran assumed rank arrays which
have runtime rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF TAG
DW_TAG_generic_subrange is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89218
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.
The patches are:
Revert "CfgInterface: rename interface() to getInterface()"
This reverts commit a74fc48158.
Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"
This reverts commit f2a06875b6.
Revert "Try to make GCC5 happy about the CfgTraits thing"
This reverts commit 03a5f7ce12.
Revert "Introduce CfgTraits abstraction"
This reverts commit c0cdd22c72.
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
The support of a few debug info attributes specifically for Fortran
arrays have been added to LLVM recently, but there's no way to take
advantage of them through DIBuilder. This patch extends
DIBuilder::createArrayType to enable the settings of those attributes.
Patch by Chih-Ping Chen!
Differential Revision: https://reviews.llvm.org/D89817
Change `ConstantDataSequential::Next` to a
`unique_ptr<ConstantDataSequential>` and update `CDSConstants` to a
`StringMap<unique_ptr<ConstantDataSequential>>`, making the ownership
more obvious.
Differential Revision: https://reviews.llvm.org/D90083
This is a long-delayed follow-up to
5e5b85098d.
`TempMDNode` includes a bunch of machinery for RAUW, and should only be
used when necessary. RAUW wasn't being used in any of these cases... it
was just a placeholder for a self-reference.
Where the real node was using `MDNode::getDistinct`, just replace the
temporary argument with `nullptr`.
Where the real node was using `MDNode::get`, the `replaceOperandWith`
call was "promoting" the node to a distinct one implicitly due to
self-reference detection in `MDNode::handleChangedOperand`. The
`TempMDNode` was serving a purpose by delaying uniquing, but it's way
simpler to just call `MDNode::getDistinct` in the first place.
Note that using a self-reference at all in these places is a hold-over
from before `distinct` metadata existed. It was an old trick to create
distinct nodes. It would be intrusive to change, including bitcode
upgrades, etc., and it's harmless so I'm not sure there's much value in
removing it from existing schemas. After this commit it still has a tiny
memory cost (in the extra metadata operand) but no more overhead in
construction.
Differential Revision: https://reviews.llvm.org/D90079
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
Now there are two main classes in Value hierarchy, which support metadata,
these are Instruction and GlobalObject. They implement different APIs for
metadata manipulation, which however overlap. This change moves metadata
manipulation code into Value, so descendant classes can use this code for
their operations on metadata.
No functional changes intended.
Differential Revision: https://reviews.llvm.org/D67626
Per asbirlea's comment, assert that only instructions, constants
and arguments are passed to this API. Simplify returning true
would not be correct for special Value subclasses like MemoryAccess.
Non-instruction defs like arguments, constants or global values
always dominate all instructions/uses inside the function. This
case currently needs to be treated separately by the caller, see
https://reviews.llvm.org/D89623#inline-832818 for an example.
This patch makes the dominator tree APIs accept a Value instead of
an Instruction and always returns true for the non-Instruction case.
A complication here is that BasicBlocks are also Values. For that
reason we can't support the dominates(Value *, BasicBlock *)
variant, as it would conflict with dominates(BasicBlock *, BasicBlock *),
which has different semantics. For the other two APIs we assert
that the passed value is not a BasicBlock.
Differential Revision: https://reviews.llvm.org/D89632
For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.
This patch teaches BasicBlock::print to construct an instance of
SlotTracker with the containing function.
Without this patch, we dump:
*** IR Dump After LoopInstSimplifyPass ***
; Preheader:
br label %1
; Loop:
<badref>: ; preds = %1, %0
br label %1
Note "<badref>" above. This happens because BasicBlock::print calls:
SlotTracker SlotTable(this->getModule());
Note that this constructor does not add the contents of functions to
the slot table. That is, basic blocks are left unnumbered.
This patch fixes the problem by switching to:
SlotTracker SlotTable(this->getParent());
which does add the contents of the Module and the function,
this->getParent(), to the slot table.
Differential Revision: https://reviews.llvm.org/D89567
This is to simplify icmp instructions in the form like:
%cmp = icmp eq i32 (i8*, i8*)* bitcast (i32 (i32**, i32**)* @f32 to i32
%(i8*, i8*)), bitcast (i32 (i64**, i64**) @f64 to i32 (i8*, i8*)*)
Here @f32 and @f64 are two functions.
Differential Revision: https://reviews.llvm.org/D87850
When generating the use-list order, also consider value uses that are
operands which are wrapped in metadata; e.g. llvm.dbg.value operands.
This fixes PR36778. The test case is based on the reproducer from that
report.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D53758
The CfgTraits abstraction simplfies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.
Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.
CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.
Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in same cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.
This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.
v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
the instructions in a bundle
- use MachineBasicBlock::printName
v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef
v7:
- std::forward fix in wrapping_iterator
- fix typos
v8:
- cleanup operators on CfgOpaqueType
- address other review comments
Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d
Differential Revision: https://reviews.llvm.org/D83088
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85393
LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed
for Flang to support assumed rank array.
Summary:
Currently LLVM rejects DWARF operator DW_OP_over. Below error is
produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6)
warning: ignoring invalid debug info in over.ll
[..]
There were some parts missing in support of this operator, which are
now completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89208
Similar to MCSymbol::print in 3d6c8ebb58
(llvm-svn: 81682, PR4966), these symbols may need to be quoted to be handled by
the linker correctly.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D87099
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.
This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
This patch adds support for assemble disassemble intrinsics
for MMA.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D88739
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.
I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.
Differential Revision: https://reviews.llvm.org/D88687
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.
The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
This patch adds support for DWARF attribute DW_AT_rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF attribute
DW_AT_rank is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89141
It is possible to get a fltSemantics of a particular Type,
but there is no way to produce a Type based on a
fltSemantics.
This adds the function Type::getFloatingPointTy, which
will return the appropriate floating point Type for a given
fltSemantics.
ConstantFP is modified to use this function instead of
implementing it itself. Also some minor refactors to use
Type::getFltSemantics instead of a hand-rolled version.
Differential Revision: https://reviews.llvm.org/D87512
Drop `noundef` for return values that are replaced by void and make it
illegal to put `noundef` on a void value.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87306
Alignment attributes need to be dropped for non-pointer values.
This also introduces a check into the verifier to ensure you don't use
`align` on anything but a pointer. Test needed to be adjusted
accordingly.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87304
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.
See D88664 for a clang fix needed to allow this change.
Differential Revision: https://reviews.llvm.org/D88238
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.
The code introduces an abstract template base class that generalizes the
comparison of IRs that takes an IR representation as template parameter.
Derived classes provide overrides that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.
The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen), MaskRay (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D86360
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
This came from @lebedev.ri's suggestion to use m_SpecificInt_ICMP for D88429 - since I was going to change the m_APInt to m_Constant for that patch I thought I would do it for the only other user of the APInt first.
I've added a ConstantExpr::getUMin helper - its trivial to add UMAX/SMIN/SMAX but thought I'd wait until we have use cases.
Differential Revision: https://reviews.llvm.org/D88475
This reverts commit 55c4ff91bd.
Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
It is not a good idea to expose raw constants in the LLVM C API. Replace this with an explicit getter.
Differential Revision: https://reviews.llvm.org/D88367
This commit fixes a regression (from LLVM 10 to LLVM 11 RC3) in the LLVM
C API.
Previously, commit 1ee6ec2bf removed the mask operand from the
ShuffleVector instruction, storing the mask data separately in the
instruction instead; this reduced the number of operands of
ShuffleVector from 3 to 2. AFAICT, this change unintentionally caused
a regression in the LLVM C API. Specifically, it is no longer possible
to get the mask of a ShuffleVector instruction through the C API. This
patch introduces new functions which together allow a C API user to get
the mask of a ShuffleVector instruction, restoring the functionality
which was previously available through LLVMGetOperand().
This patch also adds tests for this change to the llvm-c-test
executable, which involved adding support for InsertElement,
ExtractElement, and ShuffleVector itself (as well as constant vectors)
to echo.cpp. Previously, vector operations weren't tested at all in
echo.ll.
I also fixed some typos in comments and help-text nearby these changes,
which I happened to spot while developing this patch. Since the typo
fixes are technically unrelated other than being in the same files, I'm
happy to take them out if you'd rather they not be included in the patch.
Differential Revision: https://reviews.llvm.org/D88190
attachments. They would crash the backend, which expects all
DISubprograms that are not part of the type system to have a unit field.
Clang right before https://reviews.llvm.org/D79967 would generate this
kind of broken IR.
rdar://problem/69534688
Thanks to Fangrui for fixing an assembler test I had missed!
https://reviews.llvm.org/D88270
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
attachments. They would crash the backend, which expects all
DISubprograms that are not part of the type system to have a unit field.
Clang right before https://reviews.llvm.org/D79967 would generate this
kind of broken IR.
rdar://problem/69534688
Introduce a helper which can be used to update the debug location of an
Instruction after the instruction is hoisted. This can be used to safely
drop a source location as recommended by the docs.
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
The langref already states it does, but this wasn't implemented. Also
covers inalloca and preallocated. Also helps fix a dependence on
pointer element types.
Similar to the ConstantRange::getActiveBits(), and to similarly-named
methods in APInt, returns the bitwidth needed to represent
the given signed constant range
Much like APInt::getActiveBits(), computes how many bits are needed
to be able to represent every value in this constant range,
treating the values as unsigned.
Use the fact that `~X` is equivalent to `-1 - X`, which gives us
fully-precise answer, and we only need to special-handle the wrapped case.
This fires ~16k times for vanilla llvm test-suite + RawSpeed.
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.
The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.
The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen)
Differential Revision: https://reviews.llvm.org/D86360
This patch prevents the `llvm.masked.gather` and `llvm.masked.scatter` intrinsics to be scalarized when invoked on scalable vectors.
The change in `Function.cpp` is needed to prevent the warning that is raised when `getNumElements` is used in place of `getElementCount` on `VectorType` instances. The tests guards for regressions on this change.
The tests makes sure that calls to `llvm.masked.[gather|scatter]` are still scalarized when:
# the intrinsics are operating on fixed size vectors, and
# the compiler is not targeting fixed length SVE code generation.
Reviewed By: efriedma, sdesmalen
Differential Revision: https://reviews.llvm.org/D86249
This is needed to support assumed size array of fortran which can have missing upperBound/count
, contrary to current DISubrange support.
Example:
subroutine sub (array1, array2)
integer :: array1 (*)
integer :: array2 (4:9, 10:*)
array1(7:8) = 9
array2(5, 10) = 10
end subroutine
Now the validation check is relaxed for fortran.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D87500
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm smax/smin/umax/umin intrinsics.
This patch updates the SSE/AVX integer MINMAX intrinsics to emit the generic equivalents instead of the icmp+select code pattern.
Differential Revision: https://reviews.llvm.org/D87603
I've amended the isLoadInvariantInLoop function to bail out for
scalable vectors for now since the invariant.start intrinsic is only
ever generated by the clang frontend for thread locals or struct
and class constructors, neither of which support sizeless types.
In addition, the intrinsic itself does not currently support the
concept of a scaled size, which makes it impossible to compare
the sizes of different scalable objects, e.g. <vscale x 32 x i8>
and <vscale x 16 x i8>.
Added new tests here:
Transforms/LICM/AArch64/sve-load-hoist.ll
Transforms/LICM/hoisting.ll
Differential Revision: https://reviews.llvm.org/D87227
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
In particular, we shouldn't make assumptions about globals which are
unnamed_addr: we can fold them together with other globals.
Also while I'm here, use isInterposable() instead of trying to
explicitly name all the different kinds of weak linkage.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47090
Differential Revision: https://reviews.llvm.org/D87123
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.
The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.
The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
See https://hotcrp.llvm.org/usllvm2020/paper/29 for more information.
Reviewed By: yrouban (Yevgeny Rouban)
Differential Revision: https://reviews.llvm.org/D86360
This also changes -lint from an analysis to a pass. It's similar to
-verify, and that is a normal pass, and lives in llvm/IR.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D87057
The 1st try was reverted because I missed an assert that
needed softening.
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.
This requires relaxing asserts in GVN, but that pass
seems to be working otherwise.
NewGVN requires more work because it uses different
code paths for numbering binops and calls.
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.
This requires relaxing an assert in GVN, but that pass
seems to be working otherwise.
NewGVN requires more work because it uses different
code paths for numbering binops and calls.
The original take was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.
However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.
However InstCombine already does many different optimizations,
so it should be a safe place to do it here.
Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.
Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs | 0 | 22228 | 22228 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942329 | 7942456 | 127 | 0.00% | 0.00% |
| assembler.ObjectBytes | 254295632 | 254313792 | 18160 | 0.01% | 0.01% |
| early-cse.NumCSE | 2183283 | 2183272 | -11 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 541842 | -8263 | -1.50% | 1.50% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640311 | 3666911 | 26600 | 0.73% | 0.73% |
| instcombine.NumDeadInst | 1778204 | 1783318 | 5114 | 0.29% | 0.29% |
| instcount.NumCallInst | 1758395 | 1758804 | 409 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330549 | -8 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1077138 | 1077221 | 83 | 0.01% | 0.01% |
| instcount.TotalFuncs | 101442 | 101441 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8831946 | 8832611 | 665 | 0.01% | 0.01% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019813 | 999740 | -20073 | -1.97% | 1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).
All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
Rather than calling hasFnAttribute and then calling getFnAttribute
if the attribute exists, its better to just call getFnAttribute and
then check if we got a valid attribute back.
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e
And gives SROA the ability to remove assumes if it allows promoting an alloca to register
Without removing assumes when it can't promote to register.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86570
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from
implementation lacking bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..
While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
So i would think this is pretty uncontroversial.
On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE | 0 | 23779 | 23779 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942328 | 7942392 | 64 | 0.00% | 0.00% |
| assembler.ObjectBytes | 273069192 | 273084704 | 15512 | 0.01% | 0.01% |
| correlated-value-propagation.NumPhis | 18412 | 18539 | 127 | 0.69% | 0.69% |
| early-cse.NumCSE | 2183283 | 2183227 | -56 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 542090 | -8015 | -1.46% | 1.46% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640264 | 3664769 | 24505 | 0.67% | 0.67% |
| instcombine.NumDeadInst | 1778193 | 1783183 | 4990 | 0.28% | 0.28% |
| instcount.NumCallInst | 1758401 | 1758799 | 398 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330533 | -24 | -0.01% | 0.01% |
| instcount.TotalInsts | 8831952 | 8832286 | 334 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019808 | 999607 | -20201 | -1.98% | 1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.
That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..
I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.
Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.
As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
This reverts commit 8d5f64c4ed.
Thanks to Eli Friedma for pointing out that this check is not appropiate here,
this check will be moved to the Lint pass.
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.
There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
This adapts the verifier checks for intrinsic get.active.lane.mask to the new
semantics of it as described in D86147. I.e., the second argument %n, which
corresponds to the loop tripcount, must be greater than 0 if it is a constant,
so check that.
Differential Revision: https://reviews.llvm.org/D86301
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with
`llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an
ordering function for the `ElementCount` class.
* Bits and pieces around printing the `ElementCount` to string streams.
To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.
Note that in some places the idiom
```
unsigned VF = ...
auto VTy = FixedVectorType::get(ScalarTy, VF)
```
has been replaced with
```
ElementCount VF = ...
assert(!VF.Scalable && ...);
auto VTy = VectorType::get(ScalarTy, VF)
```
The assertion guarantees that the new code is (at least in debug mode)
functionally equivalent to the old version. Notice that this change had been
possible because none of the methods that are specific to `FixedVectorType`
were used after the instantiation of `VTy`.
Reviewed By: rengolin, ctetreau
Differential Revision: https://reviews.llvm.org/D85794
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.
To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.
Differential Revision: https://reviews.llvm.org/D85794
This patch adds support for representing Fortran `character(n)`.
Primarily patch is based out of D54114 with appropriate modifications.
Test case IR is generated using our downstream classic-flang. We're in process
of upstreaming flang PR's but classic-flang has dependencies on llvm, so
this has to get in first.
Patch includes functional test case for both IR and corresponding
dwarf, furthermore it has been manually tested as well using GDB.
Source snippet:
```
program assumedLength
call sub('Hello')
call sub('Goodbye')
contains
subroutine sub(string)
implicit none
character(len=*), intent(in) :: string
print *, string
end subroutine sub
end program assumedLength
```
GDB:
```
(gdb) ptype string
type = character (5)
(gdb) p string
$1 = 'Hello'
```
Reviewed By: aprantl, schweitz
Differential Revision: https://reviews.llvm.org/D86305
Extend the `applyUpdates` in DominatorTree to allow a post CFG view,
different from the current CFG.
This patch implements the functionality of updating an already up to
date DT, to the desired PostCFGView.
Combining a set of updates towards an up to date DT and a PostCFGView is
not yet supported.
Differential Revision: https://reviews.llvm.org/D85472
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.
Also the CL removes duplicated Values in gc-live bundle.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
Currently ConstantExpr::getWithOperands does not handle FNeg and
subsequently treats FNeg as binary operator, leading to an assertion
failure or segmentation fault if built without assertions.
Originally I reproduced this with llvm-dis on a bitcode file, which I
unfortunately cannot share and also cannot really reduce.
But PR45426 describes the same issue and has a reproducer with Clang, so
I'll go with that.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86274
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
We don't need a std::string for a literal string, we can use a
StringRef.
The addition of StringRefs produces a Twine that we can just call
str() without converting to a SmallString ourselves. Twine will
do that internally.
Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D85650
This code becomes dead for valid IR after 48f4312 and a96fc46. The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order. By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports. And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.
The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibile requirement.
This is an extension to a96fc46. "gc-live" hadn't been implemented at the point that patch was initially posted.
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D85269
This patch restricts the behaviour of referencing via .Lfoo$local
local aliases, introduced in https://reviews.llvm.org/D73230, to
STV_DEFAULT globals only.
Hidden symbols via --fvisiblity=hidden (https://gcc.gnu.org/wiki/Visibility)
is an important scenario.
Benefits:
- Improves the size of object files by using fewer STT_SECTION symbols.
- The code reads a bit better (it was not obvious to me without going
back to the code reviews why the canBenefitFromLocalAlias function
currently doesn't consider visibility).
- There is also a side benefit in restoring the effectiveness of the
--wrap linker option and making the behavior of --wrap consistent
between LTO and normal builds for references within a translation-unit.
Note: this --wrap behavior (which is specific to LLD) should not be
considered reliable. See comments on https://reviews.llvm.org/D73230
for more.
Differential Revision: https://reviews.llvm.org/D85782
Previously ConstantFoldExtractElementInstruction() would only work with
insertelement instructions, not contants. This properly handles
insertelement constants as well.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85865
This recommits the following patches now that D85684 has landed
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.
Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we cann't replace the
intrinsic with a full compare + "and" operation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D85385
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D80735
The history of dropTriviallyDeadConstantArrays is like this. Because the appending linkage uses too much memory (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251381.html), dropTriviallyDeadConstantArrays was introduced (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to release unused constant arrays. Recently, dropTriviallyDeadConstantArrays was improved (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to reduce its quadratic cost.
Our recent LTO profiling shows that when a target is large, 15-20% of time cost is from the SetVector::insert called by dropTriviallyDeadConstantArrays.
A large application has hundreds or thousands of modules; each module calls dropTriviallyDeadConstantArrays once for cleaning up tens of thousands of ConstantArrays a module has. In those ConstantArrays, usually around 5 can be deleted; a very very few deleted ConstantArrays reference other ConstantArrays: less than 10 out of millions.
Given this, the cost of SetVector::insert is mainly from the construction of WorkList from ArrayConstants. This motivated the fix that iterates ArrayConstants directly, and uses WorkList only when necessary.
Our evaluation shows that
1) The cumulative time percentage of dropTriviallyDeadConstantArrays is reduced from 15-17% to 4-6%.
2) For targets with LTO time > 20min, the time reduction is about 20%.
3) No observable performance impact for build without using LTO.
{F12506218}
{F12506221}
Reviewed By: mehdi_amini, tejohnson, jdoerfert
Differential Revision: https://reviews.llvm.org/D85379
As noted on PR46885, the number of mask elements should always be a power of 2, so to fix the static analyzer warning we are better off replacing the condition to <= 4, and I've added a pow2 assertion as well.
`DenseMapAPIntKeyInfo` is now located in `lib/IR/LLVMContextImpl.h`.
Moved it into `include/ADT/DenseMapInfo.h` to use it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85131
As discussed on D81500, this adds a more general ElementCount variant of the build helper and converts the (non-scalable) unsigned NumElts variant to use it internally.
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.
I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.
Differential Revision: https://reviews.llvm.org/D84985
Pass the abs poison flag to the underlying ConstantRange
implementation, allowing CVP to simplify based on it.
Importantly, this recognizes that abs with poison flag is actually
non-negative...
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)
This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.
The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
This adds a common API for compute constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
ConstantRange, not any consumers.
Differential Revision: https://reviews.llvm.org/D84587
All analysis are preserved if there's no local change, and thanks to
3667d87a33 this property is enforced for all
passes.
Skipping the dependency computation improves the performance when there's a lot
of small functions, where only a few change happen.
Thanks to Nikita Popov who provided this numbers (extract below)
https://llvm-compile-time-tracker.com/compare.php?from=183342c0a9850e60dd7a004b651c83dfb3a7d25e&to=f2f91e6a2743070471cc9471e4e8c646e50c653c&stat=instructions
O3: (number of instructions)
Benchmark Old New
kimwitu++ 60783M 59968M (-1.34%)
sqlite3 73200M 73083M (-0.16%)
consumer-typeset 52776M 52712M (-0.12%)
Bullet 133709M 132940M (-0.58%)
tramp3d-v4 123864M 123186M (-0.55%)
mafft 55534M 55477M (-0.10%)
ClamAV 76292M 76164M (-0.17%)
lencod 103190M 103061M (-0.13%)
SPASS 64068M 63713M (-0.55%)
7zip 197332M 196308M (-0.52%)
geomean 85750M 85389M (-0.42%)
Differential Revision: https://reviews.llvm.org/D80707
This is the part of the patch that's moving the Updates to a CFGDiff
object. Splitting off from the clean-up work merging the two branches when BUI is null.
Differential Revision: https://reviews.llvm.org/D77341
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83976
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows to move about 3000 lines out from InstCombine to the targets.
Differential Revision: https://reviews.llvm.org/D81728
The new implementation makes it clear that there are exactly two
conditional stores (after the initial no-op optimization). By contrast
the old implementation had seven conditionals, some hidden inside other
functions.
This commit can change the order of operands in operand lists, hence the
tweak to one test case.
Differential Revision: https://reviews.llvm.org/D80116