With the removal of OrcRPCExecutorProcessControl and OrcRPCTPCServer in
6aeed7b19c the ORC RPC library no longer has any in-tree users.
Clients needing serialization for ORC should move to Simple Packed
Serialization (usually by adopting SimpleRemoteEPC for remote JITing).
We expose the fact that we rely on unsigned wrapping to iterate through
all indexes. This can be confusing. Rather, keeping it as an
implementation detail through an iterator is less confusing and is less
code.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110885
Summary:
for xcoff :
implement the getSymbolFlag and getSymbolType() for option --syms.
llvm-objdump --sym , if the symbol is label, print the containing section for the symbol too.
when using llvm-objdump --sym --symbol--description, print the symbol index and qualname for symbol.
for example:
--symbol-description
00000000000000c0 l .text (csect: (idx: 2) .foov[PR]) (idx: 3) .foov
and without --symbol-description
00000000000000c0 l .text (csect: .foov) .foov
Reviewers: James Henderson,Esme Yi
Differential Revision: https://reviews.llvm.org/D109452
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:
int r = a;
for (int i = 0; i < n; i++) {
if (src[i] > 3) {
r = b;
}
src[i] += 2;
}
We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.
The IR generated by clang typically looks like this:
%phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
...
%pred = icmp ugt i32 %val, i32 3
%phi.update = select i1 %pred, i32 %b, i32 %phi
We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.
Tests have been added here:
Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
Transforms/LoopVectorize/select-cmp-predicated.ll
Transforms/LoopVectorize/select-cmp.ll
Differential Revision: https://reviews.llvm.org/D108136
While these functions are only used in one location in upstream,
it has been reused in multiple downstreams. Restore this file to
a globally visibile location (outside of APInt.h) to eliminate
donwstream breakage and enable potential future reuse.
Additionally, this patch renames types and cleans up
clang-tidy issues.
Add MCDwarfLineStr class to the public API.
Note that MCDwarfLineTableHeader::Emit(), takes MCDwarfLineStr as
an Optional<> parameter making it impossible to use the API if the class
is not publicly defined.
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D109412
This avoids creating tons of make_unique template instantiations. And we
only create a unique_ptr of the actual pass concept type, rather than
creating a unique_ptr of the pass model subclass then casting it to the
pass concept type.
This reduces the work spent compiling PassBuilder.cpp from 83M -> 73M
instructions according to perf stat.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110784
We create many instantiations of PassManager::addPass() in
PassBuilder.cpp. vector::emplace_back() and make_unique() are both
templated and would have many instantiations based on the number of
times we instantiate addPass(). Now we directly construct the
unique_ptr with the type as the actual unique_ptr type in the vector we
are adding it to, so we only have one unique_ptr constructor
instantiation across all addPass() instantiations and only the
non-templated push_back().
This makes PassBuilder.cpp slightly faster to build.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110775
The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted.
I believe the cost model was also incorrect for the old expansion.
The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result
to calculate theirs signs. Then 2 icmps to compare the signs. Followed
by an And. The previous cost model was using 3 icmps and 2 selects.
Digging back through git blame, those 2 selects in the cost model used to
be 2 icmps, but were changed in https://reviews.llvm.org/D90681
Differential Revision: https://reviews.llvm.org/D110739
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.
This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.
This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM. This caused massive compile time
in our downstream LPM invocation which contained loop predication.
See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.
Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
Rather than separately handling subtraction of offset and variable
indices, make this one operation. Also rewrite the implementation
to use range-based for loops.
getScalarizationOverhead() results in a somewhat better cost estimation than counting the insertion/extraction costs directly. Notably, this is still overestimating the costs.
Original Patch by: @lebedev.ri (Roman Lebedev)
Differential Revision: https://reviews.llvm.org/D110713
The instruction has similar semantics to vbpermq but for doublewords.
It was added in Power9 and the ABI documents the builtin.
Differential revision: https://reviews.llvm.org/D107899
This patch introduces the vector-predicated version of the
experimental_vector_splice intrinsic [1] at the IR level. It considers
the active vector length for both vectors and and uses a vector mask to
disable certain lanes in the result.
[1] https://reviews.llvm.org/D94708
Change originally authored by Vineet Kumar <vineet.kumar@bsc.es>
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D103898
Similar to what SDAG does when it sees a smulo/umulo against 2
(see: `DAGCombiner::visitMULO`)
This pattern is fairly common in Swift code AFAICT.
Here's an example extracted from a Swift testcase:
https://godbolt.org/z/6cT8Mesx7
Differential Revision: https://reviews.llvm.org/D110662
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Previous revisions didn't properly declare the new dependencies.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
This patch is in a series of patches to provide builtins for
compatability with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.
Reviewed By: #powerpc, lei
Differential Revision: https://reviews.llvm.org/D109437
Try to improve vectorization of the PHI nodes by trying to vectorize
similar instructions at the size of the widest possible vectors, then
aggregating with compatible type PHIs and trying to vectoriza again and
only if this failed, try smaller sizes of the vector factors for
compatible PHI nodes. This restores performance of several benchmarks
after tuning of the fp/int conversion instructions costs.
Differential Revision: https://reviews.llvm.org/D108740
When TargetLibraryInfoImpl::isValidProtoForLibFunc is checking
function signatures to detect lib calls it may check that a parameter
or return value matches with the "size_t" type. For this to work it
has to derive the IR type matching with "size_t". Depending on if
a DataLayout is provided or not, this has been done in two different
way. Either a more strict check being based on IntPtrType (which is
given by the DataLayout) or a more relaxed check assuming that any
integer type matches with "size_t".
Given that the stricter approach exist it seems like we do not want
to trigger rewrites etc if we aren't sure that a function calls
actually match with the library function. Therefore it was questioned
why we actually have the more relaxed approach when not being able
to derive an IR type for "size_t". This patch will take a more
defensive approach, requiring that a DataLayout is passed to
isValidProtoForLibFunc.
Differential Revision: https://reviews.llvm.org/D110584
This patch is for fixing potential insertElement-related bugs like D93818.
```
V = UndefValue::get(VecTy);
for(...)
V = Builder.CreateInsertElementy(V, Elt, Idx);
=>
V = PoisonValue::get(VecTy);
for(...)
V = Builder.CreateInsertElementy(V, Elt, Idx);
```
Like above, this patch changes the placeholder V to poison.
The patch will be separated into several commits.
Reviewed By: aqjune
Differential Revision: https://reviews.llvm.org/D110311
With improved analysis in determining CFG equivalence that does
not require strict dominance and post-dominance conditions, we
now relax isSafeToMoveBefore() such that an instruction I can
be moved before InsertPoint even if they do not strictly dominate
each other, as long as they follow the same control flow path.
For example, we can move Instruction 0 before Instruction 1,
and vice versa.
```
if (cond1)
// Instruction 0: %add = add i32 1, 2
if (cond1)
// Instruction 1: %add2 = add i32 2, 1
```
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D110456
The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.
Differential Revision: https://reviews.llvm.org/D107969
Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.
This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:
1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time.
Implementation-wise this adds the following summary function attributes:
1. noUnwind: function is noUnwind
2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions)
3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)
Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D36850
This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.
We should support FP too, but those use extract_element and not a
custom ISD node so it is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D109482
This patch adds a new RTL function for worksharing. Currently we use
`__kmpc_for_static_init` for both the `distribute` and `parallel`
portion of the loop clause. This patch replaces the `distribute` portion
with a new runtime call `__kmpc_distribute_static_init`. Currently this
will be used exactly the same way, but will make it easier in the future
to fine-tune the distribute and parallel portion of the loop.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110429