Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.
One example is using min/max instructions for certain patterns.
This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.
This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.
This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).
Metric: SLP.NumVectorInstructions
Program base slp diff
test-suite...ications/JM/ldecod/ldecod.test 502.00 697.00 38.8%
test-suite...ications/JM/lencod/lencod.test 1023.00 1414.00 38.2%
test-suite...-typeset/consumer-typeset.test 56.00 65.00 16.1%
test-suite...6/464.h264ref/464.h264ref.test 804.00 822.00 2.2%
test-suite...006/453.povray/453.povray.test 3335.00 3357.00 0.7%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2121.00 0.5%
test-suite...:: External/Povray/povray.test 2378.00 2382.00 0.2%
Reviewed By: RKSimon, samparker
Differential Revision: https://reviews.llvm.org/D89969
When we promote pointer arguments we did compute a wrong offset and use
a wrong type for the array case.
Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
The following constraints hold for swifterror values:
A swifterror value (either the parameter or the alloca) can only
be loaded and stored from, or used as a swifterror argument.
This patch updates instcombine to not try to convert a bitcast of a
function into a bitcast of a swifterror argument.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D90258
This reverts commit e038b60d91.
This reverts commit a0d84d8031.
This revert was a mistake. The reason of the failures was
"Use uint64_t for branch weights instead of uint32_t"
Differential Revision: https://reviews.llvm.org/D87832
Instead of getting the defining access we should be able to use
getClobberingMemoryAccess to skip non-aliasing MemoryDefs. No additional
checks should be needed, because we only remove the starting def if it
matches the defining access of the load. All we need to worry about is
that there are no (may)alias stores between the starting def and the
load and getClobberingMemoryAccess should guarantee that.
Partly fixes PR47887.
This improves the number of redundant stores removed in some cases
(numbers below for MultiSource, SPEC2000, SPEC2006 on X86 with -flto
-O3).
Same hash: 226 (filtered out)
Remaining: 11
Metric: dse.NumRedundantStores
Program base patch1 diff
test-suite...:: External/Povray/povray.test 1.00 5.00 400.0%
test-suite...chmarks/MallocBench/gs/gs.test 1.00 3.00 200.0%
test-suite...0/253.perlbmk/253.perlbmk.test 21.00 37.00 76.2%
test-suite...0.perlbench/400.perlbench.test 24.00 37.00 54.2%
test-suite.../Applications/SPASS/SPASS.test 3.00 4.00 33.3%
test-suite...006/453.povray/453.povray.test 15.00 18.00 20.0%
test-suite...T2006/445.gobmk/445.gobmk.test 27.00 29.00 7.4%
test-suite.../CINT2006/403.gcc/403.gcc.test 136.00 137.00 0.7%
test-suite.../CINT2000/176.gcc/176.gcc.test 6.00 6.00 0.0%
test-suite.../Benchmarks/Bullet/bullet.test NaN 3.00 nan%
test-suite.../Benchmarks/Ptrdist/bc/bc.test NaN 1.00 nan%
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89647
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.
Reviewed By: compnerd, rnk
Differential Revision: https://reviews.llvm.org/D90117
This patch removes extraneous calls to setEdgeProbability introduced
in c91487769d.
The follow-up patch, a7b662d0f4, has
since fixed BranchProbabilityInfo::eraseBlock, so we don't need to
worry about getting stale values from getEdgeProbability.
Also, since getEdgeProbability(BB, BB->getSingleSuccessor()) returns
edge probability 1/1 by default for BB with exactly one successor
edge, we don't need to explicitly call setEdgeProbability.
This patch introduces almost no functional change, but we do end up
reducing debug messages from setEdgeProbability.
Differential Revision: https://reviews.llvm.org/D90284
Before we used to only mark unreachable static functions as dead if all
uses were known dead. Now we optimistically assume uses to be dead until
proven otherwise.
If we are looking at a call site argument it might be a load or call
which is in a different context than the call site argument. We cannot
simply use the call site argument range for the call or load.
Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
In the AANoAlias logic we determine if a pointer may have been captured
before a call. We need to look at other uses in the call not uses of the
call.
The new code is not perfect as it does not allow trivial cases where the
call has multiple arguments but it is at least not unsound and a TODO
was added.
This patch teaches the jump threading pass to set edge probabilities
whenever the pass creates new basic blocks.
Without this patch, the compiler sometimes produces non-deterministic
results. The non-determinism comes from the jump threading pass using
stale edge probabilities in BranchProbabilityInfo. Specifically, when
the jump threading pass creates a new basic block, we don't initialize
its outgoing edge probability.
Edge probabilities are maintained in:
DenseMap<Edge, BranchProbability> Probs;
in class BranchProbabilityInfo, where Edge is an ordered pair of
BasicBlock * and a successor index declared as:
using Edge = std::pair<const BasicBlock *, unsigned>;
Probs maps edges to their corresponding probabilities.
Now, we rarely remove entries from this map, so if we happen to
allocate a new basic block at the same address as a previously deleted
basic block with an edge probability assigned, the newly created basic
block appears to have an edge probability, albeit a stale one.
This patch fixes the problem by explicitly setting edge probabilities
whenever the jump threading pass creates new basic blocks.
Differential Revision: https://reviews.llvm.org/D90106
Summary:
This patch adds support for passing in the original delcaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information in passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.
Reviewers: jdoervert
Differential Revision: https://reviews.llvm.org/D89802
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.
The patches are:
Revert "CfgInterface: rename interface() to getInterface()"
This reverts commit a74fc48158.
Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"
This reverts commit f2a06875b6.
Revert "Try to make GCC5 happy about the CfgTraits thing"
This reverts commit 03a5f7ce12.
Revert "Introduce CfgTraits abstraction"
This reverts commit c0cdd22c72.
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.
The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).
It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).
So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.
Differential Revision: https://reviews.llvm.org/D88928
This reverts commit c6ca26c0bf.
This breaks stage2 builds due to hitting this assert:
```
Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
-Oz normally does not allow loop header duplication so this loop wouldn't be
vectorized. However the vectorization pragma should override this and allow
for loop rotation.
rdar://problem/49281061
Original patch by Adam Nemet.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D59832
GVN Load PRE can split the backedge causing breaking the loop structure where the latch
contains the conditional branch with for example induction variable.
Different optimizations expect this form of the loop, so it is better to preserve it for some time.
This CL adds an option to control an ability to split backedge.
Default value is true so technically it is NFC and current behavior is not changed.
Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn
Reviewed By: mkazasntsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89854
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.
Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
`-separate-const-offset-from-gep` has not yet be ported, so some tests are not updated.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D90149
Prepend the module name hash with a fixed string ".__uniq." which helps tools
that consume sampled profiles and attribute it to functions to understand
that this symbol belongs to a unique internal linkage type symbol.
Symbols with suffixes can result from various optimizations in the compiler.
Function Multiversioning, function splitting, parameter constant propogation,
unique internal linkage names.
External tools like sampled profile aggregators combine profiles from multiple
runs of a binary. They use various heuristics with symbols that have suffixes
to try and attribute the profile to the right function instance. For instance
multi-versioned symbols like foo.avx, foo.sse4.2, etc even though different
should be attributed to the same source function if a single function is
versioned, using attribute target_clones (supported in GCC but yet to land in
LLVM). Similarly, functions that are split (split part having a .cold suffix)
could have profiles for both the original and split symbols but would be
aggregated and attributed to the original function that was split.
Unique internal linkage functions however have different source instances and
the aggregator must not put them together but attribute it to the appropriate
function instance. To be sure that we are dealing with a symbol of a unique
internal linkage function, we would like to prepend the hash with a known
string ".__uniq." which these tools can check to understand the suffix type.
Differential Revision: https://reviews.llvm.org/D89617
This fixes the bug 47945. It is legal to have a PHI with values
from from the same block, but values must stay the same. In this
case it is illegal to merge different values.
Differential Revision: https://reviews.llvm.org/D89978