MSAN bot complains that there is use-of-uninitialized-value
of this FreeStores later in IsWorthwhile().
Perhaps FreeStores needs to be stored in a vector?
llvm-svn: 372262
Summary:
`DAGCombiner::visitADDLikeCommutative()` already has a sibling fold:
`(add X, Carry) -> (addcarry X, 0, Carry)`
This fold, as suggested by @efriedma, helps recover from //some//
of the regressions of D62266
Reviewers: efriedma, deadalnix
Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62392
llvm-svn: 372259
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.
This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.
The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..
Proofs: https://rise4fun.com/Alive/21b
Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)
With this, i see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67412
llvm-svn: 372257
Summary:
AArch64 GlobalISel doesn't support MachO's large code model, so this patch
adds a check for that combination before implicitly enabling it.
Reviewers: paquette
Subscribers: kristof.beyls, ributzka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67724
llvm-svn: 372256
Summary:
As it can be see in the changed test, while `div` is really costly,
we were speculating it. This does not seem correct.
Also, the old code would run for every single insturuction in BB,
instead of eagerly bailing out as soon as there are too many instructions.
This function still has a problem that `PHINodeFoldingThreshold` is
per-basic-block, while it should be for all the basic blocks.
Reviewers: efriedma, craig.topper, dmgreen, jmolloy
Reviewed By: jmolloy
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67315
llvm-svn: 372255
Summary:
As discussed in https://reviews.llvm.org/D62341#1515637,
for MIPS `add %x, -1` isn't optimal. Unlike X86 there
are no fastpaths to matearialize such `-1`/`1` vector constants,
and `sub %x, 1` results in better codegen,
so undo canonicalization
Reviewers: atanasyan, Petar.Avramovic, RKSimon
Reviewed By: atanasyan
Subscribers: sdardis, arichardson, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66805
llvm-svn: 372254
If the variable, used in the loop boundaries, is not captured in the
construct, this variable must be considered as undefined if it was
privatized.
llvm-svn: 372252
In case of using 32-bit GOT access to the table requires two instructions
with attached %got_hi and %got_lo relocations. This patch implements
correct expansion of 'lw/sw' instructions in that case.
Differential Revision: https://reviews.llvm.org/D67705
llvm-svn: 372251
Checking that the created output matches something is nice, but
this should also check whether the output makes sense.
Differential Revision: https://reviews.llvm.org/D63979
llvm-svn: 372250
Summary:
https://bugs.llvm.org/show_bug.cgi?id=41899
```auto lambda = [&a = a]() { a = 2; };```
is formatted as
```auto lambda = [& a = a]() { a = 2; };```
With an extra space if PointerAlignment is set to Left
> The space "& a" looks strange when there is no type in the lambda's intializer expression. This can be worked around with by setting "PointerAlignment: Right", but ideally "PointerAlignment: Left" would not add a space in this case.
Reviewers: klimek, owenpan, krasimir, timwoj
Reviewed By: klimek
Subscribers: cfe-commits
Tags: #clang-tools-extra, #clang
Differential Revision: https://reviews.llvm.org/D67718
llvm-svn: 372249
Those conditions may use __has_include, which needs to be rewritten.
The existing code has already tried to rewrite just __has_include,
but it didn't work with macro expansion, so e.g. Qt's
"#define QT_HAS_INCLUDE(x) __has_include(x)" didn't get handled
properly. Since the preprocessor run knows what each condition evaluates
to, just rewrite the entire condition. This of course requires that
the -frewrite-include pass has the same setup as the following
compilation, but that has always been the requirement.
Differential Revision: https://reviews.llvm.org/D63508
llvm-svn: 372248
Also, add a diagnostic under -Wformat for printing a boolean value as a
character.
rdar://54579473
Differential revision: https://reviews.llvm.org/D66856
llvm-svn: 372247
For patterns c/d/e we too can deal with the pattern even if we can't
just drop the mask, we can just apply it afterwars:
https://rise4fun.com/Alive/gslRa
llvm-svn: 372244
Summary:
Also fixup rL371928 for cases that occur on our out-of-tree backend
There were still quite a few intermediate APInts and this caused the
compile time of MCCodeEmitter for our target to jump from 16s up to
~5m40s. This patch, brings it back down to ~17s by eliminating pretty
much all of them using two new APInt functions (extractBitsAsZExtValue(),
insertBits() but with a uint64_t). The exact conditions for eliminating
them is that the field extracted/inserted must be <=64-bit which is
almost always true.
Note: The two new APInt API's assume that APInt::WordSize is at least
64-bit because that means they touch at most 2 APInt words. They
statically assert that's true. It seems very unlikely that someone
is patching it to be smaller so this should be fine.
Reviewers: jmolloy
Reviewed By: jmolloy
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67686
llvm-svn: 372243
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.
The algorithm for building the graph involves the following steps in order:
1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
2. For each node in the graph establish def-use edges to/from other nodes in the graph.
3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur, fhahn, myhsu
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D65350
llvm-svn: 372238
sets.
According to OpenMP 5.0, context selector set might include several
context selectors, separated with commas. Patch fixes this problem.
llvm-svn: 372235
is enabled.
We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.
To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.
At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.
Differential Revision: https://reviews.llvm.org/D67561
llvm-svn: 372232
Commit message below, original caused the sphinx build bot to fail, this
one should fix it.
Create UsersManual section entitled 'Controlling Floating Point
Behavior'
Create a new section for documenting the floating point options. Move
all the floating point options into this section, and add new entries
for the floating point options that exist but weren't previously
described in the UsersManual.
Patch By: mibintc
Differential Revision: https://reviews.llvm.org/D67517
llvm-svn: 372229
This broke the Chromium build. Consider the following code:
float ScaleSumSamples_C(const float* src, float* dst, float scale, int width) {
float fsum = 0.f;
int i;
#if defined(__clang__)
#pragma clang loop vectorize_width(4)
#endif
for (i = 0; i < width; ++i) {
float v = *src++;
fsum += v * v;
*dst++ = v * scale;
}
return fsum;
}
Compiling at -Oz, Clang now warns:
$ clang++ -target x86_64 -Oz -c /tmp/a.cc
/tmp/a.cc:1:7: warning: loop not vectorized: the optimizer was unable to
perform the requested transformation; the transformation might be disabled or
specified as part of an unsupported transformation ordering
[-Wpass-failed=transform-warning]
this suggests it's not actually enabling vectorization hard enough.
At -Os it asserts instead:
$ build.release/bin/clang++ -target x86_64 -Os -c /tmp/a.cc
clang-10: /work/llvm.monorepo/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2734: void
llvm::InnerLoopVectorizer::emitMemRuntimeChecks(llvm::Loop*, llvm::BasicBlock*): Assertion `
!BB->getParent()->hasOptSize() && "Cannot emit memory checks when optimizing for size"' failed.
Of course neither of these are what the developer expected from the pragma.
> Specifying the vectorization width was supposed to implicitly enable
> vectorization, except that it wasn't really doing this. It was only
> setting the vectorize.width metadata, but not vectorize.enable.
>
> This should fix PR27643.
>
> Differential Revision: https://reviews.llvm.org/D66290
llvm-svn: 372225
different compilers will put different things into __PRETTY_FUNCTION__.
For instance gcc will not put a " " in the "const char *" argument,
causing our regex matching to fail.
This patch relaxes the regexes in this test to account for this
difference.
llvm-svn: 372224
Summary:
This fixes B42473 and B42706.
This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation.
Reviewers: foad, nhaehnle
Reviewed By: foad
Subscribers: jvesely, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65274
llvm-svn: 372223
The test was expecting the value of "lldb.frame" to be None, because it
is cleared after each python interpreter session. However, this is not
true in the very first session, because lldb.py sets these values to
invalid objects (lldb.SBFrame(), etc.).
I have not investigated why is it that this test passes on darwin, but
my guess is that this is because we do extra work on darwin (loading the
objc runtime, etc), which causes us to enter the python interpreter
sooner.
This patch changes lldb.py to also initialize these values to None, as
that seems to make more sense. I also fixed some typos in the test while
I was in there.
llvm-svn: 372222
Summary:
The `CHECK: frame:py: None` seems to have been a typo, causing build bot failures:
```
# CHECK: frame:py: None
^
<stdin>:1:1: note: scanning from here
(lldb) command source -s 0 'E:/build_slave/lldb-x64-windows-ninja/build/tools/lldb\lit\lit-lldb-init'
^
<stdin>:23:1: note: possible intended match here
frame:py: No value
^
```
This update fixes the build bots.
--
Reviewers: bkramer
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D67702
llvm-svn: 372221
We need "xgot" flag in the MipsAsmParser to implement correct expansion
of some pseudo instructions in case of using 32-bit GOT (XGOT).
MipsAsmParser does not have reference to MipsSubtarget but has a
reference to "feature bit set".
llvm-svn: 372220
The static analyzer noticed that we were dereferencing it even when the default null value was being used. Further investigation showed that we never explicitly set the parameter so I've just removed it entirely.
llvm-svn: 372217