sets.
According to OpenMP 5.0, context selector set might include several
context selectors, separated with commas. Patch fixes this problem.
llvm-svn: 372235
is enabled.
We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.
To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.
At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.
Differential Revision: https://reviews.llvm.org/D67561
llvm-svn: 372232
Commit message below, original caused the sphinx build bot to fail, this
one should fix it.
Create UsersManual section entitled 'Controlling Floating Point
Behavior'
Create a new section for documenting the floating point options. Move
all the floating point options into this section, and add new entries
for the floating point options that exist but weren't previously
described in the UsersManual.
Patch By: mibintc
Differential Revision: https://reviews.llvm.org/D67517
llvm-svn: 372229
This broke the Chromium build. Consider the following code:
float ScaleSumSamples_C(const float* src, float* dst, float scale, int width) {
float fsum = 0.f;
int i;
#if defined(__clang__)
#pragma clang loop vectorize_width(4)
#endif
for (i = 0; i < width; ++i) {
float v = *src++;
fsum += v * v;
*dst++ = v * scale;
}
return fsum;
}
Compiling at -Oz, Clang now warns:
$ clang++ -target x86_64 -Oz -c /tmp/a.cc
/tmp/a.cc:1:7: warning: loop not vectorized: the optimizer was unable to
perform the requested transformation; the transformation might be disabled or
specified as part of an unsupported transformation ordering
[-Wpass-failed=transform-warning]
this suggests it's not actually enabling vectorization hard enough.
At -Os it asserts instead:
$ build.release/bin/clang++ -target x86_64 -Os -c /tmp/a.cc
clang-10: /work/llvm.monorepo/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2734: void
llvm::InnerLoopVectorizer::emitMemRuntimeChecks(llvm::Loop*, llvm::BasicBlock*): Assertion `
!BB->getParent()->hasOptSize() && "Cannot emit memory checks when optimizing for size"' failed.
Of course neither of these are what the developer expected from the pragma.
> Specifying the vectorization width was supposed to implicitly enable
> vectorization, except that it wasn't really doing this. It was only
> setting the vectorize.width metadata, but not vectorize.enable.
>
> This should fix PR27643.
>
> Differential Revision: https://reviews.llvm.org/D66290
llvm-svn: 372225
different compilers will put different things into __PRETTY_FUNCTION__.
For instance gcc will not put a " " in the "const char *" argument,
causing our regex matching to fail.
This patch relaxes the regexes in this test to account for this
difference.
llvm-svn: 372224
Summary:
This fixes B42473 and B42706.
This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation.
Reviewers: foad, nhaehnle
Reviewed By: foad
Subscribers: jvesely, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65274
llvm-svn: 372223
The test was expecting the value of "lldb.frame" to be None, because it
is cleared after each python interpreter session. However, this is not
true in the very first session, because lldb.py sets these values to
invalid objects (lldb.SBFrame(), etc.).
I have not investigated why is it that this test passes on darwin, but
my guess is that this is because we do extra work on darwin (loading the
objc runtime, etc), which causes us to enter the python interpreter
sooner.
This patch changes lldb.py to also initialize these values to None, as
that seems to make more sense. I also fixed some typos in the test while
I was in there.
llvm-svn: 372222
Summary:
The `CHECK: frame:py: None` seems to have been a typo, causing build bot failures:
```
# CHECK: frame:py: None
^
<stdin>:1:1: note: scanning from here
(lldb) command source -s 0 'E:/build_slave/lldb-x64-windows-ninja/build/tools/lldb\lit\lit-lldb-init'
^
<stdin>:23:1: note: possible intended match here
frame:py: No value
^
```
This update fixes the build bots.
--
Reviewers: bkramer
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D67702
llvm-svn: 372221
We need "xgot" flag in the MipsAsmParser to implement correct expansion
of some pseudo instructions in case of using 32-bit GOT (XGOT).
MipsAsmParser does not have reference to MipsSubtarget but has a
reference to "feature bit set".
llvm-svn: 372220
The static analyzer noticed that we were dereferencing it even when the default null value was being used. Further investigation showed that we never explicitly set the parameter so I've just removed it entirely.
llvm-svn: 372217
Fixes quoting of profile arguments to work on Windows
Suppresses adding profile arguments to linker flags when using lld-link
Avoids -fprofile-instr-use being added to rc.exe flags
Removes duplicated adding of -fprofile-instr-use to linker flags (since
r355541)
Move handling LLVM_PROFDATA_FILE to HandleLLVMOptions.cmake
Differential Revision: https://reviews.llvm.org/D62063
llvm-svn: 372209
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
Reviewers: omjavaid, eli.friedman, thegameg, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D66935
llvm-svn: 372204
Summary:
Currently our expression evaluators only prints very basic errors that are not very useful when writing complex expressions.
For example, in the expression below the user made a type error, but it's not clear from the diagnostic what went wrong:
```
(lldb) expr printf("Modulos are:", foobar%mo1, foobar%mo2, foobar%mo3)
error: invalid operands to binary expression ('int' and 'double')
```
This patch enables full Clang diagnostics in our expression evaluator. After this patch the diagnostics for the expression look like this:
```
(lldb) expr printf("Modulos are:", foobar%mo1, foobar%mo2, foobar%mo3)
error: <user expression 1>:1:54: invalid operands to binary expression ('int' and 'float')
printf("Modulos are:", foobar%mo1, foobar%mo2, foobar%mo3)
~~~~~~^~~~
```
To make this possible, we now emulate a user expression file within our diagnostics. This prevents that the user is exposed to
our internal wrapper code we inject.
Note that the diagnostics that refer to declarations from the debug information (e.g. 'note' diagnostics pointing to a called function)
will not be improved by this as they don't have any source locations associated with them, so caret or line printing isn't possible.
We instead just suppress these diagnostics as we already do with warnings as they would otherwise just be a context message
without any context (and the original diagnostic in the user expression should be enough to explain the issue).
Fixes rdar://24306342
Reviewers: JDevlieghere, aprantl, shafik, #lldb
Reviewed By: JDevlieghere, #lldb
Subscribers: usaxena95, davide, jingham, aprantl, arphaman, kadircet, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D65646
llvm-svn: 372203
Summary:
The latter is slightly more efficient and communicates the intent of the
API: writeFileAtomically does not own or copy the callback, it merely
calls it at some point.
Reviewers: jkorous
Reviewed By: jkorous
Subscribers: hiraditya, dexonsmith, jfb, llvm-commits, cfe-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67584
llvm-svn: 372201
This generates worse code, but matches what is done for avx2 and
prevents crashes when more arguments are passed than we have
registers for.
llvm-svn: 372200
Currently, not all user specified relocations
(with clang intrinsic __builtin_preserve_access_index())
will turn into relocations.
In the current implementation, a __builtin_preserve_access_index()
chain is turned into relocation only if the result of the clang
intrinsic is used in a function call or a nonzero offset computation
of getelementptr. For all other cases, the relocatiion request
is ignored and the __builtin_preserve_access_index() is turned
into regular getelementptr instructions.
The main reason is to mimic bpf_probe_read() requirement.
But there are other use cases where relocatable offset is
generated but not used for bpf_probe_read(). This patch
relaxed previous constraints when to generate relocations.
Now, all user __builtin_preserve_access_index() will have
relocations generated.
Differential Revision: https://reviews.llvm.org/D67688
llvm-svn: 372198
I don't know what the intent of parts of this test were. We set a
bunch of breakpoints and ran from one to the other, doing "self.runCmd("thread backtrace")"
then continuing to the next one. We didn't actually verify the contents of the backtrace,
nor that we hit the breakpoints we set in any particular order. The only actual test was
to run sel_getName at two of these stops.
So I reduced the test to just stopping at the places where we were actually going to run
an expression, and tested the expression.
llvm-svn: 372196
The filename in the RemarkStreamer should be optional to allow clients
to stream remarks to memory or to existing streams.
This introduces a new overload of `setupOptimizationRemarks`, and avoids
enforcing the presence of a filename at different places.
llvm-svn: 372195
Summary: This way it can be overwritten when cross compiling.
Subscribers: mgorny, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D67641
llvm-svn: 372194
This test is about disassembling symbols in a framework without debug information.
So we don't need to run it once per debug info flavor.
llvm-svn: 372193
Jim pointed out that the LLDB global variables should only be available
in interactive mode. When used from a command for example, their values
might be stale or not at all what the user expects. Therefore we want to
explicitly make these variables unavailable.
Differential revision: https://reviews.llvm.org/D67685
llvm-svn: 372192
Starting from r324788 timer groups aren't cleared automatically when
printed out. As a result some timer groups were printed one more time.
For example, "Pass execution timing report" was printed again in
`ManagedStatic<PassTimingInfo>` destructor, "DWARF Emission" in
`ManagedStatic<Name2PairMap> NamedGroupedTimers` destructor.
Fix by clearing timer groups manually.
Reviewers: thegameg, george.karpenkov
Reviewed By: thegameg
Subscribers: aprantl, jkorous, dexonsmith, ributzka, aras-p, cfe-commits
Differential Revision: https://reviews.llvm.org/D67683
llvm-svn: 372191
Summary:
The PGO counter reading will add cold and inlinehint (hot) attributes
to functions that are very cold or hot. This was using hardcoded
thresholds, instead of the profile summary cutoffs which are used in
other hot/cold detection and are more dynamic and adaptable. Switch
to using the summary-based cold/hot detection.
The hardcoded limits were causing some code that had a medium level of
hotness (per the summary) to be incorrectly marked with a cold
attribute, blocking inlining.
Reviewers: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67673
llvm-svn: 372189
Since the removal of extensions nodes from schedule trees in r362257 it
is possible to emit parallel code for SCoPs containing
matrix-multiplications. However, the code looking for references used in
outlined statement was not prepared to handle CopyStmts introduced by
the matrix-matrix multiplication detection.
In this case, CopyStmts do not introduce references in addition to the
ones captured by MemoryAccesses, i.e. we change the assertion to accept
CopyStmts and add a regression test for this case.
This fixes llvm.org/PR43164
llvm-svn: 372188
r361845 changed the way we handle "D16" vs. "D32" targets; there used to
be a negative "d16" which removed instructions from the instruction set,
and now there's a "d32" feature which adds instructions to the
instruction set. This is good, but there was an oversight in the
implementation: the behavior of VFPv2 was changed. In particular, the
"vfp2" feature was changed to imply "d32". This is wrong: VFPv2 only
supports 16 D registers.
In practice, this means if you specify -mfpu=vfpv2, the compiler will
generate illegal instructions.
This patch gets rid of "vfp2d16" and "vfp2d16sp", and fixes "vfp2" and
"vfp2sp" so they don't imply "d32".
Differential Revision: https://reviews.llvm.org/D67375
llvm-svn: 372186