For patterns c/d/e we too can deal with the pattern even if we can't
just drop the mask, we can just apply it afterwars:
https://rise4fun.com/Alive/gslRa
llvm-svn: 372244
Summary:
Also fixup rL371928 for cases that occur on our out-of-tree backend
There were still quite a few intermediate APInts and this caused the
compile time of MCCodeEmitter for our target to jump from 16s up to
~5m40s. This patch, brings it back down to ~17s by eliminating pretty
much all of them using two new APInt functions (extractBitsAsZExtValue(),
insertBits() but with a uint64_t). The exact conditions for eliminating
them is that the field extracted/inserted must be <=64-bit which is
almost always true.
Note: The two new APInt API's assume that APInt::WordSize is at least
64-bit because that means they touch at most 2 APInt words. They
statically assert that's true. It seems very unlikely that someone
is patching it to be smaller so this should be fine.
Reviewers: jmolloy
Reviewed By: jmolloy
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67686
llvm-svn: 372243
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.
The algorithm for building the graph involves the following steps in order:
1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
2. For each node in the graph establish def-use edges to/from other nodes in the graph.
3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur, fhahn, myhsu
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D65350
llvm-svn: 372238
sets.
According to OpenMP 5.0, context selector set might include several
context selectors, separated with commas. Patch fixes this problem.
llvm-svn: 372235
is enabled.
We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.
To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.
At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.
Differential Revision: https://reviews.llvm.org/D67561
llvm-svn: 372232
Commit message below, original caused the sphinx build bot to fail, this
one should fix it.
Create UsersManual section entitled 'Controlling Floating Point
Behavior'
Create a new section for documenting the floating point options. Move
all the floating point options into this section, and add new entries
for the floating point options that exist but weren't previously
described in the UsersManual.
Patch By: mibintc
Differential Revision: https://reviews.llvm.org/D67517
llvm-svn: 372229
This broke the Chromium build. Consider the following code:
float ScaleSumSamples_C(const float* src, float* dst, float scale, int width) {
float fsum = 0.f;
int i;
#if defined(__clang__)
#pragma clang loop vectorize_width(4)
#endif
for (i = 0; i < width; ++i) {
float v = *src++;
fsum += v * v;
*dst++ = v * scale;
}
return fsum;
}
Compiling at -Oz, Clang now warns:
$ clang++ -target x86_64 -Oz -c /tmp/a.cc
/tmp/a.cc:1:7: warning: loop not vectorized: the optimizer was unable to
perform the requested transformation; the transformation might be disabled or
specified as part of an unsupported transformation ordering
[-Wpass-failed=transform-warning]
this suggests it's not actually enabling vectorization hard enough.
At -Os it asserts instead:
$ build.release/bin/clang++ -target x86_64 -Os -c /tmp/a.cc
clang-10: /work/llvm.monorepo/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2734: void
llvm::InnerLoopVectorizer::emitMemRuntimeChecks(llvm::Loop*, llvm::BasicBlock*): Assertion `
!BB->getParent()->hasOptSize() && "Cannot emit memory checks when optimizing for size"' failed.
Of course neither of these are what the developer expected from the pragma.
> Specifying the vectorization width was supposed to implicitly enable
> vectorization, except that it wasn't really doing this. It was only
> setting the vectorize.width metadata, but not vectorize.enable.
>
> This should fix PR27643.
>
> Differential Revision: https://reviews.llvm.org/D66290
llvm-svn: 372225
different compilers will put different things into __PRETTY_FUNCTION__.
For instance gcc will not put a " " in the "const char *" argument,
causing our regex matching to fail.
This patch relaxes the regexes in this test to account for this
difference.
llvm-svn: 372224
Summary:
This fixes B42473 and B42706.
This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation.
Reviewers: foad, nhaehnle
Reviewed By: foad
Subscribers: jvesely, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65274
llvm-svn: 372223
The test was expecting the value of "lldb.frame" to be None, because it
is cleared after each python interpreter session. However, this is not
true in the very first session, because lldb.py sets these values to
invalid objects (lldb.SBFrame(), etc.).
I have not investigated why is it that this test passes on darwin, but
my guess is that this is because we do extra work on darwin (loading the
objc runtime, etc), which causes us to enter the python interpreter
sooner.
This patch changes lldb.py to also initialize these values to None, as
that seems to make more sense. I also fixed some typos in the test while
I was in there.
llvm-svn: 372222
Summary:
The `CHECK: frame:py: None` seems to have been a typo, causing build bot failures:
```
# CHECK: frame:py: None
^
<stdin>:1:1: note: scanning from here
(lldb) command source -s 0 'E:/build_slave/lldb-x64-windows-ninja/build/tools/lldb\lit\lit-lldb-init'
^
<stdin>:23:1: note: possible intended match here
frame:py: No value
^
```
This update fixes the build bots.
--
Reviewers: bkramer
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D67702
llvm-svn: 372221
We need "xgot" flag in the MipsAsmParser to implement correct expansion
of some pseudo instructions in case of using 32-bit GOT (XGOT).
MipsAsmParser does not have reference to MipsSubtarget but has a
reference to "feature bit set".
llvm-svn: 372220
The static analyzer noticed that we were dereferencing it even when the default null value was being used. Further investigation showed that we never explicitly set the parameter so I've just removed it entirely.
llvm-svn: 372217
Fixes quoting of profile arguments to work on Windows
Suppresses adding profile arguments to linker flags when using lld-link
Avoids -fprofile-instr-use being added to rc.exe flags
Removes duplicated adding of -fprofile-instr-use to linker flags (since
r355541)
Move handling LLVM_PROFDATA_FILE to HandleLLVMOptions.cmake
Differential Revision: https://reviews.llvm.org/D62063
llvm-svn: 372209
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
Reviewers: omjavaid, eli.friedman, thegameg, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D66935
llvm-svn: 372204
Summary:
Currently our expression evaluators only prints very basic errors that are not very useful when writing complex expressions.
For example, in the expression below the user made a type error, but it's not clear from the diagnostic what went wrong:
```
(lldb) expr printf("Modulos are:", foobar%mo1, foobar%mo2, foobar%mo3)
error: invalid operands to binary expression ('int' and 'double')
```
This patch enables full Clang diagnostics in our expression evaluator. After this patch the diagnostics for the expression look like this:
```
(lldb) expr printf("Modulos are:", foobar%mo1, foobar%mo2, foobar%mo3)
error: <user expression 1>:1:54: invalid operands to binary expression ('int' and 'float')
printf("Modulos are:", foobar%mo1, foobar%mo2, foobar%mo3)
~~~~~~^~~~
```
To make this possible, we now emulate a user expression file within our diagnostics. This prevents that the user is exposed to
our internal wrapper code we inject.
Note that the diagnostics that refer to declarations from the debug information (e.g. 'note' diagnostics pointing to a called function)
will not be improved by this as they don't have any source locations associated with them, so caret or line printing isn't possible.
We instead just suppress these diagnostics as we already do with warnings as they would otherwise just be a context message
without any context (and the original diagnostic in the user expression should be enough to explain the issue).
Fixes rdar://24306342
Reviewers: JDevlieghere, aprantl, shafik, #lldb
Reviewed By: JDevlieghere, #lldb
Subscribers: usaxena95, davide, jingham, aprantl, arphaman, kadircet, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D65646
llvm-svn: 372203
Summary:
The latter is slightly more efficient and communicates the intent of the
API: writeFileAtomically does not own or copy the callback, it merely
calls it at some point.
Reviewers: jkorous
Reviewed By: jkorous
Subscribers: hiraditya, dexonsmith, jfb, llvm-commits, cfe-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67584
llvm-svn: 372201
This generates worse code, but matches what is done for avx2 and
prevents crashes when more arguments are passed than we have
registers for.
llvm-svn: 372200
Currently, not all user specified relocations
(with clang intrinsic __builtin_preserve_access_index())
will turn into relocations.
In the current implementation, a __builtin_preserve_access_index()
chain is turned into relocation only if the result of the clang
intrinsic is used in a function call or a nonzero offset computation
of getelementptr. For all other cases, the relocatiion request
is ignored and the __builtin_preserve_access_index() is turned
into regular getelementptr instructions.
The main reason is to mimic bpf_probe_read() requirement.
But there are other use cases where relocatable offset is
generated but not used for bpf_probe_read(). This patch
relaxed previous constraints when to generate relocations.
Now, all user __builtin_preserve_access_index() will have
relocations generated.
Differential Revision: https://reviews.llvm.org/D67688
llvm-svn: 372198
I don't know what the intent of parts of this test were. We set a
bunch of breakpoints and ran from one to the other, doing "self.runCmd("thread backtrace")"
then continuing to the next one. We didn't actually verify the contents of the backtrace,
nor that we hit the breakpoints we set in any particular order. The only actual test was
to run sel_getName at two of these stops.
So I reduced the test to just stopping at the places where we were actually going to run
an expression, and tested the expression.
llvm-svn: 372196
The filename in the RemarkStreamer should be optional to allow clients
to stream remarks to memory or to existing streams.
This introduces a new overload of `setupOptimizationRemarks`, and avoids
enforcing the presence of a filename at different places.
llvm-svn: 372195
Summary: This way it can be overwritten when cross compiling.
Subscribers: mgorny, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D67641
llvm-svn: 372194
This test is about disassembling symbols in a framework without debug information.
So we don't need to run it once per debug info flavor.
llvm-svn: 372193