Factor out CodeExtractor's analysis of allocas (for shrinkwrapping
purposes), and allow the analysis to be reused.
This resolves a quadratic compile-time bug observed when compiling
AMDGPUDisassembler.cpp.o.
Pre-patch (Release + LTO clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
176.5278 ( 57.8%) 0.4915 ( 18.5%) 177.0192 ( 57.4%) 177.4112 ( 57.3%) Hot Cold Splitting
```
Post-patch (ReleaseAsserts clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.4051 ( 3.3%) 0.0079 ( 0.3%) 1.4129 ( 3.2%) 1.4129 ( 3.2%) Hot Cold Splitting
```
Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary
pre- vs. post-patch.
An alternate approach would be to hide CodeExtractorAnalysisCache from
clients of CodeExtractor, and to recompute the analysis from scratch inside
CodeExtractor::extractCodeRegion(). That would eliminate some redundant work
in the shrinkwrapping legality check. However, some clients would continue
to exhibit O(n^2) compile-time behavior, as computing the analysis is O(n).
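The difference between the two approaches can be sketched with a toy model. This is not the CodeExtractor code: `FunctionModel`, `AllocaAnalysisCache`, `extractRegion`, and `extractManyRegions` are hypothetical names, and the "analysis" is just a scan for allocas. It only illustrates that the O(n) scan runs once per function rather than once per extracted region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a function is a list of instructions, and each entry
// records whether that instruction is an alloca.
struct FunctionModel {
  std::vector<bool> IsAlloca;
};

// Hypothetical cache: scans the function once and remembers the allocas.
struct AllocaAnalysisCache {
  std::vector<std::size_t> AllocaIndices;
  std::size_t ScanCount = 0; // number of instructions visited

  explicit AllocaAnalysisCache(const FunctionModel &F) {
    for (std::size_t I = 0; I < F.IsAlloca.size(); ++I) {
      ++ScanCount;
      if (F.IsAlloca[I])
        AllocaIndices.push_back(I);
    }
  }
};

// Stand-in for one extraction: it consults the cache instead of rescanning.
std::size_t extractRegion(const AllocaAnalysisCache &Cache) {
  return Cache.AllocaIndices.size();
}

// The client builds the cache once and reuses it for every region.
std::size_t extractManyRegions(const FunctionModel &F, std::size_t NumRegions,
                               std::size_t &TotalScanned) {
  AllocaAnalysisCache Cache(F);
  assert(Cache.ScanCount == F.IsAlloca.size() && "one pass over the function");
  std::size_t AllocasSeen = 0;
  for (std::size_t R = 0; R < NumRegions; ++R)
    AllocasSeen += extractRegion(Cache);
  TotalScanned = Cache.ScanCount;
  return AllocasSeen;
}
```

With the cache owned by the client, the scan cost stays at n regardless of how many regions are extracted; recomputing it inside each extraction would cost n per region.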
rdar://55912966
Differential Revision: https://reviews.llvm.org/D68616
llvm-svn: 374089
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores. It also limits
the SILoadStoreOptimizer from merging more instructions.
This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.
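Why offsets matter can be sketched as follows. This is a simplified model, not MachineInstr::mayAlias(): `BufferAccess` and `mayAliasSameResource` are hypothetical names, and the sketch only considers two accesses that share one resource descriptor.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of one buffer access on a shared resource.
struct BufferAccess {
  std::optional<std::int64_t> Offset; // byte offset, if known
  std::uint64_t Size;                 // access width in bytes
};

// Conservative alias query for two accesses to the same resource descriptor.
bool mayAliasSameResource(const BufferAccess &A, const BufferAccess &B) {
  assert(A.Size > 0 && B.Size > 0 && "accesses must touch at least one byte");
  // Without an offset on either side, nothing can be proven: assume alias.
  if (!A.Offset || !B.Offset)
    return true;
  // With offsets, the accesses alias only if their byte ranges overlap.
  std::int64_t AEnd = *A.Offset + static_cast<std::int64_t>(A.Size);
  std::int64_t BEnd = *B.Offset + static_cast<std::int64_t>(B.Size);
  return *A.Offset < BEnd && *B.Offset < AEnd;
}
```

When every query answers "may alias", each load or store depends on every other one, which is where the pairwise blow-up comes from.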
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65097
llvm-svn: 374087
D65402 causes a test failure related to attributor-max-iterations.
This commit removes attributor-max-iterations-verify for now.
I'll investigate the cause, after which the flag should be restored.
llvm-svn: 374086
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<> directly; if not, the assert will fire for us.
llvm-svn: 374085
This makes parsing the symbol table of clang marginally faster. (Hashtable versus tree).
Differential Revision: https://reviews.llvm.org/D68605
llvm-svn: 374084
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.
Differential Revision: https://reviews.llvm.org/D68607
llvm-svn: 374083
This change is mostly performance-neutral since our regex engine is
fast, but it's IMHO slightly more readable. Also, matching balanced
parentheses is not a great match for regular expressions.
Differential Revision: https://reviews.llvm.org/D68609
llvm-svn: 374082
It turns out that r374056 broke _some_ build bots again, specifically
the ones using sanitizers. Instead of trying to link the right system
libraries to the benchmarks bit-by-bit, let's just link exactly the
system libraries that libc++ itself needs.
llvm-svn: 374079
Sometimes functions with large comment blocks in front of them have their
declarations output on several lines by c-index-test. Hence the one-line
function name/line/mangled pattern will not work to detect them. Break the
pattern up into two patterns and keep state after seeing the name/line
information until we finally see the mangled name.
Differential Revision: https://reviews.llvm.org/D68272
llvm-svn: 374078
Most of the secondary Makefiles we have are just a couple variable
definitions and then an include of Makefile.rules. This patch removes
most of the secondary Makefiles and replaces them with a direct
invocation of Makefile.rules in the main Makefile. The specificities
of each sub-build are listed right there on the recursive $(MAKE)
call. All the variables that matter are being passed automagically by
make as they have been passed on the command line. The only things you
need to specify are the variables customizing the Makefile.rules
logic for each image.
This patch also removes most of the clean logic from those Makefiles
and from Makefile.rules. The clean rule is not required anymore now
that we run the testsuite in a separate build directory that is wiped
with each run. The patch leaves a very crude version of clean in
Makefile.rules which removes everything inside of $(BUILDDIR). It does
this only when the $(BUILDDIR) looks like a sub-directory of our
standard testsuite build directory to be extra safe.
Reviewers: aprantl, labath
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D68558
llvm-svn: 374076
If the sign bit of the value that is being sign-extended is not set,
i.e. the value is non-negative (s>= 0), then zero-extension will suffice,
and is better for analysis: https://rise4fun.com/Alive/a8PD
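The equivalence can be checked exhaustively for a narrow type. The following is a minimal sketch in plain C++, not the transform itself; the helper names are hypothetical and the 8-bit to 32-bit widths are chosen only to keep the loop small.

```cpp
#include <cassert>
#include <cstdint>

// Models `sext i8 %x to i32`.
std::uint32_t signExtend8To32(std::uint8_t X) {
  return static_cast<std::uint32_t>(
      static_cast<std::int32_t>(static_cast<std::int8_t>(X)));
}

// Models `zext i8 %x to i32`.
std::uint32_t zeroExtend8To32(std::uint8_t X) {
  return static_cast<std::uint32_t>(X);
}

// Replacement that is only valid when the sign bit of X is known to be clear.
std::uint32_t extendKnownNonNegative(std::uint8_t X) {
  assert((X & 0x80) == 0 && "sign bit must be clear");
  return zeroExtend8To32(X);
}

// Returns true if sext and zext agree for every value with a clear sign bit.
bool extensionsAgreeWhenNonNegative() {
  for (unsigned V = 0; V < 0x80; ++V) {
    std::uint8_t X = static_cast<std::uint8_t>(V);
    if (signExtend8To32(X) != extendKnownNonNegative(X))
      return false;
  }
  return true;
}
```

Once the sign bit is set the two extensions diverge, which is why the non-negativity precondition is required.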
llvm-svn: 374075
Summary:
When searching for the local expression tree created by stackified
registers, for 'block' placement, we start the search from the
instruction preceding a BB's terminator. But in the 'try' case, we should
start from the instruction preceding a call that can throw, or an EH_LABEL
that precedes the call, because the return values of the call's previous
instructions can be stackified and consumed by the throwing call.
For example,
```
i32.call @foo
call @bar ; may throw
br $label0
```
In this case, if we start the search from the previous instruction of
the terminator (`br` here), we end up stopping at `call @bar` and place
a 'try' between `i32.call @foo` and `call @bar`, because `call @bar`
does not have a return value so it is not a local expression tree of
`br`.
But in this case, unlike when placing 'block's, we should start the
search from `call @bar`, because the return value of `i32.call @foo` is
stackified and used by `call @bar`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68619
llvm-svn: 374073
According to OpenMP 5.0, 2.3.2 Context Selectors, Restrictions, each
trait-set-selector-name can only be specified once. Added check to
implement this restriction.
llvm-svn: 374072
Summary:
Previously, the pointer variable `args_dict` was assigned the result of an
allocation with `new` and then passed to `GetValueForKeyAsDictionary`,
which immediately and unconditionally sets `args_dict` to `nullptr`:
```
bool GetValueForKeyAsDictionary(llvm::StringRef key,
Dictionary *&result) const {
result = nullptr;
```
This caused a memory leak which was found in my coverity scan instance
under CID 224753: https://scan.coverity.com/projects/kwk-llvm-project.
Reviewers: jankratochvil, teemperor
Reviewed By: teemperor
Subscribers: teemperor, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D68638
llvm-svn: 374071
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.
Reviewers: aprantl, vsk, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D66955
llvm-svn: 374068
Summary:
Using enumerators as flags is standard practice. This patch adds
support to LLDB to display such enum values symbolically, eg:
(E) e1 = A | B
If enumerators don't cover the whole value, the remaining bits are
displayed as hexadecimal:
(E) e4 = A | 0x10
Detecting whether an enum is used as a bitfield or not is
complicated. This patch implements a heuristic that assumes that such
enumerators will either have only 1 bit set or will be a combination
of previous values.
This patch doesn't change the way we currently display enums which the
above heuristic would not consider as bitfields.
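The heuristic and the display format can be sketched as below. This is not the LLDB implementation: `Enumerators`, `looksLikeFlagEnum`, and `formatFlags` are hypothetical names, a zero-valued enumerator is assumed to be acceptable, and only single-bit enumerators are matched when printing.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an enum's (name, value) pairs, in declaration order.
using Enumerators = std::vector<std::pair<std::string, std::uint64_t>>;

// Heuristic from the message above: every enumerator either has exactly one
// bit set or only combines bits covered by earlier enumerators.
bool looksLikeFlagEnum(const Enumerators &Enums) {
  std::uint64_t Covered = 0;
  for (const auto &E : Enums) {
    std::uint64_t V = E.second;
    bool SingleBit = V != 0 && (V & (V - 1)) == 0;
    bool Combination = (V & ~Covered) == 0;
    if (!SingleBit && !Combination)
      return false;
    Covered |= V;
  }
  return true;
}

// Render Value as "A | B", printing bits that no enumerator covers as hex.
std::string formatFlags(const Enumerators &Enums, std::uint64_t Value) {
  assert(looksLikeFlagEnum(Enums) && "expected a flag-like enum");
  std::ostringstream OS;
  const char *Sep = "";
  std::uint64_t Remaining = Value;
  for (const auto &E : Enums) {
    std::uint64_t V = E.second;
    bool SingleBit = V != 0 && (V & (V - 1)) == 0;
    if (SingleBit && (Remaining & V) == V) {
      OS << Sep << E.first;
      Sep = " | ";
      Remaining &= ~V;
    }
  }
  // Print leftover bits, or the whole value if no enumerator matched.
  if (Remaining != 0 || Sep[0] == '\0')
    OS << Sep << "0x" << std::hex << Remaining;
  return OS.str();
}
```

An enum whose first enumerator is, say, 3 fails the heuristic in this sketch and would keep its existing display.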
Reviewers: jingham, labath
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D67520
llvm-svn: 374067
Summary:
In D65186 and related patches, MustBeExecutedContextExplorer is introduced. This enables us to traverse instructions guaranteed to execute from function entry. If we can know the argument is used as `dereferenceable` or `nonnull` in these instructions, we can mark `dereferenceable` or `nonnull` in the argument definition:
1. Memory instruction (similar to D64258)
Trace memory instruction pointer operand. Currently, only inbounds GEPs are traced.
```
define i64* @f(i64* %a) {
entry:
%add.ptr = getelementptr inbounds i64, i64* %a, i64 1
; (because of inbounds GEP we can know that %a is at least dereferenceable(16))
store i64 1, i64* %add.ptr, align 8
ret i64* %add.ptr ; dereferenceable 8 (because above instruction stores into it)
}
```
2. Propagation from callsite (similar to D27855)
If `deref` or `nonnull` is known from the call site parameter attributes, we can also say that the argument has that attribute.
```
declare void @use3(i8* %x, i8* %y, i8* %z);
declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);
define void @parent1(i8* %a, i8* %b, i8* %c) {
call void @use3nonnull(i8* %b, i8* %c, i8* %a)
; Above instruction is always executed, so we can say that @parent1(i8* nonnull %a, i8* nonnull %b, i8* nonnull %c)
call void @use3(i8* %c, i8* %a, i8* %b)
ret void
}
```
Reviewers: jdoerfert, sstefan1, spatel, reames
Reviewed By: jdoerfert
Subscribers: xbolva00, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65402
llvm-svn: 374063
Second Landing Attempt:
This patch enables end to end support for generating ELF interface stubs
directly from clang. Now the following:
clang -emit-interface-stubs -o libfoo.so a.cpp b.cpp c.cpp
will produce an ELF binary with visible symbols populated. Visibility attributes
and -fvisibility can be used to control what gets populated.
* Adding ToolChain support for clang Driver IFS Merge Phase
* Implementing a default InterfaceStubs Merge clang Tool, used by ToolChain
* Adds support for the clang Driver to invoke llvm-ifs on ifs files.
* Adds -emit-merged-ifs flag, to tell llvm-ifs to emit a merged ifs text file
instead of the final object format (normally ELF)
Differential Revision: https://reviews.llvm.org/D63978
llvm-svn: 374061
Summary: This patch introduces a generic way to compose two structured deductions. This will be used for composing generic deduction with `MustBeExecutedExplorer` and other existing generic deduction.
Reviewers: jdoerfert, sstefan1
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66645
llvm-svn: 374060
Summary:
This format introduces new features and platforms.
The motivation for this format is to support more than one platform
(for example, ios + ios-simulator and macCatalyst), since previous versions
supported additional architectures but only one platform.
Reviewers: ributzka, steven_wu
Reviewed By: ributzka
Subscribers: mgorny, hiraditya, mgrang, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67529
llvm-svn: 374058
C linkage.
After some discussion with OpenMP developers, it was decided that the
functions with the different C linkage can be used in declare variant
directive.
llvm-svn: 374057
We tried doing that previously (in r373487) and failed (reverted in
r373506) because the benchmarks needed to link against system libraries
and relied on libc++'s dependencies being propagated. Now that this has
been fixed (in r374053), this commit marks the system libraries as
PRIVATE dependencies of libc++.
llvm-svn: 374056
When the -pg option is present, a call to _mcount is inserted into every
function. However, since the proper ABI was not followed, the generated
gmon.out did not give proper results. Inserting the needed instructions
before every _mcount call fixes this.
Differential Revision: https://reviews.llvm.org/D68390
llvm-svn: 374055
Summary:
This adds a `-max-configs-per-opcode` option to limit the number of
configs per opcode.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68642
llvm-svn: 374054
Since the benchmarks build with -nostdlib, they need to manually link
against some system libraries that are used by the benchmarks and the
GoogleBenchmark library itself.
Previously, we'd rely on the fact that these libraries were linked
through the PUBLIC dependencies of cxx_shared/cxx_static. However,
if we were to make these dependencies PRIVATE (as they should be
because they are implementation details of libc++), the benchmarks
would fail to link. This commit remediates that.
llvm-svn: 374053
Summary:
This patch adds the definitions of the constants and structures
necessary to interpret the MemoryInfoList minidump stream, as well as
the object::MinidumpFile interface to access the stream.
While the code is fairly simple, there is one important deviation from
the other minidump streams, which is worth calling out explicitly.
Unlike other "List" streams, the size of the records inside
MemoryInfoList stream is not known statically. Instead it is described
in the stream header. This makes it impossible to return
ArrayRef<MemoryInfo> from the accessor method, as it is done with other
streams. Instead, I create an iterator class, which can be parameterized
by the runtime size of the structure, and return
iterator_range<iterator> instead.
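The idea of an iterator parameterized by a runtime record size can be sketched as follows. This is not the object::MinidumpFile code: `Record`, `StridedIterator`, `StridedRange`, `makeRange`, and `sumBaseAddresses` are hypothetical names, and the single-field record is a placeholder for the real structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record; only the leading field is modeled here.
struct Record {
  std::uint64_t BaseAddress;
};

// Forward iterator over records whose stride is only known at run time.
class StridedIterator {
public:
  StridedIterator(const std::uint8_t *Pos, std::size_t Stride)
      : Pos(Pos), Stride(Stride) {
    assert(Stride >= sizeof(Record) && "stride smaller than the record");
  }
  Record operator*() const {
    Record R;
    std::memcpy(&R, Pos, sizeof(Record)); // copy out; avoids unaligned reads
    return R;
  }
  StridedIterator &operator++() {
    Pos += Stride; // advance by the size from the header, not sizeof(Record)
    return *this;
  }
  bool operator!=(const StridedIterator &O) const { return Pos != O.Pos; }

private:
  const std::uint8_t *Pos;
  std::size_t Stride;
};

// Minimal begin/end pair, playing the role of an iterator range.
struct StridedRange {
  StridedIterator Begin, End;
  StridedIterator begin() const { return Begin; }
  StridedIterator end() const { return End; }
};

StridedRange makeRange(const std::vector<std::uint8_t> &Bytes,
                       std::size_t Stride) {
  assert(Stride >= sizeof(Record) && "stride smaller than the record");
  assert(Bytes.size() % Stride == 0 && "payload must hold whole records");
  const std::uint8_t *B = Bytes.data();
  return {StridedIterator(B, Stride), StridedIterator(B + Bytes.size(), Stride)};
}

// Example client: walks the range with an ordinary range-based for loop.
std::uint64_t sumBaseAddresses(const std::vector<std::uint8_t> &Bytes,
                               std::size_t Stride) {
  std::uint64_t Sum = 0;
  for (Record R : makeRange(Bytes, Stride))
    Sum += R.BaseAddress;
  return Sum;
}
```

Because the stride comes from the stream header, records written with extra trailing fields are still walked correctly, which a fixed-size array view could not do.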
Reviewers: amccarth, jhenderson, clayborg
Subscribers: JosephTremoulet, zturner, markmentovai, lldb-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68210
llvm-svn: 374051
It's better style to use PRIVATE when linking libraries to executables,
and it doesn't make a difference since executables don't need to propagate
their link-time dependencies anyway.
llvm-svn: 374050
it just happened to break the bot right when I did my push. So I'm undoing
this morning's incorrect push. I've also kicked off an email to hopefully
get the bot fixed the correct way.
llvm-svn: 374049
On my system, llvm-objcopy was refusing to remove the .dynsym section
because it was still referenced from .rela.plt. Remove that section too,
and clarify that this is needed only because llvm-objcopy
--only-keep-debug does not work (does not set the sections to
SHT_NOBITS). Also, ensure that the test is not creating temporary files
in the source tree.
llvm-svn: 374046
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:
> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
> %mul = fmul fast <8 x half> %c, %b
> %sub = fsub fast <8 x half> %mul, %a
> ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.
This patch generates an fmla and negates the value of the second operand of the fsub.
Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.
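The arithmetic behind the fix can be sketched with scalars. This is only a model of the semantics described above, not the instruction selection patterns; the function names are hypothetical, and `fmla`/`fmls` here are plain C++ stand-ins for "acc + x*y" and "acc - x*y".

```cpp
#include <cassert>

// Scalar stand-ins for the accumulate instructions.
float fmla(float Acc, float X, float Y) { return Acc + X * Y; }
float fmls(float Acc, float X, float Y) { return Acc - X * Y; }

// What the IR in the quoted test computes: (c * b) - a.
float irSemantics(float A, float B, float C) { return C * B - A; }

// The selection reported as wrong: fmls yields a - (c * b), the opposite sign.
float buggySelection(float A, float B, float C) { return fmls(A, C, B); }

// The fix described above: negate the fsub's second operand and use fmla.
float fixedSelection(float A, float B, float C) { return fmla(-A, C, B); }

// Small self-check with exactly representable inputs.
void checkSelections() {
  assert(fixedSelection(1.0f, 2.0f, 3.0f) == irSemantics(1.0f, 2.0f, 3.0f));
  assert(buggySelection(1.0f, 2.0f, 3.0f) != irSemantics(1.0f, 2.0f, 3.0f));
}
```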
Tested on aarch64-linux with make check-all.
Differential Revision: https://reviews.llvm.org/D67990
llvm-svn: 374044
* Adds a TypeSize struct to represent the known minimum size of a type
along with a flag to indicate that the runtime size is an integer multiple
of that size
* Converts existing size query functions from Type.h and DataLayout.h to
return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
to uint64_t) so that most existing code 'just works' as if the return
values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
supported instructions used with scalable vectors can be constructed
in IR.
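The shape of such a type can be sketched as below. This is a simplified, hypothetical model and not the TypeSize struct from the patch: `TypeSizeSketch`, `sizeForMultiple`, and `bitsToBytes` are invented for illustration, and the assertion in the conversion operator is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a known minimum size plus a "scalable" flag.
class TypeSizeSketch {
public:
  TypeSizeSketch(std::uint64_t MinSize, bool Scalable)
      : MinSize(MinSize), Scalable(Scalable) {}

  std::uint64_t getKnownMinSize() const { return MinSize; }
  bool isScalable() const { return Scalable; }

  // Hypothetical helper: the concrete size once the runtime multiple is known.
  std::uint64_t sizeForMultiple(std::uint64_t Multiple) const {
    return Scalable ? MinSize * Multiple : MinSize;
  }

  // Transparent conversion so scalar-only callers keep compiling. In this
  // sketch it is only meaningful for fixed sizes.
  operator std::uint64_t() const {
    assert(!Scalable && "scalable size has no single compile-time value");
    return MinSize;
  }

private:
  std::uint64_t MinSize;
  bool Scalable;
};

// Legacy-style caller that still thinks in plain scalars.
std::uint64_t bitsToBytes(std::uint64_t Bits) { return (Bits + 7) / 8; }
```

A fixed-size value passes straight into `bitsToBytes` through the conversion operator, while code that handles scalable vectors has to query `getKnownMinSize()` and `isScalable()` explicitly.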
Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen
Reviewed By: rovka, sdesmalen
Differential Revision: https://reviews.llvm.org/D53137
llvm-svn: 374042