Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for
vcmp that was actually missing.
Differential Revision: https://reviews.llvm.org/D30745
llvm-svn: 297376
In case LLVM pointers are annotated with !dereferencable attributes/metadata
or LLVM can look at the allocation from which a pointer is derived, we can know
that dereferencing pointers is safe and can be done unconditionally. We use this
information to proof certain pointers as save to hoist and then hoist them
unconditionally.
llvm-svn: 297375
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.
This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.
Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.
Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;
void g(const char *);
template <bool K, int N> void f(bool *B, bool *E) {
if (K)
g(str<N>);
if (B == E)
return;
if (*B)
f<true, N + 1>(B + 1, E);
else
f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }
extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```
When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.
What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.
The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.
We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.
I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.
The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
we consider for inlining.
llvm-svn: 297374
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.
This workaround has once been suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.
Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.
The expansion makes conflicts become more likely, the found rule may
still not encompass all statement instances and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.
A patch that makes codegen support partial accesses is in preparation
as well.
Differential Revision: https://reviews.llvm.org/D30763
llvm-svn: 297373
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.
This partially resolves PR/27458
Thanks to Quentin Colombet for reporting the issue!
llvm-svn: 297372
Summary:
Remove line number from Symbol identity.
For our purposes (include-fixer and clangd autocomplete), function overloads
within the same header should mostly be treated as a single combined symbol.
We may want to track individual occurrences (line number, full type info)
and aggregate this during mapreduce, but that's not done here.
Reviewers: hokein, bkramer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D30685
llvm-svn: 297371
People keep hitting on spurious failures in malloc/free routines when using sanitizers
with shared libraries dlopened with RTLD_DEEPBIND (see https://github.com/google/sanitizers/issues/611 for details).
Let's check for this flag and bail out with warning message instead of failing in random places.
Differential Revision: https://reviews.llvm.org/D30504
llvm-svn: 297370
This is necessary to get debug builds of unit tests working on linux.
I think we are at a point where removing dependencies does not prevent
us from depending on the whole world yet. What it does do though, is
make the dependency chains longer as the dependency graph gets sparser,
which means we need to repeat the libraries more times to get the thing
to link.
llvm-svn: 297369
Summary:
This fixes two threading issues in the logging code. The access to the
mask and options flags had data races when we were trying to
enable/disable logging while another thread was writing to the log.
Since we can log from almost any context, and we want it to be fast, so
I avoided locking primitives and used atomic variables instead. I have
also removed the (unused) setters for the mask and flags to make sure
that the only way to set them is through the enable/disable channel
functions.
I also add tests, which when run under tsan, verify that the use cases
like "doing an LLDB_LOGV while another thread disables logging" are
data-race-free.
Reviewers: zturner, clayborg
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D30702
llvm-svn: 297368
gold linker manual describes them as:
-z text Do not permit relocations in read-only segments
-z notext Permit relocations in read-only segments (default)
In LLD default is to not permit them. Patch implements -z notext.
Differential revision: https://reviews.llvm.org/D30530
llvm-svn: 297366
.eh_frame_hdr is a header constructed for .eh_frame sections.
We do not proccess .eh_frame when doing relocatable output,
so should not try to create .eh_frame_hdr too.
Previous behavior without this patch is segfault.
Fixes PR32118.
Differential revision: https://reviews.llvm.org/D30566
llvm-svn: 297365
The AddressSpace.hpp header declares two classes: LocalAddressSpace and
RemoteAddressSpace. These classes are only used in a very small number
of source files, but passed in as template arguments to many other
classes.
Let's go ahead and only include AddressSpace.hpp in source files where
at least one of these two classes is mentioned. This gets rid of a
cyclic header dependency that was already present, but only caused
breakage on macOS until recently.
Reported by: Marshall Clow
llvm-svn: 297364
Summary:
A `co_await arg` expression has a dependent type whenever the promise type is still dependent, even if the argument to co_await is not. This is because we cannot attempt the `await_transform(<arg>)` until after we know the promise type.
This patch fixes an assertion in the constructor of `DependentCoawaitExpr` that asserted that `arg` must also be dependent.
Reviewers: rsmith, GorNishanov, aaron.ballman
Reviewed By: GorNishanov
Subscribers: mehdi_amini, cfe-commits
Differential Revision: https://reviews.llvm.org/D30772
llvm-svn: 297358
Previously, if you have foo=bar in a definition file, this assertion
could fire because when symbols are read from file they could be mangled.
It seems that due to historical reasons underscore mangling scheme is
really ad-hoc, and I cannot find a clean way to handle this. I had
to just de-mangle symbols to search again.
llvm-svn: 297357
Summary:
This patch adds passing a coroutine_handle object to await_suspend calls.
It builds the coroutine_handle using coroutine_handle<PromiseType>::from_address(__builtin_coro_frame()).
(a revision of https://reviews.llvm.org/D26316 that for some reason refuses to apply via arc patch)
Reviewers: GorNishanov
Subscribers: mehdi_amini, cfe-commits, EricWF
Differential Revision: https://reviews.llvm.org/D30769
llvm-svn: 297356
basic_string::replace() has the below line
__sz += __n2 - __n1;
which fails overflow checks if __n1 > __n2, as the negative result
from the subtraction then overflows the original __sz when added to
it.
This behavior is valid as unsigned integer overflow is defined to wrap
around the maximum value and that produces the correct final value for
__sz. Therefore, we disable this check on this function.
llvm-svn: 297355
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".
While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.
E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.
The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.
Reviewers: rafael, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30485
llvm-svn: 297332
Put proper guards around _LIBCPP_METHOD_TEMPLATE_IMPLICIT_INSTANTIATION_VIS.
No functional change on non-Windows. Avoids incorrect macro redefinition
on Windows.
llvm-svn: 297330
!type metadata can not be dropped. An alternative to this is adding
!type metadata from the replaced globals to the replacement, but that
may weaken type tests and make them slower at the same time.
The merged global gets !dbg metadata from replaced globals, and can
end up with multiple debug locations.
llvm-svn: 297327
Some of the magic functions take arguments of arbitrary type. However,
for semantic correctness, the compiler still requires a declaration
of these functions with the correct type. Since C does not have
argument-type-overloaded function, this made those functions hard to
use in C code. Improve this situation by allowing arbitrary suffixes
in the affected magic functions' names, thus allowing the user to
create different declarations for different types.
A patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D30589
llvm-svn: 297325
Add a bug visitor to the taint checker to make it easy to distinguish where
the tainted value originated. This is especially useful when the original
taint source is obscured by complex data flow.
A patch by Vlad Tsyrklevich!
Differential Revision: https://reviews.llvm.org/D30289
llvm-svn: 297324
This commit changes the BumpPtrAllocator for suffix tree nodes to a SpecificBumpPtrAllocator.
Before, node construction was leaking memory because of the DenseMap in SuffixTreeNodes.
Changing this to a SpecificBumpPtrAllocator allows this memory to properly be released.
llvm-svn: 297319
When the array indexes are all determined by GVN to be constants,
a call is made to constant-folding to optimize/simplify the address
computation.
The constant-folding, however, makes a mistake in that it sometimes reads
back stale Idxs instead of NewIdxs, that it re-computed in previous iteration.
This leads to incorrect addresses coming out of constant-folding to GEP.
A test case is included. The error is only triggered when indexes have particular
patterns that the stale/new index updates interplay matters.
Reviewers: Daniel Berlin
Differential Revision: https://reviews.llvm.org/D30642
llvm-svn: 297317
There are two possible return values for strerror_r:
On OS X, the return value is always `int`.
On Linux, the return value can be either `char *` or `int`, depending
on the value of:
`(_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600) && ! _GNU_SOURCE`
Because OS X interceptors require a matching function signature,
split out the two cases into separate interceptors, using the above
information to determine the correct signature for a given build.
llvm-svn: 297315
We already have a function create_directories() which can create
an entire tree, and remove() which can remove an empty directory,
but we do not have remove_directories() which can remove an entire
tree. This patch adds such a function.
Because removing a directory tree can have dangerous consequences
when the tree contains a directory symlink, the patch here updates
the existing directory_iterator construct to optionally not follow
symlinks (previously it would always follow symlinks). The delete
algorithm uses this flag so that for symlinks, only the links are
removed, and not the targets.
On Windows this is implemented with SHFileOperation, which also
does not recurse into symbolic links or junctions.
Differential Revision: https://reviews.llvm.org/D30676
llvm-svn: 297314
With this we have a single section hierarchy. It is a bit less code,
but the main advantage will be in a future patch being able to handle
foo = symbol_in_obj;
in a linker script. Currently that fails since we try to find the
output section of symbol_in_obj. With this we should be able to just
return an InputSection from the expression.
llvm-svn: 297313
- Mips is architecture, not a toolchain
- Might help eliminate the confusion in the future by not having header files with the same name
Differential Revision: https://reviews.llvm.org/D30753
llvm-svn: 297312