We have wrappers for several other ValueTracking methods that take care of passing all of the analysis and assumption cache parameters. This extends it to isKnownToBeAPowerOfTwo.
llvm-svn: 303924
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 303923
Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.
llvm-svn: 303921
Originally this was intended to be set up so that when linking
a PDB which refers to a type server, it would only visit the
PDB once, and on subsequent visitations it would just skip it
since all the records had already been added.
Due to some C++ scoping issues, this was not occurring and it
was revisiting the type server every time, which caused every
record to end up being thrown away on all subsequent visitations.
This doesn't affect the performance of linking clang-cl generated
object files because we don't use type servers, but when linking
object files and libraries generated with /Zi via MSVC, this means
only 1 object file has to be linked instead of N object files, so
the speedup is quite large.
llvm-svn: 303920
Previously, every time we wanted to serialize a field list record, we
would create a new copy of FieldListRecordBuilder, which would in turn
create a temporary instance of TypeSerializer, which itself had a
std::vector<> that was about 128K in size. So this 128K allocation was
happening every time. We can re-use the same instance over and over, we
just have to clear its internal hash table and seen records list between
each run. This saves us from the constant re-allocations.
This is worth an ~18.5% speed increase (3.75s -> 3.05s) in my tests.
Differential Revision: https://reviews.llvm.org/D33506
llvm-svn: 303919
Previously it would do a character by character search for a null
terminator, to account for the fact that an arbitrary stream need not
store its data contiguously so you couldn't just do a memchr. However, the
stream API has a function which will return the longest contiguous chunk
without doing a copy, and by using this function we can do a memchr on the
individual chunks. For certain types of streams like data from object
files etc, this is guaranteed to find the null terminator with only a
single memchr, but even with discontiguous streams such as
MappedBlockStream, it's rare that any given string will cross a block
boundary, so even those will almost always be satisfied with a single
memchr.
This optimization is worth a 10-12% reduction in link time (4.2 seconds ->
3.75 seconds)
Differential Revision: https://reviews.llvm.org/D33503
llvm-svn: 303918
Summary:
DbiStreamBuilder calculated the offset of the source file names inside
the file info substream as the size of the file info substream minus
the size of the file names. Since the file info substream is padded to
a multiple of 4 bytes, this caused the first file name to be aligned
on a 4-byte boundary. By contrast, DbiModuleList would read the file
names immediately after the file name offset table, without skipping
to the next 4-byte boundary. This change makes it so that the file
names are written to the location where DbiModuleList expects them,
and puts any necessary padding for the file info substream after the
file names instead of before it.
Reviewers: amccarth, rnk, zturner
Reviewed By: amccarth, zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33475
llvm-svn: 303917
It was using the number of blocks of the entire PDB file as the number
of blocks of each stream that was created. This was only an issue in
the readLongestContiguousChunk function, which was never called prior.
This bug surfaced when I updated an algorithm to use this function and
the algorithm broke.
llvm-svn: 303916
Also, include global entries for all data symbols, not
just external ones, since these are referenced by the
relocation records.
Add a test case that includes unnamed data.
Differential Revision: https://reviews.llvm.org/D33079
llvm-svn: 303915
A profile shows the majority of time doing type merging is spent
deserializing records from sequences of bytes into friendly C++ structures
that we can easily access members of in order to find the type indices to
re-write.
Records are prefixed with their length, however, and most records have
type indices that appear at fixed offsets in the record. For these
records, we can save some cycles by just looking at the right place in the
byte sequence and re-writing the value, then skipping the record in the
type stream. This saves us from the costly deserialization of examining
every field, including potentially null terminated strings which are the
slowest, even though it was unnecessary to begin with.
In addition, we apply another optimization. Previously, after
deserializing a record and re-writing its type indices, we would
unconditionally re-serialize it in order to compute the hash of the
re-written record. This would result in an alloc and memcpy for every
record. If no type indices were re-written, however, this was an
unnecessary allocation. In this patch re-writing is made two phase. The
first phase discovers the indices that need to be rewritten and their new
values. This information is passed through to the de-duplication code,
which only copies and re-writes type indices in the serialized byte
sequence if at least one type index is different.
Some records have type indices which only appear after variable length
strings, or which have lists of type indices, or various other situations
that can make it tricky to make this optimization. While I'm not giving up
on optimizing these cases as well, for now we can get the easy cases out
of the way and lay the groundwork for more complicated cases later.
This patch yields another 50% speedup on top of the already large speedups
submitted over the past 2 days. In two tests I have run, I went from 9
seconds to 3 seconds, and from 16 seconds to 8 seconds.
Differential Revision: https://reviews.llvm.org/D33480
llvm-svn: 303914
By default, CMake uses a 32-bit toolchain, even when on a 64-bit platform targeting a 64-bit build. However, due to the size of the binaries involved, this can cause linker instabilities (such as the linker running out of memory). Guide people to the correct solution to get CMake to use the native toolchain.
llvm-svn: 303912
lldb: libedit produces garbled, unusable input on Linux
Apply patch from Christos Zoulas, upstream libedit developer.
It has been tested on NetBSD/amd64.
New code supports combination of wide libedit and disabled
LLDB_EDITLINE_USE_WCHAR, which was the popular case on Linux
systems.
llvm-svn: 303907
PPC::GETtlsADDR is lowered to a branch and a nop, by the assembly
printer. Its size was incorrectly marked as 4, correct it to 8. The
incorrect size can cause incorrect branch relaxation in
PPCBranchSelector under the right conditions.
llvm-svn: 303904
Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.
Reviewed by: Renato Golin
Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls
Subscribers: aemerson, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D33558
llvm-svn: 303901
This fixes an oversight in r300522, which changed alloca
dbg.values to no longer emit a DW_OP_deref.
The array.ll testcase was regenerated from source.
Fixes PR33166:
https://bugs.llvm.org/show_bug.cgi?id=33166
llvm-svn: 303897
This patch updates the promise() member to match the current spec.
Specifically it removes the non-const overload and make the return
type of the const overload non-const.
This patch also makes the ASSERT_NOT_NOEXCEPT tests libc++ specific,
since other implementations may be free to strengthen the specification.
llvm-svn: 303895
Turns out gold doesn't use the DW_AT_GNU_pubnames to decide whether to
parse the rest of the DIEs when building gdb-index. This causes gold to
trip over LLVM's output when there are DW_FORM_ref_addr present.
Gold does use the presence of a debug_gnu_pub{names,types} entry for the
CU to skip parsing the debug_info portion, so make sure that's included
even when empty (technically, when empty there couldn't be any ref_addr
anyway - it only came up when gmlt didn't produce any (even non-empty)
pubnames - but given what that reveals about gold's implementation, this
seems like a good thing to do for consistency).
llvm-svn: 303894
Summary:
Previously, the yaml2pdb subcommand of llvm-pdbdump only
included object file names in module info if a module info stream was
present. This change makes it so that we include the object file name
even if there is no module info stream for the module. As a result,
running
llvm-pdbdump pdb2yaml -dbi-module-info original.pdb > original.yaml &&
llvm-pdbdump yaml2pdb -pdb=new.pdb original.yaml && llvm-pdbdump
pdb2yaml -dbi-module-info new.pdb > new.yaml now produces identical
original.yaml and new.yaml files.
Reviewers: amccarth, zturner
Reviewed By: zturner
Subscribers: fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D33463
llvm-svn: 303891
If you pass /delayload:<dllname> to the COFF linker, it creates thunks
so that DLLs are loaded when they are used for the first time instead of
load-time.
This mechanism do not work for data symbols as there's no way to trap
acccesses to data imported from DLLs. (Technically, I think if we do not
initially map dllimport tables in memory, we could actually trap accesses
and delay-load data symbols, but that's not what Windows do.)
This patch is to report an error when you try to delay-load data symbols.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33106
Differential Revision: https://reviews.llvm.org/D33557
llvm-svn: 303890
VSO#391542 "Types can't be convertible to nullptr_t"
Also put internal bug numbers on the workarounds in test_workarounds.h for correlation.
Differential Revision: https://reviews.llvm.org/D33290
llvm-svn: 303889
Summary:
This required for any users who call exit() after creating
thread-specific data, as tls destructors are only called when
pthread_exit() or pthread_cancel() are used. This should also
match tls behavior on linux.
Getting the base address of the tls section is straightforward,
as it's stored as a section offset in %gs. The size is a bit trickier
to work out, as there doesn't appear to be any official documentation
or source code referring to it. The size used in this patch was determined
by taking the difference between the base address and the address of the
subsequent memory region returned by vm_region_recurse_64, which was
1024 * sizeof(uptr) on all threads except the main thread, where it was
larger. Since the section must be the same size on all of the threads,
1024 * sizeof(uptr) seemed to be a reasonable size to use, barring
a more programtic way to get the size.
1024 seems like a reasonable number, given that PTHREAD_KEYS_MAX
is 512 on darwin, so pthread keys will fit inside the region while
leaving space for other tls data. A larger size would overflow the
memory region returned by vm_region_recurse_64, and a smaller size
wouldn't leave room for all the pthread keys. In addition, the
stress test added here passes, which means that we are scanning at
least the full set of possible pthread keys, and probably
the full tls section.
Reviewers: alekseyshl, kubamracek
Subscribers: krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D33215
llvm-svn: 303887
The existing implementation ran CHECKs to assert that the thread state
was stored inside the tls. However, the mac implementation of tsan doesn't
store the thread state in tls, so these checks fail once darwin tls support
is added to the sanitizers. Only run these checks on platforms where
the thread state is expected to be contained in the tls.
llvm-svn: 303886
Summary:
Apparently Windows's `UnmapOrDie` doesn't support partial unmapping. Which
makes the new region allocation technique not Windows compliant.
Reviewers: alekseyshl, dvyukov
Reviewed By: alekseyshl
Subscribers: llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D33554
llvm-svn: 303883
Summary:
Currently, AllocateRegion has a tendency to fragment memory: it allocates
`2*kRegionSize`, and if the memory is aligned, will unmap `kRegionSize` bytes,
thus creating a hole, which can't itself be reused for another region. This
is exacerbated by the fact that if 2 regions get allocated one after another
without any `mmap` in between, the second will be aligned due to mappings
generally being contiguous.
An idea, suggested by @alekseyshl, to prevent such a behavior is to have a
stash of regions: if the `2*kRegionSize` allocation is properly aligned, split
it in two, and stash the second part to be returned next time a region is
requested.
At this point, I thought about a couple of ways to implement this:
- either an `IntrusiveList` of regions candidates, storing `next` at the
begining of the region;
- a small array of regions candidates existing in the Primary.
While the second option is more constrained in terms of size, it offers several
advantages:
- security wise, a pointer in a region candidate could be overflowed into, and
abused when popping an element;
- we do not dirty the first page of the region by storing something in it;
- unless several threads request regions simultaneously from different size
classes, the stash rarely goes above 1 entry.
I am not certain about the Windows impact of this change, as `sanitizer_win.cc`
has its own version of MmapAlignedOrDie, maybe someone could chime in on this.
MmapAlignedOrDie is effectively unused after this change and could be removed
at a later point. I didn't notice any sizeable performance gain, even though we
are saving a few `mmap`/`munmap` syscalls.
Reviewers: alekseyshl, kcc, dvyukov
Reviewed By: alekseyshl
Subscribers: llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D33454
llvm-svn: 303879