This patch adds --thread option and use parallel_for_each to write
sections in regular OutputSections.
This is the first patch to use more than one threads.
Note that --thread is off by default because it is experimental.
At this moment I still want to focus on single thread performance
because multi-threading is not a magic wand to fix performance
problems after all. It is generally very hard to make a slow program
faster by threads. Therefore, I want to make the linker as efficient
as possible first and then look for opportunity to make it even faster
using more than one core.
Here are some numbers to link programs with and without --threads
and using GNU gold. Numbers are in seconds.
Clang
w/o --threads 0.697
w --threads 0.528
gold 1.643
Scylla
w/o --threads 5.032
w --threads 4.935
gold 6.791
GNU gold
w/o --threads 0.550
w --threads 0.551
gold 0.737
I limited the number of cores these processes can use to 4 using
perf command, so although my machine has 20 physical cores, the
performance gain I observed should be reproducible with a machine
which is not as beefy as mine.
llvm-svn: 263190
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.
Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.
Ideally this should be NFC, but in reality its possible that a function:
has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will
Reviewed by Duncan Exon Smith.
Differential Revision: http://reviews.llvm.org/D18074
llvm-svn: 263184
The swig typemaps had some magic for output File *'s on OS X that made:
SBDebugger.GetOutputFileHandle()
actually work. That was protected by a "#ifdef __MACOSX__", but the corresponding define
got lost going from the Darwin shell scripts to the python scripts for running
swig, so the code was elided. I need to pass the define to SWIG, but only when
targetting Darwin.
So I added a target-platform argument to prepare_bindings, and if that
is Darwin, I pass -D__APPLE__ to swig, and that activates this code again, and
GetOutputFileHandle works again. Note, I only pass that argument for the Xcode
build. I'm sure it is possible to do that for cmake, but my cmake-foo is weak.
I should have been able to write a test for this by creating a debugger, setting the
output file handle to something file, writing to it, getting the output file handle
and reading it. But SetOutputFileHandle doesn't seem to work from Python, so I'd
have to write a pexpect test to test this, which I'd rather not do.
llvm-svn: 263183
This is reduced from an issue found in practice.
The original version of D18012 needed another patch to handle this, but
it now works since we are using a more correct GV->hasAppendingLinkage()
check that Rafael suggested.
This is what remains of that other patch.
llvm-svn: 263181
LLVM Gold plugin decides which instance of a common symbol it wants
based on the symbol size in claim_file_hook. If the file that
contains the chosen instance is later dropped from the link, we end
up with an undefined reference.
This change delays this decision until the set of the included files
is known.
llvm-svn: 263180
Summary:
More generally, appending linkage is a special case that we don't want
to create a SymbolBody for.
Reviewers: rafael, ruiu
Subscribers: Bigcheese, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18012
llvm-svn: 263179
Summary:
Adds strlen to the common interceptors, under a new common flag
intercept_strlen. This provides better sharing of interception code among
sanitizers and cleans up the inconsistent type declarations of the
previously duplicated interceptors.
Removes the now-duplicate strlen interceptor from asan, msan, and tsan.
The entry check semantics are normalized now for msan and asan, whose
private strlen interceptors contained multiple layers of checks that
included impossible-to-reach code. The new semantics are identical to the
old: bypass interception if in the middle of init or if both on Mac and not
initialized; else, call the init routine and proceed.
Patch by Derek Bruening!
Reviewers: samsonov, vitalybuka
Subscribers: llvm-commits, kcc, zhaoqin
Differential Revision: http://reviews.llvm.org/D18020
llvm-svn: 263177
Summary:
Use InternalScopedString more extensively. This reduces the number of
write() syscalls, and reduces the chance that UBSan output will be
mixed with program output.
Reviewers: vitalybuka
Subscribers: kcc, llvm-commits
Differential Revision: http://reviews.llvm.org/D18068
llvm-svn: 263176
Only around 25% of the intrinsics in this file are documented here. The patches for the other half will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream.
llvm-svn: 263175
The code assumed that we always had a preheader without making the pass
dependent on LoopSimplify.
Thanks to Mattias Eriksson V for reporting this.
llvm-svn: 263173
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.
llvm-svn: 263167
When the parent of an expression is anonymous, skip adding '.' or '->' before the expression name.
Differential Revision: http://reviews.llvm.org/D18005
llvm-svn: 263166
Removed lldb_private::File::Duplicate() and the copy constructor and the assignment operator that used to duplicate the file handles and made them private so no one uses them. Previously the lldb_private::File::Duplicate() function duplicated files that used file descriptors, (int) but not file streams (FILE *), so the lldb_private::File::Duplicate() function only worked some of the time. No one else excep thee ScriptInterpreterPython was using these functions, so that aren't needed nor desired. Previously every time you would drop into the python interpreter we would duplicate files, and now we avoid this file churn.
<rdar://problem/24877720>
llvm-svn: 263161
Now ASan can return virtual memory to the underlying OS. Portable
sanitizer runtime code needs to be aware that UnmapOrDie cannot unmap
part of previous mapping.
In particular, this required changing how we implement MmapAlignedOrDie
on Windows, which is what Allocator32 uses.
The new code first attempts to allocate memory of the given size, and if
it is appropriately aligned, returns early. If not, it frees the memory
and attempts to reserve size + alignment bytes. In this region there
must be an aligned address. We then free the oversized mapping and
request a new mapping at the aligned address immediately after. However,
a thread could allocate that virtual address in between our free and
allocation, so we have to retry if that allocation fails. The existing
thread creation stress test managed to trigger this condition, so the
code isn't totally untested.
Reviewers: samsonov
Differential Revision: http://reviews.llvm.org/D17431
llvm-svn: 263160
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 263158
Summary:
Recently I saw the test `TestCases/Posix/print_cmdline.cc` failing on
FreeBSD, with "expected string not found in input". This is because
asan could not retrieve the command line arguments properly.
In `lib/sanitizer_common/sanitizer_linux.cc`, this is taken care of by
the `GetArgsAndEnv()` function, but it uses `__libc_stack_end` to get at
the required data. This variable does not exist on BSDs; the regular
way to retrieve the arguments and environment information is via the
`kern.ps_strings` sysctl.
I added this functionality in sanitizer_linux.cc, as a separate #ifdef
block in `GetArgsAndEnv()`. Also, `ReadNullSepFileToArray()` becomes
unused due to this change. (It won't work on FreeBSD anyway, since
`/proc` is not mounted by default.)
Reviewers: kcc, emaste, joerg, davide
Subscribers: llvm-commits, emaste
Differential Revision: http://reviews.llvm.org/D17832
llvm-svn: 263157
We can argue about a maximum alignment of a group of symbols,
but for each symbol, there is only one alignment.
So it is a bit weird that each symbol has a "maximum alignment".
llvm-svn: 263151
R_X86_64_DTPOFF64 was not handled properly.
Next sample app was impossible to link before this patch:
~/pg/release/bin/clang -target x86_64-pc-linux testthread.cpp -c -g
~/pg/d+a/bin/ld.lld testthread.o
"Unknown TLS optimization" (value was 17)
__thread int x = 0;
void _start() {
}
It works fine now.
Differential revision: http://reviews.llvm.org/D18039
llvm-svn: 263150
Given the following test case:
typedef struct {
const char *name;
id field;
} Test9;
extern void doSomething(Test9 arg);
void test9() {
Test9 foo2 = {0, 0};
doSomething(foo2);
}
With a release compiler, we don't emit any message and silently ignore the
variable "foo2". With an assert compiler, we get an assertion failure.
The root cause —————————————
Back in r140457 we gave InitListChecker a verification-only mode, and will use
CanUseDecl instead of DiagnoseUseOfDecl for verification-only mode.
These two functions handle unavailable issues differently:
In Sema::CanUseDecl, we say the decl is invalid when the Decl is unavailable and
the current context is available.
In Sema::DiagnoseUseOfDecl, we say the decl is usable by ignoring the return
code of DiagnoseAvailabilityOfDecl
So with an assert build, we will hit an assertion in diagnoseListInit
assert(DiagnoseInitList.HadError() &&
"Inconsistent init list check result.");
The fix -------------------
If we follow what is implemented in CanUseDecl and treat Decls with
unavailable issues as invalid, the variable decl of “foo2” will be marked as
invalid. Since unavailable checking is processed in delayed diagnostics
(r197627), we will silently ignore the diagnostics when we find out that
the variable decl is invalid.
We add a flag "TreatUnavailableAsInvalid" for the verification-only mode.
For overload resolution, we want to say decls with unavailable issues are
invalid; but for everything else, we should say they are valid and
emit diagnostics. Depending on the value of the flag, CanUseDecl
can return different values for unavailable issues.
rdar://23557300
Differential Revision: http://reviews.llvm.org/D15314
llvm-svn: 263149
Summary:
Unless we plan to do later postpass metadata linking (ThinLTO special mode),
always invoke metadata materialization at the start of IRLinker::run().
This avoids the need for clients who use lazy metadata loading to
explicitly invoke materializeMetadata before the IRMover, which in
turn invokes IRLinker::run and needs materialized metadata for mapping.
Came up in the context of an LLD issue (D17982).
Reviewers: rafael
Subscribers: silvas, llvm-commits
Differential Revision: http://reviews.llvm.org/D17992
llvm-svn: 263143
Summary: This is an initial setup in order to move some additional tests from Linux onto Posix.
I also moved decorate_proc_maps onto the Linux directory
Finally added msan's definition for "stable-runtime".
Only a test requires it, and its commit message (r248014) seems to imply
that AArch64 is problematic with MSan.
Reviewers: samsonov, rengolin, t.p.northover, eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17928
llvm-svn: 263142
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.
The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.
For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.
BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17277
llvm-svn: 263140
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.
This fixes PR26711.
Differential Revision: http://reviews.llvm.org/D18029
llvm-svn: 263139