Summary:
This change depends on D23986 which adds tail call-specific sleds. For
now we treat them first as normal exits, and in the future leave room
for implementing this as a different kind of log entry.
The reason for deferring the change is so that we can keep the naive
logging implementation more accurate without additional complexity for
reading the log. The accuracy is gained in effectively interpreting call
stacks like:
A()
B()
C()
Which when tail-call merged will end up not having any exit entries for
A() nor B(), but effectively in turn can be reasoned about as:
A()
B()
C()
Although we lose the fact that A() had called B() then had called C()
with the naive approach, a later iteration that adds the explicit tail
call entries would be a change in the log format and thus necessitate a
version change for the header. We can do this later to have a chance at
releasing some tools (in D21987) that are able to handle the naive log
format, then support higher version numbers of the log format too.
Reviewers: echristo, kcc, rSerge, majnemer
Subscribers: mehdi_amini, llvm-commits, dberris
Differential Revision: https://reviews.llvm.org/D23988
llvm-svn: 284178
compiles without -fmodules-local-submodule-visibility. Original commit message:
[modules] When merging one definition into another, propagate the list of
re-exporting modules from the discarded definition to the retained definition.
llvm-svn: 284176
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
The backtrace on the bot does not give me any indication what is wrong.
The test case interestingly passes in stage2 of the build.
I don't have a way of debugging this.
Disable the test on windows and hope if there is truly a bug in the code that
was causing we will eventually run into this on other platforms.
llvm-svn: 284174
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.
llvm-svn: 284173
Summary: Previously global 64-bit versions of _Interlocked functions broke buildbots on i386, so now I'm adding them as builtins for x86-64 and ARM only (should they be also on AArch64? I had problems with testing it for AArch64, so I left it)
Reviewers: hans, majnemer, mstorsjo, rnk
Subscribers: cfe-commits, aemerson
Differential Revision: https://reviews.llvm.org/D25576
llvm-svn: 284172
Summary:
LeakSanitizer does not work with ptrace but currently it
will print warnings (only under verbosity=1) and then proceed
to print tons of false reports.
This patch makes lsan fail hard under ptrace with a verbose message.
https://github.com/google/sanitizers/issues/728
Reviewers: eugenis, vitalybuka, aizatsky
Subscribers: kubabrecka, llvm-commits
Differential Revision: https://reviews.llvm.org/D25538
llvm-svn: 284171
Previously we would fail to synthesise a __start_ or __stop_ symbol if
there existed a definition in a DSO. Instead, we would try to link against
the DSO definition. This became possible after D23552 when linking against
lld-produced DSOs but could in principle also occur when linking against
DSOs produced by other linkers.
Not only does it seem more likely that a user would expect the resolved
definition to be local to the executable, but if a __start_ or __stop_
symbol was synthesised by the linker, it is effectively impossible to link
against correctly from a non-PIC executable in a read-only section. Neither
a PLT nor a copy relocation would give us the right semantics here. The only
way the link could succeed is if the executable provided its own synthetic
definition of the symbol.
The fix is to also synthesise the definition if the only definition comes
from a DSO. Since this is what the addOptionalSynthetic function does,
switch to using that function.
Fixes PR30680.
Differential Revision: https://reviews.llvm.org/D25544
llvm-svn: 284168
The class DataflowWorklist internally maintains a sorted list of pointers to CFGBlock
and the method enqueuePredecessors has to call sortWorklist to maintain the invariant.
The implementation based on vector + sort works well for small sizes
but gets infeasible for relatively large sizes. In particular the issue takes place
for some cryptographic libraries which use code generation.
The diff replaces vector + sort with priority queue.
For one of the implementations of AES this patch reduces
the time for analysis from 204 seconds to 8 seconds.
Test plan: make -j8 check-clang
Differential revision: https://reviews.llvm.org/D25503
llvm-svn: 284166
Summary:
This operation is promoted the same way was ISD::BSWAP. This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.
Reviewers: bogner, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D25202
llvm-svn: 284163
the X86 subdirectory. Original commit message:
Requires a valid TargetMachine to be passed to the SafeStack pass.
Patch by Michael LeMay
Differential revision: http://reviews.llvm.org/D24896
llvm-svn: 284161
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.
Differential Revision: https://reviews.llvm.org/D24849
llvm-svn: 284160
Summary:
Emitting deferred diagnostics during codegen was a hack. It did work,
but usability was poor, both for us as compiler devs and for users. We
don't codegen if there are any sema errors, so for users this meant that
they wouldn't see deferred errors if there were any non-deferred errors.
For devs, this meant that we had to carefully split up our tests so that
when we tested deferred errors, we didn't emit any non-deferred errors.
This change moves checking for deferred errors into Sema. See the big
comment in SemaCUDA.cpp for an overview of the idea.
This checking adds overhead to compilation, because we have to maintain
a partial call graph. As a result, this change makes deferred errors a
CUDA-only concept (whereas before they were a general concept). If
anyone else wants to use this framework for something other than CUDA,
we can generalize at that time.
This patch makes the minimal set of test changes -- after this lands,
I'll go back through and do a cleanup of the tests that we no longer
have to split up.
Reviewers: rnk
Subscribers: cfe-commits, rsmith, tra
Differential Revision: https://reviews.llvm.org/D25541
llvm-svn: 284158
Incorrect specification of the calling convention results in UB which can cause
the code path to be eliminated. Simplify the existing code by using the
RuntimeCall constructor in `CodeGenFunction`.
llvm-svn: 284154
Relax the constraint for empty live-ranges while doing last chance
recoloring. Indeed, those live-ranges do not need an actual color to be
fond for the recoloring to work.
Empty live-range may happen as a result of splitting/spilling.
Unfortunately no test case for in-tree targets.
llvm-svn: 284152
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 284151
Summary:
Together these let you easily create diagnostics that
- are never emitted for host code
- are always emitted for __device__ and __global__ functions, and
- are emitted for __host__ __device__ functions iff these functions are
codegen'ed.
At the moment there are only three diagnostics that need this treatment,
but I have more to add, and it's not sustainable to write code for emitting
every such diagnostic twice, and from a special wrapper in SemaCUDA.cpp.
While we're at it, don't emit the function name in
err_cuda_device_exceptions: It's not necessary to print it, and making
this work in the new framework in the face of a null value for
dyn_cast<FunctionDecl>(CurContext) isn't worth the effort.
Reviewers: rnk
Subscribers: cfe-commits, tra
Differential Revision: https://reviews.llvm.org/D25139
llvm-svn: 284143
In r276159, we started to defer emitting initializers for VarDecls, but
forgot to add the initializers for non-C++ language.
rdar://28740482
llvm-svn: 284142
Summary:
These options need to be passed to the plugin in order to have
an effect on LTO/ThinLTO compiles.
Reviewers: mehdi_amini, pcc
Subscribers: jfb, dschuff, mehdi_amini, cfe-commits
Differential Revision: https://reviews.llvm.org/D24644
llvm-svn: 284140
Summary:
For now I have only added support for x86_64 Linux, but other systems
can be added incrementally.
This is to be used for setting the default parallelism for ThinLTO
backends (instead of thread::hardware_concurrency which includes
hyperthreading and is too aggressive). I'll send this as a follow-on
patch, and it will fall back to hardware_concurrency when the new
getHostNumPhysicalCores returns -1 (when not supported for a given
host system).
I also added an interface to MemoryBuffer to force reading a file
as a stream - this is required for /proc/cpuinfo which is a special
file that looks like a normal file but appears to have 0 size.
The existing readers of this file in Host.cpp are reading the first
1024 or so bytes from it, because the necessary info is near the top.
But for the new functionality we need to be able to read the entire
file. I can go back and change the other readers to use the new
getFileAsStream as a follow-on patch since it seems much more robust.
Added a unittest.
Reviewers: mehdi_amini
Subscribers: beanz, mgorny, llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D25564
llvm-svn: 284138
In the MS ABI, the frontend is supposed to MD5 such pathologically long
names. LLVM should still defend itself from long names, though.
Fixes part of PR29098.
llvm-svn: 284136
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.
Based on a patch by David Kreitzer
This bug is a consequence of
r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines
We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.
Differential Revision: https://reviews.llvm.org/D25566
llvm-svn: 284130