Currently ( D10321, http://reviews.llvm.org/rL239486 ), we can use the machine combiner pass
to reassociate the following sequence to reduce the critical path:
A = ? op ?
B = A op X
C = B op Y
-->
A = ? op ?
B = X op Y
C = A op B
'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could
be any associative math/logic op (see TODO in code comment).
This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of
a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth
reassociating them.
This generalization has a compile-time cost because we can now match more instruction sequences
and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't
improve the critical path.
For example, in the new test case:
A = M div N
B = A add X
C = B add Y
We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:
A = M div N
B = A add Y
C = B add X
We need the combiner to reject that pattern but select this:
A = M div N
B = X add Y
C = B add A
Differential Revision: http://reviews.llvm.org/D10460
llvm-svn: 240361
This version fixes a missing include that MSVC noticed and
clarifies the ownership of the counter buffer that's passed to
InstrProfRecord.
This restores r240206, which was reverted in r240208.
Patch by Betul Buyukkurt.
llvm-svn: 240360
This is similar to Metadata.def and Instructions.def but for Value's.
It will be used in upcoming commits to devirtualize the Value class.
Reviewed by Duncan Exon Smith.
llvm-svn: 240358
Make sure that sanitizer runtimes target OS X version provided in
-mmacosx-version-min= flag. Enforce that it should be at least 10.7.
llvm-svn: 240356
Remove std::move() around xvalue so that copy elision is eligible.
In case that copy elision is not appliable, the c++ standard also
guarantees the move semantics on xvalue. Thus, it is not necessary
to wrap Args with std::move.
This also silence a warning since r240345.
llvm-svn: 240355
We have been working on reducing the packet count that is sent between LLDB and the debugserver on MacOSX and iOS. Our approach to this was to reduce the packets required when debugging multiple threads. We currently make one qThreadStopInfoXXXX call (where XXXX is the thread ID in hex) per thread except the thread that stopped with a stop reply packet. In order to implement multiple thread infos in a single reply, we need to use structured data, which means JSON. The new jThreadsInfo packet will attempt to retrieve all thread infos in a single packet. The data is very similar to the stop reply packets, but packaged in JSON and uses JSON arrays where applicable. The JSON output looks like:
[
{ "tid":1580681,
"metype":6,
"medata":[2,0],
"reason":"exception",
"qaddr":140735118423168,
"registers": {
"0":"8000000000000000",
"1":"0000000000000000",
"2":"20fabf5fff7f0000",
"3":"e8f8bf5fff7f0000",
"4":"0100000000000000",
"5":"d8f8bf5fff7f0000",
"6":"b0f8bf5fff7f0000",
"7":"20f4bf5fff7f0000",
"8":"8000000000000000",
"9":"61a8db78a61500db",
"10":"3200000000000000",
"11":"4602000000000000",
"12":"0000000000000000",
"13":"0000000000000000",
"14":"0000000000000000",
"15":"0000000000000000",
"16":"960b000001000000",
"17":"0202000000000000",
"18":"2b00000000000000",
"19":"0000000000000000",
"20":"0000000000000000"},
"memory":[
{"address":140734799804592,"bytes":"c8f8bf5fff7f0000c9a59e8cff7f0000"},
{"address":140734799804616,"bytes":"00000000000000000100000000000000"}
]
}
]
It contains an array of dicitionaries with all of the key value pairs that are normally in the stop reply packet. Including the expedited registers. Notice that is also contains expedited memory in the "memory" key. Any values in this memory will get included in a new L1 cache in lldb_private::Process where if a memory read request is made and that memory request fits into one of the L1 memory cache blocks, it will use that memory data. If a memory request fails in the L1 cache, it will fall back to the L2 cache which is the same block sized caching we were using before these changes. This allows a process to expedite memory that you are likely to use and it reduces packet count. On MacOSX with debugserver, we expedite the frame pointer backchain for a thread (up to 256 entries) by reading 2 pointers worth of bytes at the frame pointer (for the previous FP and PC), and follow the backchain. Most backtraces on MacOSX and iOS now don't require us to read any memory!
We will try these packets out and if successful, we should port these to lldb-server in the near future.
<rdar://problem/21494354>
llvm-svn: 240354
As with the previous patch, the goal is to turn the class into a general
loop-versioning class. This patch removes any references to loop
distribution.
llvm-svn: 240352
The one caller that does anything other than keep this variable on the
stack is the single use of DerivedArgList in Clang, which is a bit more
interesting but can probably be cleaned up/simplified a bit further
(have DerivedArgList take ownership of the InputArgList rather than
needing to reference its Args indirectly) which I'll try to after this.
llvm-svn: 240345
Summary: I don't think anyone is relying on this behavior for bootstrapping (because I don't think it works), but if you do need it, speak now or forever hold your peace.
Reviewers: chapuni, samsonov
Reviewed By: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10613
llvm-svn: 240344
The reason we need to search by name rather than by Triple::ArchType
is to handle subarchitecture correclty. There is no different ArchType
for the x86_64h architecture (it identifies itself as x86_64), or for
the various ARM subarches. The only way to get to the subarch slice
in an universal binary is to search by name.
This issue led to hard to debug and transient symbolication failures
in Asan tests (it mostly works, because the files are very similar).
This also affects the Profiling infrastucture as it is the other user
of that API.
Reviewers: samsonov, bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10604
llvm-svn: 240339
As specified in the SysV AVX512 ABI drafts. It follows the same scheme
as AVX2:
Arguments of type __m512 are split into eight eightbyte chunks.
The least significant one belongs to class SSE and all the others
to class SSEUP.
This also means we change the OpenMP SIMD default alignment on AVX512.
Based on r240337.
Differential Revision: http://reviews.llvm.org/D9894
llvm-svn: 240338
Such conflicts are an accident waiting to happen, and this feature conflicts
with the desire to include existing headers into multiple modules and merge the
results. (In an ideal world, it should not be possible to export internal
linkage symbols from a module, but sadly the glibc and libstdc++ headers
provide 'static inline' functions in a few cases.)
llvm-svn: 240335
Summary: This will be used by the R600 backend.
Reviewers: chandlerc, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10389
llvm-svn: 240329
The _Int instructions are special, in that they operate on the full
VR128 instead of FR32. The load folding then looks at MOVSS, at the
user, and bails out when it sees a size mismatch.
What we really know is that the rm_Int instructions don't load the
higher lanes, so folding is fine.
This happens for the straightforward intrinsic code, e.g.:
_mm_add_ss(a, _mm_load_ss(p));
Fixes PR23349.
Differential Revision: http://reviews.llvm.org/D10554
llvm-svn: 240326
This commit adds a function that tokenizes the string containing
the machine instruction. This commit also adds a struct called
'MIToken' which is used to represent the lexer's tokens.
Reviewers: Sean Silva
Differential Revision: http://reviews.llvm.org/D10521
llvm-svn: 240323
ISL with small integer optimization requires C99 to compile. gcc < 5.0
still uses C89 as default, so we need to enable the options to compile
in C99 mode.
This patch is preparing the actual activation of small integer
optimization.
Differential version: http://reviews.llvm.org/D10610
Reviewers: grosser
llvm-svn: 240322
This avoids creating an unnecessary undefined reference on targets such as
NVPTX that require such references to be declared in asm output.
llvm-svn: 240321
This is a reapplication of r239440 which was reverted in r239441.
There are no changes to this patch from then, but this had instead exposed
a bug in .thumb_set which was fixed in r240318. Having fixed that bug, it
is now safe to re-apply this code.
Original commit message below:
It wasn't possible to have a variable Symbol with offset or 'isCommon' so
this just enables better packing of the MCSymbol class.
Reviewed by Rafael Espindola.
llvm-svn: 240320
Before this change, you got to cast a symbol to DefinedRegular and then
call isCOMDAT() to determine if a given symbol is a COMDAT symbol.
Now you can just use isa<DefinedCOMDAT>().
As to the class definition of DefinedCOMDAT, I could remove duplicate
code from DefinedRegular and DefinedCOMDAT by introducing another base
class for them, but I chose to not do that to keep the class hierarchy
shallow. This amount of code duplication doesn't worth to define a new
class.
llvm-svn: 240319
According to the documentation, .thumb_set is 'the equivalent of a .set directive'.
We didn't have equivalent behaviour in terms of all the errors we could throw, for
example, when a symbol is redefined.
This change refactors parseAssignment so that it can be used by .set and .thumb_set
and implements tests for .thumb_set for all the errors thrown by that method.
Reviewed by Rafael Espíndola.
llvm-svn: 240318
ISL's ./configure examines the system for the stdint.h to include and
creates a header file that points to it. On C99-compatible system
#include <stdint.h>
is always valid such there no need for system introspection. This should
unbreak the build bots.
llvm-svn: 240315
The asan/not_asan and ubsan/not_ubsan features weren't being set
correctly when LLVM_USE_SANITIZER is set to 'Address;Undefined'. Fix
this by doing substring instead of exact matching. Also simplify the
msan check for consistency.
llvm-svn: 240314
D8982 ( checked in at http://reviews.llvm.org/rL239001 ) added command-line
options to allow reciprocal estimate instructions to be used in place of
divisions and square roots.
This patch changes the default settings for x86 targets to allow that recip
codegen (except for scalar division because that breaks too much code) when
using -ffast-math or its equivalent.
This matches GCC behavior for this kind of codegen.
Differential Revision: http://reviews.llvm.org/D10396
llvm-svn: 240310