FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.
I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.
It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.
Should fix PR25526.
llvm-svn: 266339
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.
This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.
Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18340
Patch By: Bas Nieuwenhuizen
llvm-svn: 266337
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,
The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18941
Patch By: Bas Nieuwenhuizen
llvm-svn: 266336
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
Summary: The patch is fixing the generation of clang-tidy documentation.
Reviewers: alexfh
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D19121
llvm-svn: 266333
ittnotify fix for barrier imbalance time in case tasks exist. In the current
implementation, task execution time is included into aggregated time on a
barrier. This fix calculates task execution time and corrects the arrive time
by subtracting the task execution time.
Since __kmp_invoke_task() can not only be called on a barrier, the field
th.th_bar_arrive_time is used to check if the function was called at the
barrier (th.th_bar_arrive_time != 0). So for this check, th_bar_arrive_time
is set to zero right after the value is used on the barrier.
Differential Revision: http://reviews.llvm.org/D19030
llvm-svn: 266332
This change adds back off logic in the test and set lock for better contended
lock performance. It uses a simple truncated binary exponential back off
function. The default back off parameters are tuned for x86.
The main back off logic has a two loop structure where each is controlled by a
user-level parameter:
max_backoff - limits the outer loop number of iterations.
This parameter should be a power of 2.
min_ticks - the inner spin wait loop number of "ticks" which is system
dependent and should be tuned for your system if you so choose.
The "ticks" on x86 correspond to the time stamp counter,
but on other architectures ticks is a timestamp derived
from gettimeofday().
The user can modify these via the environment variable:
KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
Currently, since the default user lock is a queuing lock,
one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.
Differential Revision: http://reviews.llvm.org/D19020
llvm-svn: 266329
The android dirty stderr problem has uncovered an issue where lldbutil.expect_state_changes was
reading events other than state change events, which resulted in general confusion. Make it more
strict to accept *only* state changes.
llvm-svn: 266327
Summary:
On some android targets, a binary can produce additional garbage (e.g. warning messages from the
dynamic linker) on the standard error, which confuses some tests. This relaxes the stderr
expectations for targets known for their chattyness.
Reviewers: tfiala, ovyalov
Subscribers: tberghammer, danalbert, srhines, lldb-commits
Differential Revision: http://reviews.llvm.org/D19114
llvm-svn: 266326
As discussed in the Polly weekly phone call and reviews.llvm.org/D18878,
the assumed contexts changed (widen) due to D18878/r265942. Also check
these contexts in the tests affected by that change.
llvm-svn: 266323
Indentation of the last line was reset to the initial indentation of the block when reaching EOF.
Patch by Maxime Beaulieu
Differential Revision: http://reviews.llvm.org/D19065
llvm-svn: 266321
Including VirtualFileSystem.h in the clangFormat.h indirectly includes <atomic>.
This header is blocked when compiling with /clr.
Patch by Maxime Beaulieu
Differential Revision: http://reviews.llvm.org/D19064
llvm-svn: 266319
Code in ObjectFileELF::ParseTrampolineSymbols assumes that the sh_info
field of the .rel(a).plt section identifies the .plt section.
However, with recent GNU ld this is no longer true. As a result of this:
https://sourceware.org/bugzilla/show_bug.cgi?id=18169
in object files generated with current linkers the sh_info field of
.rel(a).plt now points to the .got.plt section (or .got on some targets).
This causes LLDB to fail to identify any PLT stubs, causing a number of
test case failures.
This patch changes LLDB to simply always look for the .plt section by
name. This should be safe across all linkers and targets.
Differential Revision: http://reviews.llvm.org/D18973
llvm-svn: 266316
A number of test cases were failing on big-endian systems simply due to
byte order assumptions in the tests themselves, and no underlying bug
in LLDB.
These two test cases:
tools/lldb-server/lldbgdbserverutils.py
python_api/process/TestProcessAPI.py
actually check for big-endian target byte order, but contain Python errors
in the corresponding code paths.
These test cases:
functionalities/data-formatter/data-formatter-python-synth/TestDataFormatterPythonSynth.py
functionalities/data-formatter/data-formatter-smart-array/TestDataFormatterSmartArray.py
functionalities/data-formatter/synthcapping/TestSyntheticCapping.py
lang/cpp/frame-var-anon-unions/TestFrameVariableAnonymousUnions.py
python_api/sbdata/TestSBData.py (first change)
could be fixed to check for big-endian target byte order and update the
expected result strings accordingly. For the two synthetic tests, I've
also updated the source to make sure the fake_a value is always nonzero
on both big- and little-endian platforms.
These test case:
python_api/sbdata/TestSBData.py (second change)
functionalities/memory/cache/TestMemoryCache.py
simply accessed memory with the wrong size, which wasn't noticed on LE
but fails on BE.
Differential Revision: http://reviews.llvm.org/D18985
llvm-svn: 266315
Running the ARM instruction emulation test on a big-endian system
would fail, since the code doesn't respect endianness properly.
In EmulateInstructionARM::TestEmulation, code assumes that an
instruction opcode read in from the test file is in target byte
order, but it was in fact read in in host byte order.
More difficult to fix, the EmulationStateARM structure models
the overlapping sregs and dregs by a union in _sd_regs. This
only works correctly if the host is a little-endian system.
I've removed the union in favor of a simple array containing
the 32 sregs, and changed any code accessing dregs to explicitly
use the correct two sregs overlaying that dreg in the proper
target order.
Also, the EmulationStateARM::ReadPseudoMemory and WritePseudoMemory
track memory as a map of uint32_t values in host byte order, and
implement 64-bit memory accessing by splitting them up into two
uint32_t ones. However, callers expect memory contents to be
provided in the form of a byte array (in target byte order).
This means the uint32_t contents need to be byte-swapped on
BE systems, and when splitting up a 64-bit access into two 32-bit
ones, byte order has to be respected.
Differential Revision: http://reviews.llvm.org/D18984
llvm-svn: 266314
This patch fixes a bunch of issues that show up on big-endian systems:
- The gnu_libstdcpp.py script doesn't follow the way libstdc++ encodes
bit vectors: it should identify the enclosing *word* and then access
the appropriate bit within that word. Instead, the script simply
operates on bytes. This gives the same result on little-endian
systems, but not on big-endian.
- lldb_private::formatters::WCharSummaryProvider always assumes wchar_t
is UTF16, even though it could also be UTF8 or UTF32. This is mostly
not an issue on little-endian systems, but immediately fails on BE.
Fixed by checking the size of wchar_t like WCharStringSummaryProvider
already does.
- ClangASTContext::GetChildCompilerTypeAtIndex uses uint32_t to access
the virtual base offset stored in the vtable, even though the size
of this field matches the target pointer size according to the C++
ABI. Again, this is mostly not visible on LE, but fails on BE.
- Process::ReadStringFromMemory uses strncmp to search for a terminator
consisting of multiple zero bytes. This doesn't work since strncmp
will stop already at the first zero byte. Use memcmp instead.
Differential Revision: http://reviews.llvm.org/D18983
llvm-svn: 266313
Currently, the DataExtractor::GetMaxU64Bitfield and GetMaxS64Bitfield
routines assume the incoming "bitfield_bit_offset" parameter uses
little-endian bit numbering, i.e. a bitfield_bit_offset 0 refers to
a bitfield whose least-significant bit coincides with the least-
significant bit of the surrounding integer.
On many big-endian systems, however, the big-endian bit numbering
is used for bit fields. Here, a bitfield_bit_offset 0 refers to
a bitfield whose most-significant bit conincides with the most-
significant bit of the surrounding integer.
Now, in principle LLDB could arbitrarily choose which semantics of
bitfield_bit_offset to use. However, there are two problems with
the current approach:
- When parsing DWARF, LLDB decodes bit offsets in little-endian
bit numbering on LE systems, but in big-endian bit numbering
on BE systems. Passing those offsets later on into the
DataExtractor routines gives incorrect results on BE.
- In the interim, LLDB's type layer combines byte and bit offsets
into a single number. I.e. instead of recording bitfields by
specifying the byte offset and byte size of the surrounding
integer *plus* the bit offset of the bit field within that field,
it simply records a single bit offset number.
Now, note that converting from byte offset + bit offset to a
single offset value and back is well-defined if we either use
little-endian byte order *and* little-endian bit numbering,
or use big-endian byte order *and* big-endian bit numbering.
Any other combination will yield incorrect results.
Therefore, the simplest approach would seem to be to always use
the bit numbering that matches the system byte order. This makes
storing a single bit offset valid, and makes the existing DWARF
code correct. The only place to fix is to teach DataExtractor
to use big-endian bit numbering on big endian systems.
However, there is only additional caveat: we also get bit offsets
from LLDB synthetic bitfields. While the exact semantics of those
doesn't seem to be well-defined, from test cases it appears that
the intent was for the user-provided synthetic bitfield offset to
always use little-endian bit numbering. Therefore, on a big-endian
system we now have to convert those to big-endian bit numbering
to remain consistent.
Differential Revision: http://reviews.llvm.org/D18982
llvm-svn: 266312
The Scalar implementation and a few other places in LLDB directly
access the internal implementation of APInt values using the
getRawData method. Unfortunately, pretty much all of these places
do not handle big-endian systems correctly. While on little-endian
machines, the pointer returned by getRawData can simply be used as
a pointer to the integer value in its natural format, no matter
what size, this is not true on big-endian systems: getRawData
actually points to an array of type uint64_t, with the first element
of the array always containing the least-significant word of the
integer. This means that if the bitsize of that integer is smaller
than 64, we need to add an offset to the pointer returned by
getRawData in order to access the value in its natural type, and
if the bitsize is *larger* than 64, we actually have to swap the
constituent words before we can access the value in its natural type.
This patch fixes every incorrect use of getRawData in the code base.
For the most part, this is done by simply removing uses of getRawData
in the first place, and using other APInt member functions to operate
on the integer data.
This can be done in many member functions of Scalar itself, as well
as in Symbol/Type.h and in IRInterpreter::Interpret. For the latter,
I've had to add a Scalar::MakeUnsigned routine to parallel the existing
Scalar::MakeSigned, e.g. in order to implement an unsigned divide.
The Scalar::RawUInt, Scalar::RawULong, and Scalar::RawULongLong
were already unused and can be simply removed. I've also removed
the Scalar::GetRawBits64 function and its few users.
The one remaining user of getRawData in Scalar.cpp is GetBytes.
I've implemented all the cases described above to correctly
implement access to the underlying integer data on big-endian
systems. GetData now simply calls GetBytes instead of reimplementing
its contents.
Finally, two places in the clang interface code were also accessing
APInt.getRawData in order to actually construct a byte representation
of an integer. I've changed those to make use of a Scalar instead,
to avoid having to re-implement the logic there.
The patch also adds a couple of unit tests verifying correct operation
of the GetBytes routine as well as the conversion routines. Those tests
actually exposed more problems in the Scalar code: the SetValueFromData
routine didn't work correctly for 128- and 256-bit data types, and the
SChar routine should have an explicit "signed char" return type to work
correctly on platforms where char defaults to unsigned.
Differential Revision: http://reviews.llvm.org/D18981
llvm-svn: 266311
Scalar::GetBytes provides a non-const access to the underlying bytes
of the scalar value, supposedly allowing for modification of those
bytes. However, even with the current implementation, this is not
really possible. For floating-point scalars, the pointer returned
by GetBytes refers to a temporary copy; modifications to that copy
will be simply ignored. For integer scalars, the pointer refers
to internal memory of the APInt implementation, which isn't
supposed to be directly modifyable; GetBytes simply casts aways
the const-ness of the pointer ...
With my upcoming patch to fix Scalar::GetBytes for big-endian
systems, this problem is going to get worse, since there we need
temporary copies even for some integer scalars. Therefore, this
patch makes Scalar::GetBytes const, fixing all those problems.
As a follow-on change, RegisterValues::GetBytes must be made const
as well. This in turn means that the way of initializing a
RegisterValue by doing a SetType followed by writing to GetBytes
no longer works. Instead, I've changed SetValueFromData to do
the equivalent of SetType itself, and then re-implemented
SetFromMemoryData to work on top of SetValueFromData.
There is still a need for RegisterValue::SetType, since some
platform-specific code uses it to reinterpret the contents of
an already filled RegisterValue. To make this usage work in
all cases (even changing from a type implemented via Scalar
to a type implemented as a byte buffer), SetType now simply
copies the old contents out, and then reloads the RegisterValue
from this data using the new type via SetValueFromData.
This in turn means that there is no remaining caller of
Scalar::SetType, so it can be removed.
The only other follow-on change was in MIPS EmulateInstruction
code, where some uses of RegisterValue::GetBytes could be made
const trivially.
Differential Revision: http://reviews.llvm.org/D18980
llvm-svn: 266310
This fixes several test case failure on s390x caused by the fact that
on this platform, the default "char" type is unsigned.
- In ClangASTContext::GetBuiltinTypeForEncodingAndBitSize we should return
an explicit *signed* char type for encoding eEncodingSint and bit size 8,
instead of the default platform char type (which may be unsigned).
This fix matches existing code in ClangASTContext::GetIntTypeFromBitSize,
and fixes the TestClangASTContext.TestBuiltinTypeForEncodingAndBitSize
unit test case.
- The test/expression_command/char/TestExprsChar.py test case is known to
fail on platforms defaulting to unsigned char (pr23069), and just needs
to be xfailed on s390x like on arm.
- The test/functionalities/watchpoint/watchpoint_on_vectors/main.c test
case defines a vector of "char" and implicitly assumes to be signed.
Use an explicit "signed char" instead.
Differential Revision: http://reviews.llvm.org/D18979
llvm-svn: 266309
This patch adds support for Linux on SystemZ:
- A new ArchSpec value of eCore_s390x_generic
- A new directory Plugins/ABI/SysV-s390x providing an ABI implementation
- Register context support
- Native Linux support including watchpoint support
- ELF core file support
- Misc. support throughout the code base (e.g. breakpoint opcodes)
- Test case updates to support the platform
This should provide complete support for debugging the SystemZ platform.
Not yet supported are optional features like transaction support (zEC12)
or SIMD vector support (z13).
There is no instruction emulation, since our ABI requires that all code
provide correct DWARF CFI at all PC locations in .eh_frame to support
unwinding (i.e. -fasynchronous-unwind-tables is on by default).
The implementation follows existing platforms in a mostly straightforward
manner. A couple of things that are different:
- We do not use PTRACE_PEEKUSER / PTRACE_POKEUSER to access single registers,
since some registers (access register) reside at offsets in the user area
that are multiples of 4, but the PTRACE_PEEKUSER interface only allows
accessing aligned 8-byte blocks in the user area. Instead, we use a s390
specific ptrace interface PTRACE_PEEKUSR_AREA / PTRACE_POKEUSR_AREA that
allows accessing a whole block of the user area in one go, so in effect
allowing to treat parts of the user area as register sets.
- SystemZ hardware does not provide any means to implement read watchpoints,
only write watchpoints. In fact, we can only support a *single* write
watchpoint (but this can span a range of arbitrary size). In LLDB this
means we support only a single watchpoint. I've set all test cases that
require read watchpoints (or multiple watchpoints) to expected failure
on the platform. [ Note that there were two test cases that install
a read/write watchpoint even though they nowhere rely on the "read"
property. I've changed those to simply use plain write watchpoints. ]
Differential Revision: http://reviews.llvm.org/D18978
llvm-svn: 266308
If the UnwindPlan did not identify how to unwind the stack pointer
register, LLDB currently assumes it can determine to caller's SP
from the current frame's CFA. This is true on most platforms
where CFA is by definition equal to the incoming SP at function
entry.
However, on the s390x target, we instead define the CFA to equal
the incoming SP plus an offset of 160 bytes. This is because
our ABI defines that the caller has to provide a register save
area of size 160 bytes. This area is allocated by the caller,
but is considered part of the callee's stack frame, and therefore
the CFA is defined as pointing to the top of this area.
In order to make this work on s390x, this patch introduces a new
ABI callback GetFallbackRegisterLocation that provides platform-
specific fallback register locations for unwinding. The existing
code to handle SP unwinding as well as volatile registers is moved
into the default implementation of that ABI callback, to allow
targets where that implementation is incorrect to override it.
This patch in itself is a no-op for all existing platforms.
But it is a pre-requisite for adding s390x support.
Differential Revision: http://reviews.llvm.org/D18977
llvm-svn: 266307
That was removed in r266304, but leads to warnings by Clang.
Thanks to Rafael Espíndola for pointing on that.
Though I think change was legal from point of C++.
llvm-svn: 266306
The PS_STRINGS constant can easily be incorrect with mismatched
kernel/userland - e.g. when building i386 sanitizers on FreeBSD/amd64
with -m32. The kern.ps_strings sysctl was introduced over 20 years ago
as the supported way to fetch the environment and argument string
addresses from the kernel, so the fallback is never used.
Differential Revision: http://reviews.llvm.org/D19027
llvm-svn: 266305
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.
This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.
llvm-svn: 266301
In short, CVE-2016-2143 will crash the machine if a process uses both >4TB
virtual addresses and fork(). ASan, TSan, and MSan will, by necessity, map
a sizable chunk of virtual address space, which is much larger than 4TB.
Even worse, sanitizers will always use fork() for llvm-symbolizer when a bug
is detected. Disable all three by aborting on process initialization if
the running kernel version is not known to contain a fix.
Unfortunately, there's no reliable way to detect the fix without crashing
the kernel. So, we rely on whitelisting - I've included a list of upstream
kernel versions that will work. In case someone uses a distribution kernel
or applied the fix themselves, an override switch is also included.
Differential Revision: http://reviews.llvm.org/D18915
llvm-svn: 266297
This teaches sanitizer_common about s390 and s390x virtual space size.
s390 is unusual in that it has 31-bit virtual space.
Differential Revision: http://reviews.llvm.org/D18896
llvm-svn: 266296
mmap on s390 is quite a special snowflake: since it has too many
parameters to pass them in registers, it passes a pointer to a struct
with all the parameters instead.
Differential Revision: http://reviews.llvm.org/D18889
llvm-svn: 266295