Checking if a path is absolute can be expensive and currently the result is not cached in the FileSpec object. This patch adds caching and also code to clear the cache if the file is modified.
Differential Revision: https://reviews.llvm.org/D130396
Resubmission of https://reviews.llvm.org/D130309 with the 2 patches that fixed the linux buildbot, and new windows fixes.
The FileSpec APIs allow users to modify instance variables directly by getting a non const reference to the directory and filename instance variables. This makes it impossible to control all of the times the FileSpec object is modified so we can clear cached member variables like m_resolved and with an upcoming patch caching if the file is relative or absolute. This patch modifies the APIs of FileSpec so no one can modify the directory or filename instance variables directly by adding set accessors and by removing the get accessors that are non const.
Many clients were using FileSpec::GetCString(...) which returned a unique C string from a ConstString'ified version of the result of GetPath() which returned a std::string. This caused many locations to use this convenient function incorrectly and could cause many strings to be added to the constant string pool that didn't need to. Most clients were converted to using FileSpec::GetPath().c_str() when possible. Other clients were modified to use the newly renamed version of this function which returns an actualy ConstString:
ConstString FileSpec::GetPathAsConstString(bool denormalize = true) const;
This avoids the issue where people were getting an already uniqued "const char *" that came from a ConstString only to put the "const char *" back into a "ConstString" object. By returning the ConstString instead of a "const char *" clients can be more efficient with the result.
The patch:
- Removes the non const GetDirectory() and GetFilename() get accessors
- Adds set accessors to replace the above functions: SetDirectory() and SetFilename().
- Adds ClearDirectory() and ClearFilename() to replace usage of the FileSpec::GetDirectory().Clear()/FileSpec::GetFilename().Clear() call sites
- Fixed all incorrect usage of FileSpec::GetCString() to use FileSpec::GetPath().c_str() where appropriate, and updated other call sites that wanted a ConstString to use the newly returned ConstString appropriately and efficiently.
Differential Revision: https://reviews.llvm.org/D130549
The DumpDataExtractor function had two branches for printing floating
point values. One branch (APFloat) was used if we had a Target object
around and could query it for the appropriate semantics. If we didn't
have a Target, we used host operations to read and format the value.
This patch changes second path to use APFloat as well. To make it work,
I pick reasonable defaults for different byte size. Notably, I did not
include x87 long double in that list (as it is ambibuous and
architecture-specific). This exposed a bug where we were printing
register values using the target-less branch, even though the registers
definitely belong to a target, and we had it available. Fixing this
prompted the update of several tests for register values due to slightly
different floating point outputs.
The most dubious aspect of this patch is the change in
TypeSystemClang::GetFloatTypeSemantics to recognize `10` as a valid size
for x87 long double. This was necessary because because sizeof(long
double) on x86_64 is 16 even though it only holds 10 bytes of useful
data. This generalizes the hackaround present in the target-free branch
of the dumping function.
Differential Revision: https://reviews.llvm.org/D129750
This diff move the logic of `GetControlFlowKind()` from Disassembler.cpp to DisassemblerLLVMC.cpp.
Here's details:
- Actual logic of GetControlFlowKind() move to `DisassemblerLLVMC.cpp`, and we can check underlying architecture using `DisassemblerScope` there.
- With this change, passing 'triple' to `GetControlFlowKind()` is no more required.
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D130320
This is the first part of support for reading MTE tags from Linux
core files. The format is documented here:
https://www.kernel.org/doc/html/latest/arm64/memory-tagging-extension.html#core-dump-support
This patch adds a method to unpack from the format the core
file uses, which is different to the one chosen for GDB packets.
MemoryTagManagerAArch64MTE is not tied one OS so another OS
might choose a different format in future. However, infrastructure
to handle that would go untested until then so I've chosen not to
attempt to handle that.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D129487
Checking if a path is absolute can be expensive and currently the result is not cached in the FileSpec object. This patch adds caching and also code to clear the cache if the file is modified.
Differential Revision: https://reviews.llvm.org/D130396
The FileSpect APIs allow users to modify instance variables directly by getting a non const reference to the directory and filename instance variables. This makes it impossibly to control all of the times the FileSpec object is modified so we can clear the cache. This patch modifies the APIs of FileSpec so no one can modify the directory or filename directly by adding set accessors and by removing the get accessors that are non const.
Many clients were using FileSpec::GetCString(...) which returned a unique C string from a ConstString'ified version of the result of GetPath() which returned a std::string. This caused many locations to use this convenient function incorrectly and could cause many strings to be added to the constant string pool that didn't need to. Most clients were converted to using FileSpec::GetPath().c_str() when possible. Other clients were modified to use the newly renamed version of this function which returns an actualy ConstString:
ConstString FileSpec::GetPathAsConstString(bool denormalize = true) const;
This avoids the issue where people were getting an already uniqued "const char *" that came from a ConstString only to put the "const char *" back into a "ConstString" object. By returning the ConstString instead of a "const char *" clients can be more efficient with the result.
The patch:
- Removes the non const GetDirectory() and GetFilename() get accessors
- Adds set accessors to replace the above functions: SetDirectory() and SetFilename().
- Adds ClearDirectory() and ClearFilename() to replace usage of the FileSpec::GetDirectory().Clear()/FileSpec::GetFilename().Clear() call sites
- Fixed all incorrect usage of FileSpec::GetCString() to use FileSpec::GetPath().c_str() where appropriate, and updated other call sites that wanted a ConstString to use the newly returned ConstString appropriately and efficiently.
Differential Revision: https://reviews.llvm.org/D130309
DW_OP_skip/DW_OP_bra can move offset to the end of the data, which means that this was the last instruction to execute and the interpreter should terminate.
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D130285
Reland 486787210d which broke tests on Arm and Windows.
* Windows -- on Windows const static data members with no out-of-class
definition do have valid addresses, in constract to other platforms
(Linux, macos) where they don't. Adjusted the test to expect success
on Windows and failure on other platforms.
* Arm -- `int128` is not available on 32-bit ARM, so disable the test
for this architecture.
This adds support for using const static integral data members as described by C++11 [class.static.data]p3
to LLDB's expression evaluator.
So far LLDB treated these data members are normal static variables. They already work as intended when they are declared in the class definition and then defined in a namespace scope. However, if they are declared and initialised in the class definition but never defined in a namespace scope, all LLDB expressions that use them will fail to link when LLDB can't find the respective symbol for the variable.
The reason for this is that the data members which are only declared in the class are not emitted into any object file so LLDB can never resolve them. Expressions that use these variables are expected to directly use their constant value if possible. Clang can do this for us during codegen, but it requires that we add the constant value to the VarDecl we generate for these data members.
This patch implements this by:
* parsing the constant values from the debug info and adding it to variable declarations we encounter.
* ensuring that LLDB doesn't implicitly try to take the address of expressions that might be an lvalue that points to such a special data member.
The second change is caused by LLDB's way of storing lvalues in the expression parser. When LLDB parses an expression, it tries to keep the result around via two mechanisms:
1. For lvalues, LLDB generates a static pointer variable and stores the address of the last expression in it: `T *$__lldb_expr_result_ptr = &LastExpression`
2. For everything else, LLDB generates a static variable of the same type as the last expression and then direct initialises that variable: `T $__lldb_expr_result(LastExpression)`
If we try to print a special const static data member via something like `expr Class::Member`, then LLDB will try to take the address of this expression as it's an lvalue. This means LLDB will try to take the address of the variable which causes that Clang can't replace the use with the constant value. There isn't any good way to detect this case (as there a lot of different expressions that could yield an lvalue that points to such a data member), so this patch also changes that we only use the first way of capturing the result if the last expression does not have a type that could potentially indicate it's coming from such a special data member.
This change shouldn't break most workflows for users. The only observable side effect I could find is that the implicit persistent result variables for const int's now have their own memory address:
Before this change:
```
(lldb) p i
(const int) $0 = 123
(lldb) p &$0
(const int *) $1 = 0x00007ffeefbff8e8
(lldb) p &i
(const int *) $2 = 0x00007ffeefbff8e8
```
After this change we capture `i` by value so it has its own value.
```
(lldb) p i
(const int) $0 = 123
(lldb) p &$0
(const int *) $1 = 0x0000000100155320
(lldb) p &i
(const int *) $2 = 0x00007ffeefbff8e8
```
Reviewed By: Michael137
Differential Revision: https://reviews.llvm.org/D81471
LLDB supports having globbing regexes in the process launch arguments
that will be resolved using the user's shell. This requires that we pass
the launch args to the shell and then read back the expanded arguments
using LLDB's argdumper utility.
As the shell will not just expand the globbing regexes but all special
characters, we need to escape all non-globbing charcters such as $, &,
<, >, etc. as those otherwise are interpreted and removed in the step
where we expand the globbing characters. Also because the special
characters are shell-specific, LLDB needs to maintain a list of all the
characters that need to be escaped for each specific shell.
This patch adds the list of special characters that need to be escaped
for fish. Without this patch on systems where fish is the user's shell
having any of these special characters in your arguments or path to
the binary will cause the process launch to fail. E.g., `lldb -- ./calc
1<2` is failing without this patch. The same happens if the absolute
path to calc is in a directory that contains for example parentheses
or other special characters.
Differential revision: https://reviews.llvm.org/D104635
Add a test that ensures we always prioritize exact triple matches when
creating platforms. This is a regression test for a (now resolved) bug
that that resulted in the remote tvOS platform being selected for a tvOS
simulator binary because the ArchSpecs are compatible.
Drop the thread-safe flag and make the locking strategy the
responsibility of the individual log handler.
Previously we got away with a non-thread safe mode because we were using
unbuffered streams that rely on the underlying syscalls/OS for
synchronization. With the introduction of log handlers, we can have
arbitrary logic involved in writing out the logs. With this patch the
log handlers can pick the most appropriate locking strategy for their
particular implementation.
Differential revision: https://reviews.llvm.org/D127922
Support adding a "pending callback" to the main loop, that will be
called once after all the pending events are processed. This can be
e.g. to defer destroying the process instance until its exit is fully
processed, as suggested in D127500.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D128253
This patch introduces the concept of a log handlers. Log handlers allow
customizing the way log output is emitted. The StreamCallback class
tried to do something conceptually similar. The benefit of the log
handler interface is that you don't need to conform to llvm's
raw_ostream interface.
Differential revision: https://reviews.llvm.org/D127922
As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json schema
- traceBuffer -> iptTrace
- core -> cpu
Differential Revision: https://reviews.llvm.org/D127817
This is the final functional patch to support intel pt decoding per cpu.
It works by doing the following:
- First, all context switches are split by tid and sorted in order. This produces a list of continuous executes per thread per core.
- Then, all intel pt subtraces are split by PSB boundaries and assigned to individual thread continuous executions on the same core by doing simple TSC-based comparisons.
- With this, we have, per thread, a sorted list of continuous executions each one with a list of intel pt subtraces. Up to this point, this is really fast because no instructions were actually decoded.
- Then, each thread can be decoded by traversing their continuous executions and intel pt subtraces. An advantage of having these continuous executions is that we can identify if a continuous exexecution doesn't have intel pt data, and thus has a gap in it. We can later to more sofisticated comparisons to identify if within a continuous execution there are gaps.
I'm adding a test as well.
Differential Revision: https://reviews.llvm.org/D126394
:q!
This diff is massive, but it's because it connects the client with lldb-server
and also ensures that the postmortem case works.
- Flatten the postmortem trace schema. The reason is that the schema has become quite complex due to the new multicore case, which defeats the original purpose of having a schema that could work for every trace plug-in. At this point, it's better that each trace plug-in defines it's own full schema. This means that the only common field is "type".
-- Because of this new approach, I merged the "common" trace load and saving functionalities into the IntelPT one. This simplified the code quite a bit. If we eventually implement another trace plug-in, we can see then what we could reuse.
-- The new schema, which is flattened, has now better comments and is parsed better. A change I did was to disallow hex addresses, because they are a bit error prone. I'm asking now to print the address in decimal.
-- Renamed "intel" to "GenuineIntel" in the schema because that's what you see in /proc/cpuinfo.
- Implemented reading the context switch trace data buffer. I had to do
some refactors to do that cleanly.
-- A major change that I did here was to simplify the perf_event circular buffer reading logic. It was too complex. Maybe the original Intel author had something different in mind.
- Implemented all the necessary bits to read trace.json files with per-core data.
- Implemented all the necessary bits to save to disk per-core trace session.
- Added a test that ensures that parsing and saving to disk works.
Differential Revision: https://reviews.llvm.org/D126015
- Add a warnings field in the jLLDBGetState response, for warnings to be delivered to the client for troubleshooting. This removes the need to silently log lldb-server's llvm::Errors and not expose them easily to the user
- Simplify the tscPerfZeroConversion struct and schema. It used to extend a base abstract class, but I'm doubting that we'll ever add other conversion mechanisms because all modern kernels support perf zero. It is also the one who is supposed to work with the timestamps produced by the context switch trace, so expecting it is imperative.
- Force tsc collection for cpu tracing.
- Add a test checking that tscPerfZeroConversion is returned by the GetState request
- Add a pre-check for cpu tracing that makes sure that perf zero values are available.
Differential Revision: https://reviews.llvm.org/D125932
so that they can be used to prime new Process runs. "process handle"
was also changed to populate the dummy target if there's no selected
target, so that the settings will get copied into new targets.
Differential Revision: https://reviews.llvm.org/D126259
This diffs implements per-core tracing on lldb-server. It also includes tests that ensure that tracing can be initiated from the client and that the jLLDBGetState ppacket returns the list of trace buffers per core.
This doesn't include any decoder changes.
Finally, this makes some little changes here and there improving the existing code.
A specific piece of code that can't reliably be tested is when tracing
per core fails due to permissions. In this case we add a
troubleshooting message and this is the manual test:
```
/proc/sys/kernel/perf_event_paranoid set to 1
(lldb) process trace start --per-core-tracing error: perf event syscall failed: Permission denied
You might need that /proc/sys/kernel/perf_event_paranoid has a value of 0 or -1.
``
Differential Revision: https://reviews.llvm.org/D124858
symbol name matches. Instead, we extract the incoming path's base
name, look up all the symbols with that base name, and then compare
the rest of the context that the user provided to make sure it
matches. However, we do this comparison using just a strstr. So for
instance:
break set -n foo::bar
will match not only "a::foo::bar" but "notherfoo::bar". The former is
pretty clearly the user's intent, but I don't think the latter is, and
results in breakpoints picking up too many matches.
This change adds a Language::DemangledNameContainsPath API which can
do a language aware match against the path provided. If the language
doesn't provide this we fall back to the strstr (though that's changed
to StringRef::contains in the patch).
Differential Revision: https://reviews.llvm.org/D124579
I'm refactoring IntelPTThreadTrace into IntelPTSingleBufferTrace so that it can
both single threads or single cores. In this diff I'm basically renaming the
class, moving it to its own file, and removing all the pieces that are not used
along with some basic cleanup.
Differential Revision: https://reviews.llvm.org/D124648
In UnwindAssemblyInstEmulation we correctly recognize when a LDP
restores the fp & lr in an epilogue, and mark them as having the
caller's contents now, but we don't update the CFA register rule
at that point to indicate that the CFA is now calculated in terms
of $sp. This doesn't impact the backtrace because the register
contents are all <same> now, but it can confuse the stepper when
the StackID changes mid-epilogue.
Differential Revision: https://reviews.llvm.org/D124492
rdar://92064415
In order to open perf events per core, we need to first get the list of
core ids available in the system. So I'm adding a function that does
that by parsing /proc/cpuinfo. That seems to be the simplest and most
portable way to do that.
Besides that, I made a few refactors and renames to reflect better that
the cpu info that we use in lldb-server comes from procfs.
Differential Revision: https://reviews.llvm.org/D124573
UniqueCStringMap<T> objects are a std::vector<UniqueCStringMap::Entry> objects where the Entry object contains a ConstString + T. The values in the vector are sorted first by ConstString and then by the T value. ConstString objects are simply uniqued "const char *" values and when we compare we use the actual string pointer as the value we sort by. This caused a problem when we saved the symbol table name indexes and debug info indexes to disk in one process when they were sorted, and then loaded them into another process when decoding them from the cache files. Why? Because the order in which the ConstString objects were created are now completely different and the string pointers will no longer be sorted in the new process the cache was loaded into.
The unit tests created for the initial patch didn't catch the encoding and decoding issues of UniqueCStringMap<T> because they were happening in the same process and encoding and decoding would end up createing sorted UniqueCStringMap<T> objects due to the constant string pool being exactly the same.
This patch does the sort and also reserves the right amount of entries in the UniqueCStringMap::m_map prior to adding them all to avoid doing multiple allocations.
Added a unit test that loads an object file from yaml, and then I created a cache file for the original file and removed the cache file's signature mod time check since we will generate an object file from the YAML, and use that as the object file for the Symtab object. Then we load the cache data from the array of symtab cache bytes so that the ConstString "const char *" values will not match the current process, and verify we can lookup the 4 names from the object file in the symbol table.
Differential Revision: https://reviews.llvm.org/D124572
This patch fixes a crash when using process launch -t to launch the
inferior from a TTY. The issue is that on Darwin, Host.mm is calling
ConnectionFileDescriptor::Connect without a socket_id_callback_type. The
overload passes nullptr as the function ref, which gets called
unconditionally as the socket_id_callback.
One potential way to fix this is to change all the lambdas to include a
null check, but instead I went with an empty lambda.
Differential revision: https://reviews.llvm.org/D124535
We dropped downstream support for Python 2 in the previous release. Now
that we have branched for the next release the window where this kind of
change could introduce conflicts is closing too. Start by getting rid of
Python 2 support in the Script Interpreter plugin.
Differential revision: https://reviews.llvm.org/D124429
Applied clang-tidy modernize-use-override over LLDB and added it to the LLDB .clang-tidy config.
Differential Revision: https://reviews.llvm.org/D123340
Unlike for any of the other shells, we were escaping $ when using tcsh.
There's nothing special about $ in tcsh and this prevents you from
expanding shell variables, one of the main reasons this functionality
exists in the first place.
Differential revision: https://reviews.llvm.org/D123690
This patch moves the platform creation and selection logic into the
per-debugger platform lists. I've tried to keep functional changes to a
minimum -- the main (only) observable difference in this change is that
APIs, which select a platform by name (e.g.,
Debugger::SetCurrentPlatform) will not automatically pick up a platform
associated with another debugger (or no debugger at all).
I've also added several tests for this functionality -- one of the
pleasant consequences of the debugger isolation is that it is now
possible to test the platform selection and creation logic.
This is a product of the discussion at
<https://discourse.llvm.org/t/multiple-platforms-with-the-same-name/59594>.
Differential Revision: https://reviews.llvm.org/D120810
LLDB supports having globbing regexes in the process launch arguments
that will be resolved using the user's shell. This requires that we pass
the launch args to the shell and then read back the expanded arguments
using LLDB's argdumper utility.
As the shell will not just expand the globbing regexes but all special
characters, we need to escape all non-globbing charcters such as $, &,
<, >, etc. as those otherwise are interpreted and removed in the step
where we expand the globbing characters. Also because the special
characters are shell-specific, LLDB needs to maintain a list of all the
characters that need to be escaped for each specific shell.
This patch adds the missing semicolon character to the escape list for
all currently supported shells. Without this having a semicolon in the
binary path or having a semicolon in the launch arguments will cause the
argdumping process to fail. E.g., lldb -- ./calc "a;b" was failing
before but is working now.
Fixes rdar://55776943
Differential revision: https://reviews.llvm.org/D104629
After enabling the LLDB index cache in production we discovered that some distributed build systems play with the modification times of any .o files that were downloaded from the build cache. This was causing the LLDB index cache to read the wrong cache file for files that didn't have a UUID as all of the modfication times were set to the same value by the build system. When new .o files were downloaded, the only unique identifier was the mod time which were all the same, and we would load an older cache for the updated .o file. So disabling caching of files that have no UUIDs for now until we can create a more solid solution.
Differential Revision: https://reviews.llvm.org/D120948
Currently, all data buffers are assumed to be writable. This is a
problem on macOS where it's not allowed to load unsigned binaries in
memory as writable. To be more precise, MAP_RESILIENT_CODESIGN and
MAP_RESILIENT_MEDIA need to be set for mapped (unsigned) binaries on our
platform.
Binaries are mapped through FileSystem::CreateDataBuffer which returns a
DataBufferLLVM. The latter is backed by a llvm::WritableMemoryBuffer
because every DataBuffer in LLDB is considered to be writable. In order
to use a read-only llvm::MemoryBuffer I had to split our abstraction
around it.
This patch distinguishes between a DataBuffer (read-only) and
WritableDataBuffer (read-write) and updates LLDB to use the appropriate
one.
rdar://74890607
Differential revision: https://reviews.llvm.org/D122856