Now the decoded thread has Append methods that provide more flexibility
in terms of the underlying data structure that represents the
instructions. In this case, we are able to represent the sporadic errors
as map and thus reduce the size of each instruction.
Differential Revision: https://reviews.llvm.org/D122293
For header units we build the top level module directly from the header
that it represents and macros defined in this TU need to be emitted (when
such a definition is live at the end of the TU).
Differential Revision: https://reviews.llvm.org/D121097
Instead of using a temporary `string` in `__vformat_to_wrapped` use a new
generic iterator. This aids to reduce the number of template instantions
and avoids using a `string` to buffer the entire formatted output.
This changes the type of `format_context` and `wformat_context`, this can
still be done since the code isn't ABI stable yet.
Several approaches have been evaluated:
- Using a __output_buffer base class with:
- a put function to store the buffer in its internal buffer
- a virtual flush function to copy the internal buffer to the output
- Using a `function` to forward the output operation to the output buffer,
much like the next method.
- Using a type erased function point to store the data in the buffer.
The last version resulted in the best performance. For some cases there's
still a loss of speed over the original method. This loss many becomes
apparent when large strings are copied to a pointer like iterator, before
the compiler optimized this using `memcpy`.
Reviewed By: ldionne, vitaut, #libc
Differential Revision: https://reviews.llvm.org/D110495
1. Add comments to explain why we set `isAsmParserOnly` for XACQUIRE and XRELEASE
2. Check `X86Inst` in the constructor of `RecognizableInstrBase` so that
we can avoid the case where one of it's field is not initialized but
accessed by user. (e.g. in X86EVEX2VEXTablesEmitter.cpp)
3. Move `Rec` from `RecognizableInstrBase` to `RecognizableInstr` to reduce
size of `RecognizableInstrBase`
4. Remove out-of-date comments for shouldBeEmitted() (filter() was removed)
5. Add a basic field `IsAsmParserOnly` and remove the field
`ShouldBeEmitted` b/c we can deduce it w/ little overhead
In C, assignment expressions result in an rvalue whose type is the type
of the lhs of the assignment after it undergoes lvalue to rvalue
conversion. lvalue to rvalue conversion in C strips all qualifiers
including _Atomic.
We used getUnqualifiedType() which does not strip the _Atomic qualifier
when we should have used getAtomicUnqualifiedType(). This corrects the
usage and adds some comments to getUnqualifiedType() to make it more
clear that it does not strip _Atomic and that's on purpose (see C11
6.2.5p27).
This addresses Issue 48742.
D117296 removed wording for __builtin_assume, D120205 restored the
wording, but the last sentence was only partly restored. This restores
the rest of the last sentence.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D122423
Now, perform last active true vector combine only where
we're extracting from a flag-setting operation. But in
fact, the last active extracting will output LASTB + WHILELS,
and the WHILELS itself is a flag-setting operation, so
precommit this case to test the potentially further optimization.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D122453
In fact, an instruction can not be emitted to disassemble table when
`isAsmParserOnly` is true, so `isAsmParserOnly=true` implies
`ShouldBeEmitted=false`.
We check `isAsmParserOnly` in X86FoldTablesEmitter.cpp at a early stage
b/c none of them is foldable.
This is support for the user-facing options to create importable header units
from headers in the user or system search paths (or to be given an absolute path).
This means that an incomplete header path will be passed by the driver and the
lookup carried out using the search paths present when the front end is run.
To support this, we introduce file fypes for c++-{user,system,header-unit}-header.
These terms are the same as the ones used by GCC, to minimise the differences for
tooling (and users).
The preprocessor checks for headers before issuing a warning for
"#pragma once" in a header build. We ensure that the importable header units
are recognised as headers in order to avoid such warnings.
Differential Revision: https://reviews.llvm.org/D121096
As we mentioned in the code comments for function `ResourcePoolTy::release`,
at some point there could be two identical resources on the two sides of `Next`
mark. It is usually not an issue, unless the following case:
1. Some resources are not returned.
2. We need to iterate the pool and free the element.
That will cause double free, which is the case for event pool. Since we don't release
events hold by the data map, it can happen that the `Next` mark is not reset, and
we have two identical items in the pool. When the pool is destroyed, we will call
`cuEventDestroy` twice on the same event. In the best case, we can only observe
CUDA errors. In the worst case, it can cause internal failures in CUDART and further
crash.
This patch fixes the issue by tracking all resources that have been given using
an `unordered_set`. We don't remove it when a resource is returned. When the pool
is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make
sure that the set contains all resources allocated from the device. We just need
to iterate the set and free the resource accordingly.
For now, only event pool is set to use it. Stream pool is not because we can make
sure all streams are returned when the plugin is destroyed.
Someone might be wondering, why don't we release all events hold in the data map.
That is because, plugins are determined to be destroyed *before* `libomptarget`.
If we can somehow make the plugin outlast `libomptarget`, life will be much
easier.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122014
This patch adds the necessary AMDGPU calling convention to the ctor /
dtor kernels. These are fundamentally device kenels called by the host
on image load. Without this calling convention information the AMDGPU
plugin is unable to identify them.
Depends on D122504
Fixes#54091
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122515
The default construction of constructor functions by LLVM tends to make
them have internal linkage. When we call a ctor / dtor function in the
target region we are actually creating a kernel that is called at
registration. Because the ctor is a kernel we need to make sure it's
externally visible so we can actually call it. This prevented AMDGPU
from correctly using constructors while NVPTX could use them simply
because it ignored internal visibility.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D122504
All LLVM backends use MCDisassembler as a base class for their
instruction decoders. Use "const MCDisassembler *" for the decoder
instead of "const void *". Remove unnecessary static casts.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D122245
STATUS='NEW' and 'REPLACE' require FILE= to be present.
STATUS='SCRATCH' may not appear with FILE=.
These errors are caught at compilation time when constant character
strings are used in an OPEN statement, but the runtime needs
to enforce them as well to catch errors in OPEN statements
with character variables and expressions.
Differential Revision: https://reviews.llvm.org/D122509
With the shared cache getting split into multiple files, the current
way we created ObjectFileMachO objects for shared cache dylib images
will break.
This patch conditionally adopts new SPIs which will do the right
thing in the new world of multi-file caches.
Some functions can end up non-externally visible despite not being
declared "static" or in an unnamed namespace in C++ - such as by having
parameters that are of non-external types.
Such functions aren't mistakenly intended to be defining some function
that needs a declaration. They could be maybe more legible (except for
the operator new example) with an explicit static, but that's a
stylistic thing outside what should be addressed by a warning.
This reapplies 275c56226d - once we figure
out what to do about the change in behavior for -Wnon-c-typedef-for-linkage
(this reverts the revert commit 85ee1d3ca1)
Differential Revision: https://reviews.llvm.org/D121328
We could only do this in limited ways (since we emit the TUs first, we
can't use ref_addr (& we can't use that in Split DWARF either) - so we
had to synthesize declarations into the TUs) and they were ambiguous in
some cases (if the CU type had internal linkage, parsing the TU would
require knowing which CU was referencing the TU to know which type the
declaration was for, which seems not-ideal). So to avoid all that, let's
just not reference types defined in the CU from TUs - instead moving the
TU type into the CU (recursively).
This does increase debug info size (by pulling more things out of type
units, into the compile unit) - about 2% of uncompressed dwp file size
for clang -O0 -g -gsplit-dwarf. (5% .debug_info.dwo section size
increase in the .dwp)
The interfaces to C_ASSOCIATED()'s specific procedures must be
PURE so that they are accepted for use in specification expressions.
Differential Revision: https://reviews.llvm.org/D122438
With Scripted Processes, in order to create scripted threads, the blueprint
provides a dictionary that have each thread index as the key with the respective
thread instance as the pair value.
In Python, this is fine because a dictionary key can be of any type including
integer types:
```
>>> {1: "one", 2: "two", 10: "ten"}
{1: 'one', 2: 'two', 10: 'ten'}
```
However, when the python dictionary gets bridged to C++ we convert it to a
`StructuredData::Dictionary` that uses a `std::map<ConstString, ObjectSP>`
for storage.
Because `std::map` is an ordered container and ours uses the `ConstString`
type for keys, the thread indices gets converted to strings which makes the
dictionary sorted alphabetically, instead of numerically.
If the ScriptedProcess has 10 threads or more, it causes thread “10”
(and higher) to be after thread “1”, but before thread “2”.
In order to solve this, this sorts the thread info dictionary keys
numerically, before iterating over them to create ScriptedThreads.
rdar://90327854
Differential Revision: https://reviews.llvm.org/D122429
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
This patch changes `StructuredData::Dictionary::GetKeys` return type
from an `StructuredData::ObjectSP` to a `StructuredData::ArraySP`.
The function already stored the keys in an array but implicitely upcasted
it to an `ObjectSP`, which required the user to convert it again to a
Array object to access each element.
Since we know the keys should be held by an iterable container, it makes
more sense to return the allocated ArraySP as-is.
Differential Revision: https://reviews.llvm.org/D122426
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
Previously, the ScriptedThread used the thread index as the thread id.
This patch parses the crashlog json to extract the actual thread "id" value,
and passes this information to the Crashlog ScriptedProcess blueprint,
to create a higher fidelity ScriptedThreaad.
It also updates the blueprint to show the thread name and thread queue.
Finally, this patch updates the interactive crashlog test to reflect
these changes.
rdar://90327854
Differential Revision: https://reviews.llvm.org/D122422
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
The rule was added in 2014 to support -stdlib=libc++ and -lc++ without
specifying -L, when D.Dir is not a well-known system library directory like
/usr/lib /usr/lib64. This rule turns out to get in the way with (-m32 for
64-bit clang) or (-m64 for 32-bit clang) for Gentoo :
https://github.com/llvm/llvm-project/issues/54515
Nowadays LLVM_ENABLE_RUNTIMES is the only recommended way building libc++ and
LLVM_ENABLE_PROJECTS=libc++ is deprecated. LLVM_ENABLE_RUNTIMES builds libc++
in D.Dir+"/../lib/${triple}/". The rule is unneeded. Also reverts D108286.
Gentoo uses a modified LLVM_ENABLE_RUNTIMES that installs libc++.so in
well-known paths like /usr/lib64 and /usr/lib which are already covered by
nearby search paths.
Implication: if a downstream package needs something like -lLLVM-15git and uses
libLLVM-15git.so not in a well-known path, it needs to supply -L
D.Dir+"/../lib" explicitly (e.g. via LLVMConfig.cmake), instead of relying on
the previous default search path.
Reviewed By: mgorny
Differential Revision: https://reviews.llvm.org/D122444
This patch adds a helper method to determine if a nonvirtual base has an entry in the LLVM struct. Such a base may not have an entry
if the base does not have any fields/bases itself that would change the size of the struct. This utility method is useful for other frontends (Polygeist) that use Clang as an API to generate code.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122502
While -g[no-]simple-template-names is a driver option, the fancier
-gsimple-template-names={simple,mangled} option is cc1-only, so code
to handle it in the driver is dead.
Differential Revision: https://reviews.llvm.org/D122503
Adds flang/include/flang/Common/visit.h, which defines
a Fortran::common::visit() template function that is a drop-in
replacement for std::visit(). Modifies most use sites in
the front-end and runtime to use common::visit().
The C++ standard mandates that std::visit() have O(1) execution
time, which forces implementations to build dispatch tables.
This new common::visit() is O(log2 N) in the number of alternatives
in a variant<>, but that N tends to be small and so this change
produces a fairly significant improvement in compiler build
memory requirements, a 5-10% improvement in compiler build time,
and a small improvement in compiler execution time.
Building with -DFLANG_USE_STD_VISIT causes common::visit()
to be an alias for std::visit().
Calls to common::visit() with multiple variant arguments
are referred to std::visit(), pending further work.
Differential Revision: https://reviews.llvm.org/D122441