This matches what we do in IR. For the RISC-V test case, this allows
us to use -8 for the AND mask instead of materializing a constant in a register.
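To illustrate why -8 works as the mask (a minimal standalone sketch; the function name is made up):
```
#include <cstdint>

// In two's complement, -8 is ...1111111000, so x & -8 clears the low
// three bits. Because -8 fits in andi's 12-bit signed immediate, no
// separate instruction is needed to materialize the mask.
uint64_t alignDown8(uint64_t x) {
  return x & static_cast<uint64_t>(-8); // e.g. andi a0, a0, -8
}
```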
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D127335
Most GSS nodes have short effective lifetimes, so keeping them around until the
end of the parse is wasteful. Mark and sweep them every 20 tokens.
When parsing clangd/AST.cpp, this reduces the GSS memory from 1MB to 20kB.
We pay ~5% performance for this according to the glrParse benchmark.
(Parsing more tokens between GCs doesn't seem to improve this further).
Compared to the refcounting approach in https://reviews.llvm.org/D126337, this
is simpler (at least the complexity is better isolated) and has less than half the
overhead. It doesn't provide death handlers (for error-handling), but we have
an alternative solution in mind.
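As a rough standalone sketch of the idea (invented types, not the pseudoparser's actual code): every GC cycle marks the nodes reachable from the active heads and sweeps the rest onto a free list for reuse.
```
#include <utility>
#include <vector>

struct GSSNode {
  std::vector<GSSNode *> Parents; // edges toward older stack nodes
  bool Marked = false;
};

struct GSS {
  std::vector<GSSNode *> Arena;    // every live allocation
  std::vector<GSSNode *> FreeList; // dead nodes, ready for reuse

  void mark(GSSNode *N) {
    if (N->Marked)
      return;
    N->Marked = true;
    for (GSSNode *P : N->Parents)
      mark(P);
  }

  // Run every K tokens (K=20 above): keep only what the heads reach.
  void collect(const std::vector<GSSNode *> &Heads) {
    for (GSSNode *H : Heads)
      mark(H);
    std::vector<GSSNode *> Live;
    for (GSSNode *N : Arena) {
      if (N->Marked) {
        N->Marked = false; // reset for the next cycle
        Live.push_back(N);
      } else {
        FreeList.push_back(N);
      }
    }
    Arena = std::move(Live);
  }
};
```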
Differential Revision: https://reviews.llvm.org/D126723
There are more states than symbols.
This means that first partitioning the action list by state leaves us with a smaller
range to binary search over. This improves find() a lot and glrParse() by 7%.
The tradeoff is that storing more, smaller ranges increases the size of the offsets
array; overall grammar memory grows by +1% (337KB -> 340KB).
Before:
glrParse 188795975 ns 188778003 ns 77 bytes_per_second=1.98068M/s
After:
glrParse 175936203 ns 175916873 ns 81 bytes_per_second=2.12548M/s
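The table shape this describes, as an illustrative sketch (not the real grammar-table types):
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct Action {
  uint16_t Symbol;
  uint32_t Value;
};

struct ActionTable {
  std::vector<uint32_t> StateOffset; // size = #states + 1
  std::vector<Action> Actions;       // grouped by state, sorted by Symbol

  // Binary search is confined to the slice belonging to State.
  const Action *find(uint16_t State, uint16_t Symbol) const {
    auto Begin = Actions.begin() + StateOffset[State];
    auto End = Actions.begin() + StateOffset[State + 1];
    auto It = std::lower_bound(
        Begin, End, Symbol,
        [](const Action &A, uint16_t S) { return A.Symbol < S; });
    return (It != End && It->Symbol == Symbol) ? &*It : nullptr;
  }
};
```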
Differential Revision: https://reviews.llvm.org/D127006
If the callsite is in a single-BB loop, we need to exclude the BB from
the successor set (of which it would be a member), because that set forms a
boundary at which we stop traversing the CFG when re-ingesting BBs
after inlining; but after inlining, the callsite BB's new successors
should be visited.
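In simplified form (real types omitted), the exclusion amounts to:
```
#include <set>

struct BB {
  std::set<BB *> Succs;
};

// Boundary at which CFG re-traversal stops after inlining. If the
// callsite BB is its own successor (single-BB loop), leave it out so
// its new successors still get visited.
std::set<BB *> successorBoundary(BB *CallsiteBB) {
  std::set<BB *> Boundary = CallsiteBB->Succs;
  Boundary.erase(CallsiteBB);
  return Boundary;
}
```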
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D127178
Once ForceStop is set to true, we only return positive inlining advice when it is mandatory; there is no need for further node/edge accounting.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127245
On FreeBSD AArch64, safestack needs to use __syscall to handle 64-bit arguments.
Reviewed by: MaskRay, vitalybuka
Differential Revision: https://reviews.llvm.org/D125901
The stack pointer is stored in the second slot in the jump buffer on
AArch64. Use the correct slot value to read this rather than the
following register.
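Schematically (this helper is invented for illustration; the actual change is in the sanitizer runtime):
```
#include <csetjmp>
#include <cstdint>

// On AArch64 the saved stack pointer sits in the second 64-bit slot
// of the jump buffer, i.e. index 1, not in the slot after the saved
// registers.
static uintptr_t SavedStackPointer(const jmp_buf &env) {
  return reinterpret_cast<const uintptr_t *>(&env)[1];
}
```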
Reviewed by: melver
Differential Revision: https://reviews.llvm.org/D125762
As with 64-bit x86, use an offset in the middle of the address space, scaled up
to work with the full 48-bit space.
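The general shape of the mapping, as a minimal sketch; the offset constant below is an assumed stand-in, not the value chosen by this patch:
```
#include <cstdint>

constexpr uint64_t kShadowScale = 3;           // 8 app bytes -> 1 shadow byte
constexpr uint64_t kShadowOffset = 1ULL << 46; // assumed mid-range value

// Picking the offset in the middle of the 48-bit address space keeps
// (addr >> scale) + offset addressable for the whole range.
constexpr uint64_t memToShadow(uint64_t addr) {
  return (addr >> kShadowScale) + kShadowOffset;
}
```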
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D125757
This is failing to find `EnableABIBreakingCheck` at link time. Add a
dependency on Support to provide it here.
Reviewed By: hokein
Differential Revision: https://reviews.llvm.org/D127269
Support complex numbers for Matrix Market Exchange Formats. Add a test case.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D127138
When you pass multiple universal binaries to llvm-cov, assume
that if only one architecture is specified, it should be used for all of the
binaries.
This makes it easier to use multiple multi-arch binaries, since it's
likely very rare that your architectures mismatch significantly if you
also have multiple binaries in a single llvm-cov invocation. If the
architecture is invalid for any of the passed binaries, it will still
fail later.
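The fallback amounts to something like this sketch (not llvm-cov's actual code):
```
#include <string>
#include <vector>

// With N binaries but a single -arch value, reuse that value for all
// binaries; a wrong architecture is still diagnosed later, when the
// matching slice is looked up in each binary.
std::vector<std::string> expandArchs(std::vector<std::string> Archs,
                                     size_t NumBinaries) {
  if (Archs.size() == 1 && NumBinaries > 1)
    Archs.assign(NumBinaries, Archs.front());
  return Archs;
}
```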
Differential Revision: https://reviews.llvm.org/D121667
In D126580 we updated the test to reflect that there should always
be a full trace. However, some executions do not have symbolizer
information, so we will restore the original test until we can formulate
a more robust test.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D127334
This patch implements the R_AARCH64_PREL64 and R_AARCH64_PREL32 relocations, which are
used in EH frame pointers. The test case utilizes the obj2yaml tool to create an
artificial EH frame that generates the related relocation types.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D127058
The default value of sample-profile-inline-limit-max is defined as 10000 in sampleprofile.cpp. This is too big for the cspreinliner, which works with assembly size instead of IR size. The value 3000 turns out to be a good tradeoff: compared to 10000, it gives equally good performance and code size, but lower build time.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D127330
This reverts commit b37d84aa8d.
This broke AArch64 ASan builders for Fuchsia. I accidentally changed the allocator
settings for Fuchsia on AArch64 because the new ASan allocator settings use:
```
// AArch64/SANITIZER_CAN_USE_ALLOCATOR64 is only for 42-bit VMA
// so no need to different values for different VMA.
const uptr kAllocatorSpace = 0x10000000000ULL;
const uptr kAllocatorSize = 0x10000000000ULL; // 3T.
typedef DefaultSizeClassMap SizeClassMap;
```
rather than reaching the final `#else`, which would use Fuchsia's LSan config.
During lowering of memcmp/bcmp, the check for a size of 0 is done
in 2 different ways. In rare cases this can lead to a crash in
SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause
is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a
constant int value which is not yet evaluated. When the value is
turned into an SDValue, then the evaluation is done and results in
a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special
case of zero length to have been handled already, which results in an assertion failure.
The fix is to turn the value into an SDValue first, so that both functions
use the same check.
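As a standalone analogue of the fix (a sketch; the real code deals in IR values and SDValues, with `fold` here standing in for the SDValue conversion where constants get evaluated):
```
#include <cassert>
#include <cstdint>

struct Expr {
  int64_t A, B; // stands in for a not-yet-evaluated size expression
};
int64_t fold(const Expr &E) { return E.A - E.B; }

void emitMemcmp(int64_t FoldedSize) {
  assert(FoldedSize != 0 && "zero length must be handled by the caller");
  // ... emit the comparison sequence ...
}

void visitMemcmp(const Expr &Size) {
  int64_t Folded = fold(Size); // evaluate first, then check
  if (Folded == 0)
    return; // nothing to compare; never reaches the emitter
  emitMemcmp(Folded);
}
```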
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D126900
When constraints in the two operands make each other redundant, prefer the constraints of the second operand, because this affects the number of sets in the output at each level; reducing these can help prevent exponential blowup.
This is accomplished by adding extra overloads to Simplex::detectRedundant that only scan a subrange of the constraints for redundancy.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D127237
On most platforms, the linker detection command that we run ends up being
something like `clang++ -Wl,-v` or `clang++ -Wl,--version`. This usually
fails with a missing reference to `_main` because we don't have any input
file. However, when compiling for a target that is implicitly freestanding,
the invocation actually succeeds and a dummy `a.out` file is created in
the current working directory. This is extremely annoying because it
creates an `a.out` file at the root of the monorepo when running CMake
configuration from the root.
Differential Revision: https://reviews.llvm.org/D125827
In order to avoid stranding the Objective-C runtime lock, we switched
from objc_copyRealizedClassList to its non-locking variant
objc_copyRealizedClassList_nolock. Not taking the lock was relatively
safe because we run this expression on one thread only, but it was still
possible that someone was in the middle of modifying this list while we
were trying to read it. Worst case, that would result in a crash in the
inferior without side effects, and we'd unwind and try again later.
With the introduction of macOS Ventura, we can use
objc_getRealizedClassList_trylock instead. It has semantics similar to
objc_copyRealizedClassList_nolock, but instead of not locking at all,
the function returns if the lock is already taken, which avoids the
aforementioned crash without stranding the Objective-C runtime lock.
Because LLDB gets to allocate the underlying memory, we also avoid
stranding the malloc lock.
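The pattern looks like this, with std::mutex standing in for the Objective-C runtime lock (illustrative only):
```
#include <mutex>

// Instead of reading without the lock, bail out when the lock is
// contended and retry at the next opportunity; no crash, no stranded
// lock.
bool snapshotClassList(std::mutex &RuntimeLock) {
  if (!RuntimeLock.try_lock())
    return false; // a writer is active; try again later
  std::lock_guard<std::mutex> Guard(RuntimeLock, std::adopt_lock);
  // ... copy the realized class list into memory we allocated ...
  return true;
}
```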
rdar://89373233
Differential revision: https://reviews.llvm.org/D127252
Reduces repetition in tablegen files for defining various tensor types. In particular, the goal is to cut down the boilerplate when defining new tensor types (e.g., D126994).
Reviewed By: aartbik, rriddle
Differential Revision: https://reviews.llvm.org/D127039
This patch moves the implementation of synthetic properties from the StructValue class into the Value base class so that it can be used across all Value instances.
Reviewed By: gribozavr2, ymandel, sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D127196
This fixes the underlying module dependencies, which had a
non-deterministic order that was also visible in the order of calls to
DependencyConsumer methods. This was not directly observable in
the clang-scan-deps utility, because it was previously seeing a sorted
order from std::map in DependencyScanningTool. However, the underlying
API previously created a likely issue for any other clients. Note: if
you only apply the change from DependencyScanningTool, you can see the
issue in clang-scan-deps, and existing tests will fail
non-deterministically.
Differential Revision: https://reviews.llvm.org/D127243
The command-line arguments for module builds are cc1 commands, so they
do not implicitly set -disable-free like a driver invocation, and
Tooling will disable it for the scanning instance itself. Set
-disable-free explicitly so that separate invocations for building
modules will not pay for freeing memory unnecessarily.
Differential Revision: https://reviews.llvm.org/D127229
Includes DPP versions of VOP3P instructions.
Patch 18/N for upstreaming of AMDGPU gfx11 architecture
Depends on D126917
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D126978
The callsite identifier used in pseudo probe encoding and decoding consists of a function name and the callsite probe id. For encoding, i.e., `MCPseudoProbeInlineTree`, the function name is the callee's name. However, for decoding, i.e., `MCDecodedPseudoProbeInlineTree`, it is actually the caller's name that is used. This results in multiple callees that are inlined at the same callsite, likely via indirect call promotion, sharing the same decoded inline frame. While this is not a problem for profile generation, it confuses probe re-encoding in Bolt.
In Bolt, we decode pseudo probes first and build `MCDecodedPseudoProbeInlineTree`. The decoded tree is used for final re-encoding. Here comes the problem. Two inlinees from the same callsite share the same decoded inline frame. During re-encoding, the frame name (whatever inlinee comes first) will be used and encoded in the bolted binary. This will cause wrong inline contexts in the profile generated on the bolted binary.
The fix is a no-op for pre-Bolt profile generation. Some of the Bolt tests are not yet upstreamed, so I'm not adding a Bolt test here.
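Schematically (simplified types, not the real MC classes), the decoded tree needs the callee identity in the child key:
```
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Keying children by probe id alone lets two different inlinees that
// were promoted at the same indirect callsite collapse into a single
// frame; adding the callee name keeps one frame per inlinee.
struct DecodedFrame {
  using CallsiteID = std::pair<uint32_t, std::string>; // (probe id, callee)
  std::map<CallsiteID, std::unique_ptr<DecodedFrame>> Inlinees;
};
```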
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D126434
Implement the C-API and Python bindings for the builtin opaque type, which was previously missing.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D127303
Add proper CodeView encoding for positive constant integer values greater than
127. In addition, use the two-byte encoding form for positive values less
than LF_NUMERIC.
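A sketch of the rule for unsigned values (leaf constants per the CodeView spec; this is an illustration, not the patch's code):
```
#include <cstdint>
#include <vector>

constexpr uint16_t LF_NUMERIC = 0x8000;
constexpr uint16_t LF_USHORT = 0x8002;
constexpr uint16_t LF_ULONG = 0x8004;
constexpr uint16_t LF_UQUADWORD = 0x800a;

// Values below LF_NUMERIC fit directly in the two-byte field; larger
// values are prefixed with a leaf kind naming their width.
void emitNumeric(std::vector<uint8_t> &Out, uint64_t V) {
  auto emit16 = [&Out](uint16_t X) {
    Out.push_back(X & 0xff);
    Out.push_back(X >> 8);
  };
  if (V < LF_NUMERIC) {
    emit16(static_cast<uint16_t>(V));
  } else if (V <= UINT16_MAX) {
    emit16(LF_USHORT);
    emit16(static_cast<uint16_t>(V));
  } else if (V <= UINT32_MAX) {
    emit16(LF_ULONG);
    for (int I = 0; I < 4; ++I)
      Out.push_back((V >> (8 * I)) & 0xff);
  } else {
    emit16(LF_UQUADWORD);
    for (int I = 0; I < 8; ++I)
      Out.push_back((V >> (8 * I)) & 0xff);
  }
}
```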
Differential Revision: https://reviews.llvm.org/D126968