The ACLE distinguishes between the following addressing modes for gather
loads:
* "scalar base, vector offset", and
* "vector base, scalar offset".
For the "vector base, scalar offset" case, the
`int_aarch64_sve_ld1_gather_imm` intrinsic was added in 79f2422d.
Currently, that intrinsic assumes that the scalar offset is passed as an
immediate. As a result, it does not cater for cases where scalar offset
is stored in a register.
In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all
cases are covered:
* `int_aarch64_sve_ld1_gather_imm` is renamed as
`int_aarch64_sve_ld1_gather_scalar_offset`
* new DAG combine rules are added for GLD1_IMM for scenarios where the
offset is a non-immediate scalar or an out-of-range immediate
* sve-intrinsics-gather-loads-vector-base.ll is renamed as
sve-intrinsics-gather-loads-vector-base-imm-offset.ll
* sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test
file for non-immediate offsets
Similar changes are made for scatter store intrinsics.
Reviewed By: sdesmalen, efriedma
Differential Revision: https://reviews.llvm.org/D71773
Summary:
Normally, on linux we retrieve the process ID from the LinuxProcStatus
stream (which is just the contents of /proc/%d/status pseudo-file).
However, this stream is not strictly required (it's a breakpad
extension), and we are encountering a fair amount of minidumps which do
not have it present. It's not clear whether this is the case with all
these minidumps, but the two known situations where this stream can be
missing are:
- /proc filesystem not mounted (or something to that effect)
- process crashing after exhausting (almost) all file descriptors (so
the minidump writer may not be able to open the /proc file)
Since this is a corner case which will become less and less relevant
(crashpad-generated minidumps should not suffer from this problem), I
work around this problem by hardcoding the PID to 1 in these cases.
The same thing is done by the gdb plugin when talking to a stub which
does not report a process id (e.g. a hardware probe).
Reviewers: jingham, clayborg
Subscribers: markmentovai, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D70238
Summary:
This code is handling debug info paths starting with /proc/self/cwd,
which is one of the mechanisms people use to obtain "relocatable" debug
info (the idea being that one starts the debugger with an appropriate
cwd and things "just work").
Instead of resolving the symlinks inside DWARFUnit, we can do the same
thing more elegantly by hooking into the existing Module path remapping
code. Since llvm::DWARFUnit does not support any similar functionality,
doing things this way is also a step towards unifying llvm and lldb
dwarf parsers.
Reviewers: JDevlieghere, aprantl, clayborg, jdoerfert
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D71770
Summary:
Motivation: When setting breakpoints in certain projects line sequences are frequently being inserted out of order.
Rather than inserting sequences one at a time into a sorted line table, store all the line sequences as we're building them up and sort and flatten afterwards.
Reviewers: jdoerfert, labath
Reviewed By: labath
Subscribers: teemperor, labath, mgrang, JDevlieghere, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72909
Summary:
This method was doing a lot more than it's only caller needed
(DWARFDIE::LookupDeepestBlock) needed, so I inline it into the caller,
and remove any code which is not actually used. This includes code for
searching for the deepest function, and the code for working around
incomplete DW_AT_low_pc/high_pc attributes on a compile unit DIE (modern
compiler get this right, and this method is called on function DIEs
anyway).
This also improves our llvm consistency, as llvm::DWARFDebugInfoEntry is
just a very simple struct with no nontrivial logic.
Reviewers: JDevlieghere, aprantl
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72920
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.
Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov
Reviewed By: Ayal, DaniilSuchkov
Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67905
Summary:
Printing policy was not propogated to functiondecls when creating a
completion string which resulted in canonical template parameters like
`foo<type-parameter-0-0>`. This patch propogates printing policy to those as
well.
Fixes https://github.com/clangd/clangd/issues/76
Reviewers: ilya-biryukov
Subscribers: jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72715
Summary:
The goal of this patch is two-fold. First, it fixes a use-after-free in
the construction of the llvm DWARFContext. This happened because the
construction code was throwing away the lldb DataExtractors it got while
reading the sections (unlike their llvm counterparts, these are also
responsible for memory ownership). In most cases this did not matter,
because the sections are just slices of the mmapped file data. But this
isn't the case for compressed elf sections, in which case the section is
decompressed into a heap buffer. A similar thing also happen with object
files which are loaded from process memory.
The second goal is to make it explicit which sections go into the llvm
DWARFContext -- any access to the sections through both DWARF parsers
carries a risk of parsing things twice, so it's better if this is a
conscious decision. Also, this avoids loading completely irrelevant
sections (e.g. .text). At present, the only section that needs to be
present in the llvm DWARFContext is the debug_line_str. Using it through
both APIs is not a problem, as there is no parsing involved.
The first goal is achieved by loading the sections through the existing
lldb DWARFContext APIs, which already do the caching. The second by
explicitly enumerating the sections we wish to load.
Reviewers: JDevlieghere, aprantl
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72917
This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.
Differential Revision: https://reviews.llvm.org/D72714
If sysctl is not found at all, let the usual exception propogate
so that the user can fix their env. If it fails because of the
permissions required to read the property then print a warning
and continue.
Differential Revision: https://reviews.llvm.org/D72278
Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2.
Reviewers: Ayal, fhahn
Reviewed By: Ayal
Subscribers: mgorny, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71990
Summary:
This was reverted in 328e0f3dca due to
chromium bot failure. This revision addresses that case.
Original commit message:
Summary:
This patch will provide support for auto return type for the C++ member
functions. Before this return type of the member function is deduced and
stored in the DIE.
This patch includes llvm side implementation of this feature.
Patch by: Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D70524
We have a bug currently: printed tag names might overlap the
value column. It happens for MIPS now.
This patch adds a logic to calculate the size of indentation on fly
to fix such issues.
Differential revision: https://reviews.llvm.org/D72838
This moves `rewriteLoopExitValues()` from IndVarSimplify to LoopUtils thus
making it a generic loop utility function. This allows to rewrite loop exit
values by just calling this function without running the whole IndVarSimplify
pass.
We use this in D72714 to rematerialise the iteration count in exit blocks, so
that we can clean-up loop update expressions inside the hardware-loops later.
Differential Revision: https://reviews.llvm.org/D72602
The idea is to produce R_X86_64_PLT32 instead of
R_X86_64_PC32 for branches.
It fixes https://bugs.llvm.org/show_bug.cgi?id=44397.
This patch teaches MC to do that for JCC (jump if condition is met)
instructions. The new behavior matches modern GNU as.
It is similar to D43383, which did the same for "call/jmp foo",
but missed JCC cases.
Differential revision: https://reviews.llvm.org/D72831
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.
It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.
Differential Revision: https://reviews.llvm.org/D71194
We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and certain instructions like the VLD2's only post-inc
versions are available.
Differential Revision: https://reviews.llvm.org/D70790
StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0)
but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i)
referenced by MachineMemoryOperand.
This can cause an assertion failure when LiveDebugValues references a dead stack object.
It is difficult to craft a test case. -g, va_copy and stack-coloring are required.
I can only reproduce it on ppc32.
builds.
Fix a libc++abi test that was incorrectly checking for threading
primitives even when threading was disabled.
Additionally, temporarily XFAIL some module tests that fail because
the <atomic> header is unsupported but still built as a part of the
std module.
To properly address this libc++ would either need to produce a different
module.modulemap for single-threaded configurations, or it would need
to make the <atomic> header not hard-error and instead be empty
for single-threaded configurations
Summary:
We previously listed first declared members, then implicit operator=,
then implicit operator==, then implicit destructors. Per discussion on
https://github.com/itanium-cxx-abi/cxx-abi/issues/88, put the implicit
equality comparison operators at the very end, after all special member
functions.
Reviewers: rjmccall
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72897
a temporary.
We previously failed to materialize a temporary when performing an
implicit conversion to a reference type, resulting in our thinking the
argument was a value rather than a reference in some cases.
Currently PromoteMaskArithemtic only looks at a single operation to
skip casts. This means we miss cases where we combine multiple masks.
This patch updates PromoteMaskArithemtic to try to recursively promote
AND/XOR/AND nodes that terminate in truncates of the right size or
constant vectors.
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D72524
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
The MaterializationResponsibility::defineMaterializing method allows clients to
add new definitions that are in the process of being materialized to the JIT.
This patch adds support to defineMaterializing for symbols with weak linkage
where the new definitions may be rejected if another materializer concurrently
defines the same symbol. If a weak symbol is rejected it will not be added to
the MaterializationResponsibility's responsibility set. Clients can check for
membership in the responsibility set via the
MaterializationResponsibility::getSymbols() method before resolving any
such weak symbols.
This patch also adds code to RTDyldObjectLinkingLayer to tag COFF comdat symbols
introduced during codegen as weak, on the assumption that these are COFF comdat
constants. This fixes http://llvm.org/PR40074.
Summary: This diff expands the SpacesAroundConditions option added in D68346 to include adding spaces to catch statements.
Reviewed By: MyDeveloperDay
Patch by: timwoj
Differential Revision: https://reviews.llvm.org/D72793
Summary:
The documentation for IndentCaseLabels claimed that the "Switch
statement body is always indented one level more than case labels". This
is technically false for the code block immediately following the label.
Its closing bracket aligns with the start of the label.
If the case label are not indented, it leads to a style where the
closing bracket of the block aligns with the closing bracket of the
switch statement, which can be hard to parse.
This change introduces a new option, IndentCaseBlocks, which when true
treats the block as a scope block (which it technically is).
(Note: regenerated ClangFormatStyleOptions.rst using tools/dump_style.py)
Reviewed By: MyDeveloperDay
Patch By: capn
Tags: #clang-format, #clang
Differential Revision: https://reviews.llvm.org/D72276
This intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).
This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.
Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.
I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other other places.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72805