This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified.
Differential Revision: https://reviews.llvm.org/D93432
This document was not updated after the LLVM dialect type system had been
reimplemented and was using an outdated syntax. Rewrite the part of the
document that concerns type conversion and prepare the ground for splitting it
into a document that explains how built-in types are converted and a separate
document that explains how standard types and functions are converted, which
will better correspond to the fact that built-in types do not belong to the
standard dialect.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D93486
cl.exe doesn't understand Zd (in either MSVC 2017 or 2019), so neiter
should we. It used to do the same as `-gline-tables-only` which is
exposed as clang-cl flag as well, so if you want this behavior, use
`gline-tables-only`. That makes it clear that it's a clang-cl-only flag
that won't work with cl.exe.
Motivated by the discussion in D92958.
Differential Revision: https://reviews.llvm.org/D93458
For full-debug-info (is_debug=true / symbol_level=2 builds), this makes
linking 15% slower, but gdb startup 1500% faster (for lld: link time
3.9s->4.4s, gdb load time >30s->2s).
For link time, I ran
bench.py -o {noindex,index}.txt \
sh -c 'rm out/gn/bin/lld && ninja -C out/gn lld'
and then `ministat noindex.txt index.txt`:
```
x noindex.txt
+ index.txt
N Min Max Median Avg Stddev
x 5 3.784461 4.0200169 3.8452811 3.8754988 0.089902595
+ 5 4.32496 4.6058481 4.3361208 4.4141198 0.12288267
Difference at 95.0% confidence
0.538621 +/- 0.15702
13.8981% +/- 4.05161%
(Student's t, pooled s = 0.107663)
```
For gdb load time I loaded the crash in PR48392 with
gdb -ex r --args ../out/gn/bin/ld64.lld.darwinnew @response.txt
and just stopped the time until the crash got displayed with a stopwatch
a few times. So the speedup there is less precise, but it's so
pronounced that that's ok (loads ~instantly with the patch, takes a very
long time without it).
Only doing this for LLD because I haven't tried it with other linkers.
Differential Revision: https://reviews.llvm.org/D92844
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D93439
The LLVM IR 'switch' instruction allows control flow to be transferred
to one of any number of branches depending on an integer control value,
or a default value if the control does not match any branch values. This patch
adds `llvm.switch` to the MLIR LLVMIR dialect, as well as translation routines
for lowering it to LLVM IR.
To store a variable number of operands for a variable number of branch
destinations, the new op makes use of the `AttrSizedOperandSegments`
trait. It stores its default branch operands as one segment, and all
remaining case branches' operands as another. It also stores pairs of
begin and end offset values to delineate the sub-range of each case branch's
operands. There's probably a better way to implement this, since the
offset computation complicates several parts of the op definition. This is the
approach I settled on because in doing so I was able to delegate to the default
op builder member functions. However, it may be preferable to instead specify
`skipDefaultBuilders` in the op's ODS, or use a completely separate
approach; feedback is welcome!
Another contentious part of this patch may be the custom printer and
parser functions for the op. Ideally I would have liked the MLIR to be
printed in this way:
```
llvm.switch %0, ^bb1(%1 : !llvm.i32) [
1: ^bb2,
2: ^bb3(%2, %3 : !llvm.i32, !llvm.i32)
]
```
The above would resemble how LLVM IR is formatted for the 'switch'
instruction. But I found it difficult to print and parse something like
this, whether I used the declarative assembly format or custom functions.
I also was not sure a multi-line format would be welcome -- it seems
like most MLIR ops do not use newlines. Again, I'd be happy to hear any
feedback here as well, or on any other aspect of the patch.
Differential Revision: https://reviews.llvm.org/D93005
Remove the OpenMP clause information from the OMPKinds.def file and use the
information from the new OMP.td file. There is now a single source of truth for the
directives and clauses.
To avoid generate lots of specific small code from tablegen, the macros previously
used in OMPKinds.def are generated almost as identical. This can be polished and
possibly removed in a further patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D92955
Live symbols should only cause the files in which they are defined
to become live.
For now this is only tested in emscripten: we're continuing
to work on reducing the test case further for an lld-style
unit test.
Differential Revision: https://reviews.llvm.org/D93472
canAllocate() does not take into account the header size so it does
not return the right answer in borderline cases. There was already
code handling this correctly in isTaggedAllocation() so split it out
into a separate function and call it from the test.
Furthermore the test was incorrect when MTE is enabled because MTE
does not pattern fill primary allocations. Fix it.
Differential Revision: https://reviews.llvm.org/D93437
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.
Differential Revision: https://reviews.llvm.org/D91974
Initially we were avoiding the release of smaller size classes due to
the fact that it was an expensive operation, particularly on 32-bit
platforms. With a lot of batches, and given that there are a lot of
blocks per page, this was a lengthy operation with little results.
There has been some improvements since then to the 32-bit release,
and we still have some criterias preventing us from wasting time
(eg, 9x% free blocks in the class size, etc).
Allowing to release blocks < 128 bytes helps in situations where a lot
of small chunks would not have been reclaimed if not for a forced
reclaiming.
Additionally change some `CHECK` to `DCHECK` and rearrange a bit the
code.
I didn't experience any regressions in my benchmarks.
Differential Revision: https://reviews.llvm.org/D93141
Extracting the similar regions is the first step in the IROutliner.
Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed. Each
IRSimilarityCandidate is used to define an OutlinableRegion. Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.
Each region is then extracted with the CodeExtractor into its own
function.
We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Recommit of bf899e8913 fixing memory
leaks.
Reviewers: paquette, jroelofs, yroux
Differential Revision: https://reviews.llvm.org/D86975
is_debug by default makes symbol_level = 2 and !is_debug means by
default symbol_level = 0.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92958
Use RegSetKind enum for register sets everything, rather than int.
Always spell it as 'RegSetKind', without unnecessary 'enum'. Add
missing switch case. While at it, use uint32_t for regnums
consistently.
Differential Revision: https://reviews.llvm.org/D93450
Replace the wrong code in GetRegisterSetCount() with a constant return.
The original code passed register index in place of register set index,
effectively getting always true. Correcting the code to check for
register set existence is not possible as LLDB supports only eliminating
last register sets. Just return the full number for now which should
be NFC.
Differential Revision: https://reviews.llvm.org/D93396
This reverts commit a01b26fb51, because it
breaks the "finish" command in some way -- the command does not
terminate after it steps out, but continues running the target. The
exact blast radius is not clear, but it at least affects the usage of
the "finish" command in TestGuiBasicDebug.py. The error is *not*
gui-related, as the same issue can be reproduced by running the same
steps outside of the gui.
There is some kind of a race going on, as the test fails only 20% of the
time on the buildbot.
If two variables are declared with __attribute__((section(name))) and
the implicit section types (e.g. read only vs writeable) conflict, an
error is raised. Extend this mechanism so that an error is raised if the
section type implied by a function's __attribute__((section)) conflicts
with that of another variable.
This patch add some checks for the restriction on the routine directive
and fix several issue at the same time.
Validity tests have been added in a separate file than acc-clause-validity.f90 since this one
became quite large. I plan to split the larger file once on-going review are done.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D92672
The LCSSA pass makes use of a function insertDebugValuesForPHIs() to
propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty
behaviour occurs when the parent PHI of a newly inserted PHI is not the
most recent assignment to a source variable. insertDebugValuesForPHIs ends
up propagating a value that isn't the most recent assignemnt.
This change removes the call to insertDebugValuesForPHIs() from LCSSA,
preventing incorrect dbg.value intrinsics from being propagated.
Propagating variable locations between blocks will occur later, during
LiveDebugValues.
Differential Revision: https://reviews.llvm.org/D92576
The PPCCTRLoop pass has been moved to HardwareLoops,
so the comments and some useless code are deprecated now.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D93336
[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end
Reviewed By: t-tye, yaxunl
Differential Revision: https://reviews.llvm.org/D93258
This commit shuffles SPIR-V code around to better follow MLIR
convention. Specifically,
* Created IR/, Transforms/, Linking/, and Utils/ subdirectories and
moved suitable code inside.
* Created SPIRVEnums.{h|cpp} for SPIR-V C/C++ enums generated from
SPIR-V spec. Previously they are cluttered inside SPIRVTypes.{h|cpp}.
* Fixed include guards in various header files (both .h and .td).
* Moved serialization tests under test/Target/SPIRV.
* Renamed TableGen backend -gen-spirv-op-utils into -gen-spirv-attr-utils
as it is only generating utility functions for attributes.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D93407
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
Fixes issue where if a line section doesn't start with a line number
then the addresses at the beginning of the section don't have line numbers.
For example, for a line section like this
```
0001:00000010-00000014, line/column/addr entries = 1
7 00000013 !
```
a line number wouldn't be found for addresses from 10 to 12.
This matches behavior when using the DIA SDK.
Differential Revision: https://reviews.llvm.org/D93306
Update the allowed clauses for the SERIAL construct for the new OpenACC 3.1
specification.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D92123
If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.
The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
This patch is based on an earlier patch by Francis Visoiu Mistrih.
Reviewed By: thegameg, lebedev.ri
Differential Revision: https://reviews.llvm.org/D91444
As indicated by AArch64 ELF specification, symbols with st_other
marked with STO_AARCH64_VARIANT_PCS indicates it may follow a variant
procedure call standard with different register usage convention
(for instance SVE calls).
Static linkers must preserve the marking and propagate it to the dynamic
symbol table if any reference or definition of the symbol is marked with
STO_AARCH64_VARIANT_PCS, and add a DT_AARCH64_VARIANT_PCS dynamic tag if
there are R_<CLS>_JUMP_SLOT relocations that reference that symbols.
It implements https://bugs.llvm.org/show_bug.cgi?id=48368.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93045