Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.
We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.
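A minimal sketch of the intended cost check (illustrative only, not the actual
code in the pass; the function name is made up):

```
// Create the vector op (and extract the scalar result afterwards) whenever
// the vector sequence is estimated to be no more expensive than the scalar
// one, instead of requiring it to be strictly cheaper.
static bool shouldCreateVectorOp(int ScalarCost, int VectorCost) {
  return VectorCost <= ScalarCost; // previously '<' for binops, '<=' for cmps
}
```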
We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.
AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).
Differential Revision: https://reviews.llvm.org/D75062
Summary:
The original patch had TODOs to add support for step computations,
which this commit addresses. The computations are expressed using
affine expressions so that the affine canonicalizers can simplify
the full bound and index computations.
Also cleans up the code a little and exposes the pass in the
header file.
Differential Revision: https://reviews.llvm.org/D75052
On Windows, an error running the debugger typically leaves a process
hanging around in the working directory. When Dexter exits, it can't then
delete the working directory and produces an exception, masking the problem
in the debugger. (This can be worked around by specifying --save-temps).
Rather than hard-erroring, print a warning when we can't delete the working
directory.
It'd be much better to improve our error handling, and make the
WorkingDirectory class aware that something's wrong when it enters exit.
However, the current hard error is going to mask genuine errors and make
everyone's lives harder right now, so I think this non-ideal fix is
important to get in first.
Differential Revision: https://reviews.llvm.org/D74548
This member is for some reason initialized in ClangASTSource::FindExternalVisibleDecls,
so all other functions using this member dereference a nullptr unless that
function has been called first. Let's just initialize this member in the constructor.
This should be NFC as the only side effect is that we don't reset the namespace map
when calling ClangASTSource::FindExternalVisibleDecls multiple times (and we never
call this function multiple times for one NameSearchContext from what I can see).
The size of NameSearchContext isn't important as we never store it and rarely
allocate more than a few. This way we also don't have to use the memset to
initialize these fields to zero.
Summary:
Currently extract variable doesn't spell the type explicitly and just
uses an `auto` instead, which is not available in C.
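For illustration (hypothetical variable names), extracting `a + b` below now
yields a spelled-out type:

```
int f(int a, int b, int c) {
  // Result of "extract variable" on `a + b`:
  int placeholder = a + b; // previously: auto placeholder = a + b; (invalid in C)
  return placeholder + c;
}
```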
Reviewers: usaxena95
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75053
Now that the Windows installer no longer does anything besides
self-extract, maybe it would make sense to distribute the toolchain as a
plain zip file in addition to the current installer.
Differential revision: https://reviews.llvm.org/D74896
Summary:
This patch renames functions and TableGen classes for SVE gathers and
scatters. The original names implied that the corresponding
methods/classes are only suited for regular gathers/scatters (i.e. LD1
and ST1), which is not the case. Indeed, we will be re-using them for
non-temporal and first-faulting gathers/scatters in the forthcoming
patches. The new names also highlight the split into Vector-Scalar (VS)
and Scalar-Vector (SV) cases.
List of changes:
* `performLD1GatherCombine` and `performST1ScatterCombine` are renamed
as `performGatherLoadCombine` and `performScatterStoreCombine`,
respectively.
* Selection DAG types for scatters and gathers from
AArch64SVEInstrInfo.td are renamed. For example, `SDT_AArch64_GLD1` is
renamed as `SDT_AArch64_GATHER_SV`. SV stands for Scalar-Vector, as
opposed to Vector-Scalar (VS).
* The intrinsic classes from IntrinsicsAArch64.td are renamed. For
example, `AdvSIMD_GatherLoad_64bitOffset_Intrinsic` is renamed as
`AdvSIMD_GatherLoad_SV_64b_Offsets_Intrinsic`.
* Updated comments in `performGatherLoadCombine` and
`performScatterStoreCombine`.
Reviewers: sdesmalen, rengolin, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75035
Affine dialect already has a map+operand simplification infrastructure in
place. Plug the recently added affine.min/max operations into this
infrastructure and add a simple test. More complex behavior of the simplifier
is already tested by other ops.
Addresses https://bugs.llvm.org/show_bug.cgi?id=45008.
Differential Revision: https://reviews.llvm.org/D75058
Originally, the intrinsics generator for the LLVM dialect produced customized
code fragments for translating MLIR operations to LLVM IR intrinsics. LLVM
dialect ODS now provides a generalized version of the
translation code, parameterizable with the properties of the operation.
Generate ODS that uses this version of the translation code instead of
generating a new version of it for each intrinsic.
Differential Revision: https://reviews.llvm.org/D74893
All LLVM IR intrinsics are constructed in a similar way. The ODS definition of
the LLVM dialect in MLIR also lists multiple intrinsics, many of which
reproduce the same (or similar enough) code stanza to translate the MLIR
operation into the LLVM IR intrinsic. Provide a single base class containing
parameterizable code to build LLVM IR intrinsics given their name and the lists
of overloadable operands and results. Use this class to remove (almost)
duplicate translations for intrinsics defined in LLVMOps.td.
Differential Revision: https://reviews.llvm.org/D74889
Summary:
The mapper assigns to loop.parallel operations annotations that are
compatible with the loop-to-GPU mapping pass. The outermost
loop uses the grid dimensions, followed by block dimensions. All
remaining loops are mapped to sequential loops.
Differential Revision: https://reviews.llvm.org/D74963
Summary:
Implements the following intrinsics:
* llvm.aarch64.sve.convert.to.svbool
* llvm.aarch64.sve.convert.from.svbool
These convert the ACLE svbool_t type (<n x 16 x i1>) to and from the
other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>.
Reviewers: sdesmalen, kmclaughlin, efriedma, dancgr, rengolin
Reviewed By: sdesmalen, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74471
The following series of refactoring patches aim to fix the horrible mess that MallocChecker.cpp is.
I genuinely hate this file. It goes completely against how most of the checkers
are implemented; it's by far the biggest headache regarding checker dependencies,
checker options, or anything you can imagine. On top of all that, it's just bad
code. It's seriously everything that you shouldn't do in C++, or any other
language really. Bad variable/class names, in/out parameters... Apologies, rant
over.
So: there are a variety of memory manipulating functions this checker models. One
aspect of these functions is their AllocationFamily, which we use to distinguish
between allocation kinds, like using free() on an object allocated by operator
new. However, since we always know which function we're actually modeling, in
fact we know it at compile time, there is no need to use tricks to retrieve this
information out of thin air n+1 function calls down the line. This patch changes
many methods of MallocChecker to take a non-optional AllocationFamily template
parameter (which also makes stack dumps a bit nicer!), and removes some no
longer needed auxiliary functions.
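A minimal sketch of the shape of the change (illustrative names, not the
checker's real interface): the caller, which statically knows which function
it is modeling, bakes the family in as a template argument.

```
enum AllocationFamily { AF_Malloc, AF_CXXNew, AF_IfNameIndex };

template <AllocationFamily Family>
void checkDeallocation(/* const CallEvent &Call, CheckerContext &C, ... */) {
  // Family is a compile-time constant here, and the instantiation
  // (e.g. checkDeallocation<AF_Malloc>) shows up in stack dumps.
}

void modelFree()           { checkDeallocation<AF_Malloc>(); }
void modelOperatorDelete() { checkDeallocation<AF_CXXNew>(); }
```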
Differential Revision: https://reviews.llvm.org/D68162
While the value of the CIE pointer field in a DWARF FDE record is
an offset to the corresponding CIE record from the beginning of
the section, for EH FDE records it is relative to the current offset.
Previously, we did not make that distinction when dumping both kinds
of FDE records and just printed the same value for the CIE pointer
field and the CIE offset; that was acceptable for DWARF FDEs but was
wrong for EH FDEs.
This patch fixes the issue by explicitly printing the offset of the
linked CIE object.
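A hedged sketch of the distinction (simplified, with made-up names; pointer
widths and extensions are ignored):

```
#include <cstdint>

// Resolve the offset of the linked CIE from the raw CIE pointer field.
uint64_t cieOffset(bool IsEH, uint64_t FieldOffset, uint64_t CIEPointer) {
  return IsEH ? FieldOffset - CIEPointer // EH: relative to the field itself
              : CIEPointer;              // DWARF: offset from the section start
}
```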
Differential Revision: https://reviews.llvm.org/D74613
These can come out nondeterministically for two reasons:
- sorting based on ConstStringified pointer values
- different relative speeds of the indexing threads
Making these deterministic without incurring performance penalties is
hard, so I just make the test expect them in any order (the order is not
important in this test anyway).
Seems like this code raised some alarm bells as it looks like an ArrayRef
to a temporary initializer list, but it's actually just calling the ArrayRef(T*, T*)
constructor. Let's clarify this and directly call the right ArrayRef constructor here.
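A self-contained illustration of the two readings (hypothetical types and
names, not the actual code):

```
#include "llvm/ADT/ArrayRef.h"
#include <cstddef>

void takesRef(llvm::ArrayRef<int> R) { (void)R; }

void example(int *Data, size_t Count) {
  // Looks like an ArrayRef over a temporary initializer list, but the braced
  // pair of pointers actually selects ArrayRef(const int *begin, const int *end),
  // so it refers to Data rather than to a temporary.
  takesRef({Data, Data + Count});

  // Spelling out the constructor makes the intent unambiguous.
  takesRef(llvm::ArrayRef<int>(Data, Data + Count));
}
```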
Fixes rdar://problem/59176052
Summary:
This is a quality of life change to make it a little nicer to look at, NFC.
This patch makes the RUN and OK lines green and FAILED lines red to match gtest's output.
Reviewers: sivachandra, gchatelet, PaulkaToast
Reviewed By: gchatelet
Subscribers: MaskRay, tschuett, libc-commits
Differential Revision: https://reviews.llvm.org/D75103
Instead add it when we make the machine nodes during instruction
selection.
This makes this ISD node closer to ISD::MGATHER. Trying to see
if we can remove the X86-specific ones.
Summary:
Acts on `BinaryOperator` and `UnaryOperator` and functions the same as `anyOf(hasOperatorName(...), hasOperatorName(...), ...)`
Documentation generation isn't perfect, but I feel that the Python doc script needs updating for that.
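For example (hypothetical snippet using the new matcher):

```
#include "clang/ASTMatchers/ASTMatchers.h"
using namespace clang::ast_matchers;

// The two matchers below are equivalent; the first is just shorter to write.
static auto withNewMatcher() {
  return binaryOperator(hasAnyOperatorName("+", "-", "*"));
}
static auto withOldMatcher() {
  return binaryOperator(
      anyOf(hasOperatorName("+"), hasOperatorName("-"), hasOperatorName("*")));
}
```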
Reviewers: aaron.ballman, gribozavr2
Reviewed By: aaron.ballman, gribozavr2
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75040
Summary:
I added an `abort()` call to some code and noticed that the test suite was still passing and it just marked my test as "UNSUPPORTED".
It seems the reason for that is that we expect failing tests to print "FAIL:", which doesn't happen when we crash. If the run is then also
marked as unsupported because we skipped some debug information in the output, we just mark the test as passing because it is unsupported
on the current platform.
This patch marks any test that has a non-zero exit code as failing even if it doesn't print "FAIL:" (e.g., because it crashed).
Reviewers: labath, JDevlieghere
Reviewed By: labath, JDevlieghere
Subscribers: aprantl, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D75031
These are technically text files, but the object file layer treats them
as binary, and the relevant tests verify the parsed contents byte for
byte. Git's crlf conversion can make those tests fail. Marking the files
as non-text disables that.
Order of evaluation of the operands of any C++ operator [...] is
unspecified. This patch fixes the issue in Stream::Indent by calling the
function consecutively.
On my Windows setup, TestSettings.py fails because the function prints
the value first, followed by the indentation.
Expected result:
MY_FILE=this is a file name with spaces.txt
Actual result:
MY_FILE =this is a file name with spaces.txt
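A self-contained illustration of the pitfall (hypothetical code, not the
actual lldb sources): both helpers write to the stream as a side effect, and
C++ does not specify which operand of `+` is evaluated first, so the value can
come out before the name.

```
#include <iostream>
#include <string>

static std::string emitName(std::ostream &os) {
  os << "MY_FILE";
  return "";
}
static std::string emitValue(std::ostream &os) {
  os << "=this is a file name with spaces.txt";
  return "";
}

static void buggy(std::ostream &os) {
  // Unspecified which of emitName/emitValue runs first.
  os << emitName(os) + emitValue(os) << '\n';
}

static void fixed(std::ostream &os) {
  // Calling the functions in consecutive statements pins the order.
  emitName(os);
  emitValue(os);
  os << '\n';
}

int main() {
  buggy(std::cout);
  fixed(std::cout);
  return 0;
}
```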
The current set of custom combines is only really useful after
legalization, so move them there. There is a lot of overlap in the
boilerplate here, but I think we do want a pretty different set of
combines before and after legalize. I think we will want a lot of
overlap between the post-legalize and a post-regbankselect combiner.
The aarch64-ubuntu bot is showing a test failure in TestHandleAbort.py
with this patch. Adding some logging to that file, it looks like
the saved register context above the trap handler does not have
saved state for $pc, but it does have it for $lr on that platform.
I need to fall back to looking for $lr if the $pc cannot be retrieved.
I'll update the patch and re-commit once that's fixed.
This reverts commit edc4f4c9c9.
MachineVerifier still takes 45-50% of total compile time with
-verify-machineinstrs, with calcRegsPassed dataflow taking ~50-60% of
MachineVerifier.
The majority of that time is spent in BBInfo::addPassed, mostly within
DenseSet implementing the sets the dataflow is operating over.
In particular, 1/4 of that DenseSet time is spent just iterating over it
(operator++), 40-50% on insertions, and most of the rest in ::count.
Given that, we're implementing custom sets just for this analysis here,
focusing on cheap insertions and O(n) iteration time (as opposed to
O(U), where U is the universe).
As it's based _mostly_ on BitVector for the sparse part and SmallVector for
the dense part, it may remotely resemble SparseSet. The difference is that our
solution is a lot less clever: it doesn't have the constant-time `clear` that
we wouldn't use anyway (reusing these sets across analyses is cumbersome), and
it is thus more space efficient and safer (it has a resizable universe and a
fallback to DenseSet for the sparse part if it gets too big).
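A hedged, simplified sketch of the idea (illustrative names, a fixed-size
universe instead of a resizable one; not the actual patch):

```
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"

// Membership checks go through a BitVector indexed by the key (falling back
// to a DenseSet when a key does not fit the universe), while a SmallVector
// records the inserted elements so iteration is O(n) in the number of
// elements rather than O(U) in the size of the universe.
class CheapRegSet {
  static constexpr unsigned MaxUniverse = 1u << 14;
  llvm::BitVector InUniverse{MaxUniverse};  // sparse part
  llvm::DenseSet<unsigned> Fallback;        // keys outside the universe
  llvm::SmallVector<unsigned, 16> Elements; // dense part, iterated linearly

public:
  bool insert(unsigned Key) {
    if (Key < MaxUniverse) {
      if (InUniverse.test(Key))
        return false;
      InUniverse.set(Key);
    } else if (!Fallback.insert(Key).second) {
      return false;
    }
    Elements.push_back(Key);
    return true;
  }
  bool count(unsigned Key) const {
    return Key < MaxUniverse ? InUniverse.test(Key) : Fallback.count(Key);
  }
  auto begin() const { return Elements.begin(); }
  auto end() const { return Elements.end(); }
};
```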
With this patch MachineVerifier gets ~15-20% faster, its contribution to
total compile time drops from 45-50% to ~35%, while contribution of
calcRegsPassed to MachineVerifier drops from 50-60% to ~35% as well.
calcRegsPassed itself gets another 2x faster here.
All measured on a large suite of shaders targeting a number of GPUs.
Reviewers: bogner, stoklund, rudkx, qcolombet
Reviewed By: rudkx
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75033