Summary:
Add DynamicMemRefType which can reference one of the statically ranked StridedMemRefType or a UnrankedMemRefType so that runner utils only need to be implemented once.
There is definitely room for more clean up and unification, but I will keep that for follow-ups.
Reviewers: nicolasvasilache
Reviewed By: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80513
This revision adds the additional lowering and exposes the patterns at a finer granularity for better programmatic reuse. The unit test makes use of the finer grained pattern for simpler checks.
As the ContractionOpLowering is exposed programmatically, cleanup opportunities appear and static class methods are turned into free functions with static visibility.
Differential Revision: https://reviews.llvm.org/D80375
Summary:
Currently, all changes returned by a single application of a rule must fit in
one atomic change and therefore must apply to one file. However, there are
patterns in which a single rule will want to modify multiple files; for example,
a header and implementation to change a declaration and its definition. This
patch relaxes Transformer, libTooling's interpreter of RewriteRules, to support
multiple changes.
Reviewers: gribozavr
Subscribers: mgrang, jfb, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80239
We currently diagnose static data members directly contained in unnamed classes,
but we should also diagnose when they're in a class that is nested (directly or
indirectly) in an unnamed class. Do this by iterating up the list of parent
DeclContexts and checking if any is an unnamed class.
Similarly also check for function or method DeclContexts (which includes things
like blocks and openmp captured statements) as then the class is considered to
be a local class, which means static data members aren't allowed.
Differential Revision: https://reviews.llvm.org/D80295
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
binop (splat X), (splat C) --> splat (binop X, C)
binop (splat C), (splat X) --> splat (binop C, X)
We do this in IR, and there's a similar fold for the case with 2
non-constant operands just above the code diff in this patch.
This was discussed in D79718, and the extra shuffle in the test
(llvm/test/CodeGen/X86/vector-fshl-128.ll::sink_splatvar) where it
was noticed disappears because demanded elements analysis is no
longer blocked. The large majority of the test diffs seem to be
benign code scheduling changes, but I do see another type of win:
moving the splat later allows binop narrowing in some cases.
Regressions were avoided on x86 and ARM with the INSERT_VECTOR_ELT
restriction.
Differential Revision: https://reviews.llvm.org/D79886
Summary: This is a NFC, it aims at simplifying both the code and build files.
Reviewers: abrachet, sivachandra
Subscribers: mgorny, tschuett, ecnelises, libc-commits, courbet
Tags: #libc-project
Differential Revision: https://reviews.llvm.org/D80291
Last part of recommitting 'Unify Intrinsic Costs'
259eb619ff. This patch now uses
getUserCost from getInstructionThroughput.
Differential Revision: https://reviews.llvm.org/D80012
As per http://lists.llvm.org/pipermail/cfe-dev/2019-August/063215.html, lets get rid of this option.
It presents 2 issues that have bugged me for years now:
* OSObject is NOT a boolean option. It in fact has 3 states:
* osx.OSObjectRetainCount is enabled but OSObject it set to false: RetainCount
regards the option as disabled.
* sx.OSObjectRetainCount is enabled and OSObject it set to true: RetainCount
regards the option as enabled.
* osx.OSObjectRetainCount is disabled: RetainCount regards the option as
disabled.
* The hack involves directly modifying AnalyzerOptions::ConfigTable, which
shouldn't even be public in the first place.
This still isn't really ideal, because it would be better to preserve the option
and remove the checker (we want visible checkers to be associated with
diagnostics, and hidden options like this one to be associated with changing how
the modeling is done), but backwards compatibility is an issue.
Differential Revision: https://reviews.llvm.org/D78097
Add the remaining cast instruction opcodes to the base implementation
of getUserCost and directly return the result. This allows
getInstructionThroughput to return getUserCost for the casts. This
has required changes to PPC and SystemZ because they implement
getUserCost and/or getCastInstrCost with adjustments for vector
operations. Adjusts have also been made in the remaining backends
that implement the method so that they still produce a cost of zero
or one for cost kinds other than throughput.
Differential Revision: https://reviews.llvm.org/D79848
After D80096, bots that build clang for distribution and that can't use
system gcc / libstdc++ need to pass a working rpath so that unit test
binaries can run. The method suggested in GettingStarted.rst works fine
for local development, but it results in an absolute local rpath ending
up even in distributed binaries like clang, which is both ugly and
unnecessary.
Add an explicit toggle that can be used to add an rpath only for the
non-distributed binaries that need it.
Differential Revision: https://reviews.llvm.org/D80534
Summary:
Clean-up code around mem ops clustering logic. This patch cleans up code within
the function clusterNeighboringMemOps(). It is WIP, and this patch is a first cut.
Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar
Reviewed By: foad
Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80119
Show failure to reduce the signbit extraction for 256-bit integer vectors on AVX1 targets where the pcmpgt/ashr has to be done with split 128-bit vectors.
A CIE with the Length == 0 is a terminator:
https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
And GNU objdump recognizes them and prints the following for such entries:
"00000000 ZERO terminator"
This patch teaches llvm-objdump to do the same. I had to update tests to use
"CHECK-NEXT" too.
(Note: it looks perhaps not right that printing is done inside the DebugInfo library,
I'd expect to see the change in the llvm-objdump's code somewhere instead,
but that is how it done atm).
Differential revision: https://reviews.llvm.org/D80476
Each `ELFYAML::ProgramHeader` currently contains a list of section names
included. We are trying to map them to Fill/Sections very late,
though we can create such mapping early, in `initProgramHeaders`.
The benefit is that with such change it is possible to access mapped
chunks earlier (for example during writing section content) and have
simpler code.
Differential revision: https://reviews.llvm.org/D80520
I've noticed an issue with "Data.getRelocatedValue(...)" call.
it might silently ignore an error when a content is truncated.
That leads to an infinite loop in the code (e.g. llvm-readobj hangs).
After fixing the issue I've found that actually we always tried
to read past the end of a section, even when a content was valid.
It happened because the terminator CIE (a CIE with the length == 0)
was never handled. At first I've tried just to stop adding the terminator
entry (and return), but it does not seem to be correct, because tools like
llvm-objdump might want to print something for such entries
(see comments in the code and test cases).
This patch fixes issues mentioned, provides new test cases for
both llvm-readobj and lib/DebugInfo and adds FIXMEs to existent
test cases related.
Differential revision: https://reviews.llvm.org/D80299
Summary:
During CodeGen for AArch64 Neon intrinsics, Clang was incorrectly
assuming all the pointers from which loads were being generated for vld1
intrinsics were aligned according to the intrinsics result type, causing
alignment faults on the code generated by the backend.
This patch updates vld1 intrinsics' CodeGen to properly capture the
correct load alignment based on the type of the pointer provided as
input for the intrinsic.
Reviewers: t.p.northover, ostannard, pcc
Reviewed By: ostannard
Subscribers: kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79721
Recommitting most of the remaining changes from
259eb619ff, but excluding the call to
getUserCost from getInstructionThroughput. Though there's still no
test changes, I doubt that this is an NFC...
With the two getIntrinsicInstrCosts folded into one, now fold in the
scalar/code-size orientated getIntrinsicCost. The remaining scalar
intrinsics were memcpy, cttz and ctlz which now have special handling
in the BasicTTI implementation.
This had required a change in the AMDGPU backend for fabs as it
should always be 'free'. I've also changed the X86 backend to return
the BaseT implementation when the CostKind isn't RecipThroughput.
Differential Revision: https://reviews.llvm.org/D80012
Summary:
We already skip function bodies from these files while parsing, and drop symbols
found in them. However, traversing their ASTs still takes a substantial amount
of time.
Non-scientific benchmark on my machine:
background-indexing llvm-project (llvm+clang+clang-tools-extra), wall time
before: 7:46
after: 5:13
change: -33%
Indexer.cpp libclang should be updated too, I'm less familiar with that code,
and it's doing tricky things with the ShouldSkipFunctionBody callback, so it
needs to be done separately.
Reviewers: kadircet
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80296
Recommitting part of "[CostModel] Unify Intrinsic Costs."
de71def3f5
Now that the 'free' intrinsic information has been sunk to the lowest
level, query the base implementation in BasicTTI before doing
anything else. I suspect this is the change that was causing the main
changes, particularly the large effects on debug builds.
Differential Revision: https://reviews.llvm.org/D80012
-fno-PIC and -fPIE code generally cannot be linked in -shared mode and there is no benefit accessing via local aliases.
Actually, a .Lfoo$local reference will be converted to a STT_SECTION (if no section relaxation) reference which will cause the section symbol (sizeof(Elf64_Sym)=24) to be generated.
This change makes minor correction to the implementation of intrinsic
`llvm.flt.rounds`:
- Added documentation entry in LangRef,
- Attributes of the intrinsic changed to be in line with other functions
dependent of floating-point environment.
Differential Revision: https://reviews.llvm.org/D79322
Summary:
Lexing until the token location is past preamble bound could be wrong
in some cases as preprocessor lexer can lex multiple tokens in a single call.
Reviewers: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79426
Summary:
Constructors can have implicit initializers, this was crashing define
outline. Make sure we find the first "written" ctor initializer to figure out
`:` location.
Fixes https://github.com/clangd/clangd/issues/400
Reviewers: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80521
-fno-semantic-interposition is currently the CC1 default. (The opposite
disables some interprocedural optimizations.) However, it does not infer
dso_local: on most targets accesses to ExternalLinkage functions/variables
defined in the current module still need PLT/GOT.
This patch makes explicit -fno-semantic-interposition infer dso_local,
so that PLT/GOT can be eliminated if targets implement local aliases
for AsmPrinter::getSymbolPreferLocal (currently only x86).
Currently we check whether the module flag "SemanticInterposition" is 0.
If yes, infer dso_local. In the future, we can infer dso_local unless
"SemanticInterposition" is 1: frontends other than clang will also
benefit from the optimization if they don't bother setting the flag.
(There will be risks if they do want ELF interposition: they need to set
"SemanticInterposition" to 1.)
As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an
infinite loop in legalization since we set the legalization action for
ISD::SELECT_CC for all fixed length vector types to Promote. Without some
different legalization action for the type being promoted to, the legalizer
simply loops. Since we don't have patterns to match the node, the right
legalization action should be Expand.
Differential revision: https://reviews.llvm.org/D79854
Summary:
This patch sets inline-deferral-scale to 2.
Both internal and SPEC benchmarking show that 2 is the best number
among -1, 2, 3, and 4.
inline-deferral-scale SPECint2006
------------------------------------------------------------
-1 38.0 (the default without this patch)
2 38.5
3 38.1
4 38.1
With the new default number, shouldBeDeferred returns true if:
TotalCost < IC.getCost() * 2
where
TotalCost is TotalSecondaryCost + IC.getCost() * NumCallerUsers.
If TotalCost >= 0 and NumCallerUsers >= 2, then
TotalCost >= IC.getCost() * 2, so shouldBeDeferred returns true only
when NumCallerUsers is 1.
Now, if TotalSecondaryCost < 0, which can happen if
InlineConstants::LastCallToStaticBonus, a huge number, has been
subtracted from TotalSecondaryCost, then TotalCost may be negative.
In this case, shouldBeDeferred may return true even when
NumCallerUsers >= 2.
Reviewers: davidxl, nikic
Reviewed By: davidxl
Subscribers: xbolva00, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80229
Summary: Update status page and test synopsis. Add synopsis in <cmath>.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D80456
Fix combineSubToSubus to handle the new DAG to avoid a regression.
There are still regressions in test14/test15/test16. Where it
looks like were trying to set up cases we could match to
umin+trunc+subus but the handling was never finished. The
regression here isn't unique to sub. Its a lost opportunity for
taking an AND with two truncated inputs and producing a larger
AND with a single truncate. The same thing could happen with
any other node we handle in combineTruncatedArithmetic since we
are moving the truncate up the DAG.
Differential Revision: https://reviews.llvm.org/D80483