This patch makes the folding of or(A, B) into not(and(not(A), not(B)))
more agressive for I1 vector. This only affects Thumb2 MVE and improves
codegen, because it removes a lot of msr/mrs instructions on VPR.P0.
This patch also adds a xor(vcmp) -> !vcmp fold for MVE.
Differential Revision: https://reviews.llvm.org/D77202
This patch adds an implementation of PerformVSELECTCombine in the
ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into
vselect(cond, rhs, lhs).
Normally, this should be done by the target-independent DAG Combiner,
but it doesn't handle the kind of constants that we generate, so we
have to reimplement it here.
Differential Revision: https://reviews.llvm.org/D77712
Summary:
I have fixed several places in getSplatSourceVector and isSplatValue
to work correctly with scalable vectors. I added new support for
the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can
support with scalable vectors. In other places I have tried to do
the sensible thing, such as bail out for vector types we don't yet
support or don't intend to support.
It's not possible to add IR test cases to cover these changes, since
they are currently only ever exercised on certain targets, e.g.
only X86 targets use the result of getSplatSourceVector. I've
assumed that X86 tests already exist to test these code paths for
fixed vectors. However, I have added some AArch64 unit tests that
test the specific functions I have changed.
Differential revision: https://reviews.llvm.org/D79083
Summary:
That unless the user requested an output object (--lto-obj-path), the an
unused empty combined module is not emitted.
This changed is helpful for some target (ex. RISCV-V) which encoded the
ABI info in IR module flags (target-abi). Empty unused module has no ABI
info so the linker would get the linking error during merging
incompatible ABIs.
Reviewers: tejohnson, espindola, MaskRay
Subscribers: emaste, inglorion, arichardson, hiraditya, simoncook, MaskRay, steven_wu, dexonsmith, PkmX, dang, lenary, s.egerton, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78988
Refering to the link order of a dylib better matches the terminology used in
static compilation. As upcoming patches will increase the number of places where
link order matters (for example when closing JITDylibs) it's better to get this
name change out of the way early.
This reverts commit fb5fd74685.
Re-instates commit 53913a65b4
The fix is to trim off trailing separators, as in `/foo/bar/` and
produce `/foo/bar`. VFS tests rely on this. I added unit tests for
remove_dots.
Register live ranges may have had gaps that after coalescing should be
removed. This is done by adding a new segment to the range, and merging
it with neighboring segments. When doing so, do not assume that each
subrange of the register ended at the same index. If a subrange ended
earlier, adding this segment could make the live range invalid.
Instead, if the subrange is not live at the start of the segment,
extend it first.
Summary:
Constrain which metadata nodes are allowed to be, or contain,
DILocations. This ensures that logic for updating DILocations in a
Module is complete.
Currently, !llvm.loop metadata is the only odd duck which contains
nested DILocations. This has caused problems in the past: some passes
forgot to visit the nested locations, leading to subtly broken debug
info and late verification failures.
If there's a compelling reason for some future metadata to nest
DILocations, we'll need to introduce a generic API for updating the
locations attached to an Instruction before relaxing this check.
Reviewers: aprantl, dsanders
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79245
Today symbol names generated for machine basic block sections use a
unary encoding to reduce bloat. This is essential when every basic block
in the binary is assigned a symbol however with basic block clusters
(rG05192e585ce175b55f2a26b83b4ed7882785c8e6) when we only need to
generate a few non-temporary symbols we can assign more descriptive
names making them more user friendly. With this change -
Cold cluster section for function foo is named "foo.cold"
Exception cluster section for function foo is named "foo.eh"
Other cluster sections identified by their ids are named "foo.ID"
Using this format works well with existing tools. It will demangle as
expected and works with existing symbolizers, profilers and debuggers
out of the box.
$ c++filt _Z3foov.cold
foo() [clone .cold]
$ c++filt _Z3foov.eh
foo() [clone .eh]
$c++filt _Z3foov.1234
foo() [clone 1234]
Tests for basicblock-sections are updated with some cleanup where
appropriate.
Differential Revision: https://reviews.llvm.org/D79221
Before this patch, global variables didn't have their namespace prepended in the Codeview debug symbol stream. This prevented Visual Studio from displaying them in the debugger (they appeared as 'unspecified error')
Differential Revision: https://reviews.llvm.org/D79028
Summary:
Moving these function initializations into separate functions makes it easier
to read the runOnModule function. There is also precedent in the sanitizer code:
asan has a function ModuleAddressSanitizer::initializeCallbacks(Module &M). I
thought it made sense to break the initializations into two sets. One for the
compiler runtime functions and one for the event callbacks.
Tested with: check-all
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D79307
This a hack to fix illegal 32 to 16 bit copies.
The problem is when we make 16 bit subregs legal it creates
a huge amount of failures which can only be resolved at once
without a temporary hack like this.
The next step is to change operands, instruction definitions
and patterns until this hack is not needed.
Differential Revision: https://reviews.llvm.org/D79119
Summary:
Remove invalid usage of VectorType::getNumElements in
ShuffleVectorInst::isValidOperands identified by test case
llvm::Analysis/ConstantFolding/vscale-shufflevector.ll. The tested
conditions hold for both fixed width and scalable vectors; use
getElementCount().
Reviewers: efriedma, sdesmalen, c-rhodes, spatel
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79212
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and and were to falling
back to a byte load + shift + or sequence.
This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence.
Reviewers: rampitec, arsenm, vpykhtin
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78091
Summary:
This patch adds AArch64ISD nodes for [S|U]MIN_PRED
and [S|U]MAX_PRED, and lowers both SVE intrinsics and
IR operations for min and max to these nodes.
There are two forms of these instructions for SVE: a predicated
form and an immediate (unpredicated) form. The patterns
which existed for the latter have been updated to match a
predicated node with an immediate and map this
to the immediate instruction.
Reviewers: sdesmalen, efriedma, dancgr, rengolin
Reviewed By: efriedma
Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79087
optimizePow does not create any new calls to pow, so it should work
regardless of whether the pow library function is available. This allows
it to optimize the llvm.pow intrinsic on targets with no math library.
Based on a patch by Tim Renouf.
Differential Revision: https://reviews.llvm.org/D68231
We haven't promoted AND/OR/XOR to vXi64 types for a while. So
there's no reason to use isOperationLegalOrPromote. So we can
just use isOperationLegal by merging with ADD handling.
Default legalization will create two v8i64 truncs to v8i32, concat
them to v16i32, and then truncate the rest of the way to v16i8.
Instead we can truncate directly from v8i64 to v8i8 in the lower
half of an xmm. Then concat the two halves to use vpunpcklqdq.
This is the same number of uops, but the dependency chain through
the uops is better since the halves are merged at the end.
I had to had SimplifyDemandedBits support for VTRUNC to prevent
a regression on vector-trunc-math.ll. combineTruncatedArithmetic
no longer gets a chance to shrink vXi64 mul so we were producing
the v8i64 multiply sequence using multiple PMULUDQs. With the
demanded bits fix we are able to prune out the extra ops leaving
just two PMULUDQs, one for each v8i64 half. This is twice the
width of the 2 v8i32 PMULLDs we had before, but PMULUDQ is 1
uop and PMULLD is 2. We also save some truncates. It's probably
worth using PMULUDQ even when PMULLQ is available since the latter
is 3 uops, but that will require a different change.
Differential Revision: https://reviews.llvm.org/D79231
Before we eagerly put dependences into the QueryMap as soon as we
encountered them (via `Attributor::getAAFor<>` or
`Attributor::recordDependence`). Now we will wait to see if the
dependence is useful, that is if the target is not already in a fixpoint
state at the end of the update. If so, there is no need to record the
dependence at all.
Due to the abstraction via `Attributor::updateAA` we will now also treat
the very first update (during attribute creation) as we do subsequent
updates.
Finally this resolves the problematic usage of QueriedNonFixAA.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 554675 (389245/s)
temporary memory allocations: 101574 (71280/s)
peak heap memory consumption: 28.46MB
peak RSS (including heaptrack overhead): 116.26MB
total memory leaked: 269.10KB
```
After:
```
calls to allocation functions: 512465 (345559/s)
temporary memory allocations: 98832 (66643/s)
peak heap memory consumption: 22.54MB
peak RSS (including heaptrack overhead): 106.58MB
total memory leaked: 269.10KB
```
Difference:
```
calls to allocation functions: -42210 (-727758/s)
temporary memory allocations: -2742 (-47275/s)
peak heap memory consumption: -5.92MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
Attributes that only depend on the value (=bit pattern) can be
initialized from uses in the must-be-executed-context (MBEC). We did use
`AAComposeTwoGenericDeduction` and `AAFromMustBeExecutedContext` before
to do this for some positions of these attributes but not for all. This
was fairly complicated and also problematic as we did run it in every
`updateImpl` call even though we only use known information. The new
implementation removes `AAComposeTwoGenericDeduction`* and
`AAFromMustBeExecutedContext` in favor of a simple interface
`AddInformation::fromMBEContext(...)` which we call from the
`initialize` methods of the "value attribute" `Impl` classes, e.g.
`AANonNullImpl:initialize`.
There can be two types of test changes:
1) Artifacts were we miss some information that was known before a
global fixpoint was reached and therefore available in an update
but not at the beginning.
2) Deduction for values we did not derive via the MBEC before or which
were not found as the `AAFromMustBeExecutedContext::updateImpl` was
never invoked.
* An improved version of AAComposeTwoGenericDeduction can be found in
D78718. Once we find a new use case that implementation will be able
to handle "generic" AAs better.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 468428 (328952/s)
temporary memory allocations: 77480 (54410/s)
peak heap memory consumption: 32.71MB
peak RSS (including heaptrack overhead): 122.46MB
total memory leaked: 269.10KB
```
After:
```
calls to allocation functions: 554720 (351310/s)
temporary memory allocations: 101650 (64376/s)
peak heap memory consumption: 28.46MB
peak RSS (including heaptrack overhead): 116.75MB
total memory leaked: 269.10KB
```
Difference:
```
calls to allocation functions: 86292 (556722/s)
temporary memory allocations: 24170 (155935/s)
peak heap memory consumption: -4.25MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D78719
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.
On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):
instsimplify.NumKnownBits: 216
instsimplify.NumKnownBitsComputed: 13828375
valuetracking.NumKnownBitsComputed: 45860806
Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)
There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.
Differential Revision: https://reviews.llvm.org/D79294
Summary:
This factors cost and reporting out of the inlining workflow, thus
making it easier to reuse when driving inlining from the upcoming
InliningAdvisor.
Depends on: D79215
Reviewers: davidxl, echristo
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79275
Since every AbstractAttribute so far, and for the foreseeable future,
corresponds to a single IRPosition we can simplify the class structure.
We already did this for IRAttribute but there is no reason to stop
there.
LLD calls this on every source file string in every object file when
writing PDBs, so it is somewhat hot.
Avoid rewriting paths that do not contain path traversal components
(./..). Use find_first_not_of(separators) directly instead of using the
path iterators. The path component iterators appear to be slow, and
directly searching for slashes makes it easier to find double separators
that need to be canonicalized.
I discovered that the VFS relies on remote_dots to not canonicalize
early slashes (/foo or C:/foo) on Windows, so I had to leave that
behavior behind with unit tests for it. This is undesirable, but I claim
that my change is NFC.
Summary:
Current implementation of DWARFDie::getName(DINameKind Kind) could
lead to double call to DWARFDie::find(DW_AT_name) in following
scenario:
getName(LinkageName);
getName(ShortName);
getName(LinkageName) calls find(DW_AT_name) if linkage name is not
found. Then, it is called again in getName(ShortName). This patch
alows to request LinkageName and ShortName separately
to avoid extra call to find(DW_AT_name).
It helps D74169 to parse clang debuginfo faster(~1%).
Reviewers: clayborg, dblaikie
Differential Revision: https://reviews.llvm.org/D79173
The splitVector helper uses extractSubVector which splits build vectors like we do here, so avoid reimplementing it.
splitVector could easily be extended to peek through bitcasts as well but I'd prefer to keep this commit NFC.
The number of public symbols is very large, and each deserialization
does a few heap allocations. The public symbols are serialized by the
linker, so we can assume they have the expected layout and use it
directly.
Saves O(#publics) temporary heap allocations and shrinks some data
structures.
This accounts for a large portion of the memory allocations in LLD.
This DebugSubsectionRecordBuilder object can be stored directly in
C13Builders, it mostly wraps other subsections.
Remove the container kind field from the object. It is always the same
for all elements in the vector, and we can pass it in during writing.
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.
The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.
Differential Revision: https://reviews.llvm.org/D79096
rL368553 added SimplifyMultipleUseDemandedBits handling for ISD::TRUNCATE to SimplifyDemandedBits so we don't need to duplicate this (and it gets rid of another GetDemandedBits call which is slowly being replaced with SimplifyMultipleUseDemandedBits anyhow).
This generalizes the main Windows command line tokenizer to be able to
produce StringRef substrings as well as freshly copied C strings. The
implementation is still shared with the normal tokenizer, which is
important, because we have unit tests for that.
.drective sections can be very long. They can potentially list up to
every symbol in the object file by name. It is worth avoiding these
string copies.
This saves a lot of memory when linking chrome.dll with PGO
instrumentation:
BEFORE AFTER % IMP
peak memory: 6657.76MB 4983.54MB -25%
real: 4m30.875s 2m26.250s -46%
The time improvement may not be real, my machine was noisy while running
this, but that the peak memory usage improvement should be real.
This change may also help apps that heavily use dllexport annotations,
because those also use linker directives in object files. Apps that do
not use many directives are unlikely to be affected.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D79262
Summary:
The current lowering of `select` on RISC-V uses a branch instruction to load a
register with one or other value. This is inefficient, especially in the case of
small constants that can be computed easily.
By implementing the TargetLowering::convertSelectOfConstantsToMath hook, some of
the simpler cases are covered that let us avoid introducing a branch in these
cases.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D79260
Summary:
This patch addresses some weird assembly sequences we were seeing during
comparing floats. In particular, comparing a float to itself tells you whether
it is NaN or not, which we were doing correctly, but with an extra unneeded
`and` instruction.
This patch specialises the existing patterns to remove the `and` instructions
when both their operands are the same.
Reviewed By: luismarques, asb
Differential Revision: https://reviews.llvm.org/D78908
When the shufflevector mask operand was converted into special
instruction data, the FunctionComparator was not updated to
account for this. As such, MergeFuncs will happily merge
shufflevectors with different masks.
This fixes https://bugs.llvm.org/show_bug.cgi?id=45773.
Differential Revision: https://reviews.llvm.org/D79261
Summary:
In D77860, we have changed `getSymbolFlags()` return type to `Expected<uint32_t>`.
This change helps bubble the error further up the stack.
Reviewers: jhenderson, grimar, JDevlieghere, MaskRay
Reviewed By: jhenderson
Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79075
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
Over time, we have made many additions to this file and it has frankly become a
bit of a mess. This has led to at least one issue - we have a number of
instructions where the side effects flag should be set to false and we neglected
to do this. This patch suggests a refactoring that should make the file much
more maintainable. The file is split up into major sections and the nesting
level is reduced, predicate blocks merged, etc.
Sections:
- Custom PPCISD node definitions
- Predicate definitions
- Instruction formats
- Instruction definitions
- Helper DAG definitions
- Anonymous patterns
- Instruction aliases
Differential revision: https://reviews.llvm.org/D78132
Summary:
shouldInline makes a decision based on the InlineCost of a call site, as
well as an evaluation on whether the site should be deferred. This means
it's possible for the decision to be not to inline, even for an
InlineCost that would otherwise allow it.
Both uses of shouldInline performed the exact same logic after calling
it. In addition, the decision on whether to inline or not was
communicated through two values of the Option<InlineCost> return value:
None, or an InlineCost evaluating to false.
Simplified by:
- encapsulating the decision in the return object. The bool it evaluates
to communicates unambiguously the decision. The InlineCost is also
available.
- encapsulated the common post-shouldInline code into shouldInline.
Reviewers: davidxl, echristo, eraman
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79215
Summary:
Have stripNonLineTableDebugInfo() attach updated !llvm.loop metadata to
an instruction (instead of updating and then discarding the metadata).
This fixes "!dbg attachment points at wrong subprogram for function"
errors seen while archiving an iOS app.
It would be nice -- as a follow-up -- to catch this issue earlier,
perhaps by modifying the verifier to constrain where DILocations are
allowed. Any alternative suggestions appreciated.
rdar://61982466
Reviewers: aprantl, dsanders
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79200
Summary:
Make foldVectorBinop return null if the instruction type is a scalable
vector. It is unclear what, if any, of this function works with scalable
vectors.
Identified by test LLVM.Transforms/InstCombine::nsw.ll
Reviewers: efriedma, david-arm, fpetrogalli, spatel
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79196
The more general fold was not poison-safe, so it was removed:
rG5486e00
...but it is ok to have this transform if analysis can determine
the vector contains no poison. The test shows a simple example
of that: constant integer elements are not poison.
This fixes potential reference invalidations, when no lattice value is
assigned for CopyOf. As the state of CopyOf won't change while in
handleCallResult, we can get a copy once and use that.
Should fix PR45749.
PR45481:
https://bugs.llvm.org/show_bug.cgi?id=45481
SDAG has an identical transform to this, so there's little
chance of any real-world impact. OTOH, that means we are
effectively sweeping the bug out of sight because poison
exists in codegen too.
Summary:
If no inputs and no output path are provided, llvm-lib should produce a useful error.
Before this, it would fail by reading from an unitialized StringRef.
Reviewed By: vvereschaka
Differential Revision: https://reviews.llvm.org/D79227
Handle concat_vectors(extract_subvector(broadcast(x)), extract_subvector(broadcast(x))) -> broadcast(x)
To expose this we also need collectConcatOps to recognise the insert_subvector(x, extract_subvector(x, lo), hi) subvector splat pattern
VMEM loads of the same type (sampler vs no sampler) are guaranteed to
write their result registers in order, so there is no need for an
s_waitcnt even if they write to overlapping vgprs.
Differential Revision: https://reviews.llvm.org/D79176
While restoring latency, check if any of the registers of
source instruction is a subregister of the successor instructions
apart from being same register.
The setting of `MCAsmInfo` properties for XCOFF got split between
`MCAsmInfoXCOFF` and `PPCXCOFFMCAsmInfo`. Except for the properties that
are dependent on the target information being passed via the
constructor, the properties being set in `PPCXCOFFMCAsmInfo` had no
fundamental reason for being treated as specific for XCOFF on PowerPC.
Indeed, the property that might be considered more specific to PowerPC,
`NeedsFunctionDescriptors`, was set in `MCAsmInfoXCOFF`.
XCOFF being specific to PowerPC anyway, this patch consolidates the
setting of the properties into `MCAsmInfoXCOFF` except for the cases
that are dependent on the information provided via the
`PPCXCOFFMCAsmInfo` constructor.
This patch also reorders the assignments to the fields to match the
declaration order in `MCAsmInfo`.
This patch adds the x, t and g modifiers for inline asm from GCC. These will print a vector register as xmm*, ymm* or zmm* respectively.
I also fixed register names with modifiers with inteldialect so they are no longer printed with a leading %.
Patch by Amanieu d'Antras
Differential Revision: https://reviews.llvm.org/D78977
Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to.
Differential Revision: https://reviews.llvm.org/D79045
Summary:
This was introduced in dda3c19a36 aka D77621.
The unused template instantiation causes a warning on 32 bit systems
about truncating a uint64_t to 32-bit size_t.
Reviewed By: dblaikie, smeenai
Differential Revision: https://reviews.llvm.org/D79214
In D74183 clang started emitting alignment for sret parameters
unconditionally. This caused a 1.5% compile-time regression on
tramp3d-v4. The reason is that we now generate many instance of IR like
%ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
%maskedptr = and i64 %ptrint, 3
%maskcond = icmp eq i64 %maskedptr, 0
tail call void @llvm.assume(i1 %maskcond)
to preserve the alignment information during inlining. Based on IR
analysis, these assumptions also regress optimization. The attached
phase ordering test case illustrates two issues: One are instruction
count based optimization heuristics, which are affected by the four
additional instructions of the assumption. The other is blocking of
SROA due to ptrtoint casts (PR45763).
We already encountered the same problem in Rust, where we (unlike
Clang) generally prefer to emit alignment information absolutely
everywhere it is available. We were only able to do this after
hardcoding -preserve-alignment-assumptions-during-inlining=false,
because we were seeing significant optimization and compile-time
regressions otherwise.
This patch disables -preserve-alignment-assumptions-during-inlining
by default, because we should not be punishing people for adding
more alignment annotations.
Once the assume bundle work shakes out and we can represent (and use)
alignment assumptions using assume bundles, it should be possible to
re-enable this with reduced overhead.
Differential Revision: https://reviews.llvm.org/D76886
This change add support for defined wasm globals in the .s format,
the MC layer, and wasm-ld
Currently there is no support custom initialization and all wasm
globals are initialized to zero.
Fixes: PR45742
Differential Revision: https://reviews.llvm.org/D79137
Summary:
Add a new option (/llvmlibempty). If passed and llvm-lib does not give an error, it will create a valid output archive even if empty.
By default, llvm-lib mimicks lib.exe: if given no input files, it doesn't create its output file at all. This is incompatible with some build systems, so we add a command-line option to toggle this compatibility behavior.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D78894
vpermw is 2 uops. vpermt2b/vpermt2w are two shuffle uops and a port 015 uop. Weirdly vpermb is a single uop.
This patch bumps the cost to 2 for these operations. Maybe should go to 3 for the vpermt2*, but I've started conservative.
I've also removed a few entries that were now the same as earlier subtargets or that I didn't think we really did. Like I don't think we extend v32i8 to v32i16, shuffle, and then truncate.
Differential Revision: https://reviews.llvm.org/D79148
This pushes the NOT pattern up the DAG to help expose it for further combines (AND->ANDN in particular).
The PSHUFD/MOVDDUP 'splat' cases are the only ones I've seen in the wild so far, we can further generalize if/when we need to.
Every new attribute we add from now on will not be supported in the
raw format, because we ran out of space. Don't bother listing each
affected attribute twice.
The assert checks that every instruction must be annotated by this point while it is not
necessary. If the inlining process was interrupted because the threshold was reached, the rest
of the instructions would not be annotated which triggers the assert.
The added test shows the situation in which it can happen.
This is a recommit as the original commit fail due to the absence of REQUIRES: assert in the test.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D79107
Summary:
Refactor getInterestingMemoryOperands() so that information about the
pointer operand is returned through an array of structures instead of
passing each piece of information separately by-value.
This is in preparation for returning information about multiple pointer
operands from a single instruction.
A side effect is that, instead of repeatedly generating the same
information through isInterestingMemoryAccess(), it is now simply collected
once and then passed around; that's probably more efficient.
HWAddressSanitizer has a bunch of copypasted code from AddressSanitizer,
so these changes have to be duplicated.
This is patch 3/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
[glider: renamed llvm::InterestingMemoryOperand::Type to OpType to fix
GCC compilation]
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77618
Summary:
In both AddressSanitizer and HWAddressSanitizer, we first collect
instructions whose operands should be instrumented and memory intrinsics,
then instrument them. Both during collection and when inserting
instrumentation, they are handled separately.
Collect them separately and instrument them separately. This is a bit
more straightforward, and prepares for collecting operands instead of
instructions in a future patch.
This is patch 2/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77617
Summary:
A following commit will split the loop over ToInstrument into two.
To avoid having to duplicate the condition for suppressing instrumentation
sites based on ClDebug{Min,Max}, refactor it out into a new function.
While we're at it, we can also avoid the indirection through
NumInstrumented for setting FunctionModified.
This is patch 1/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77616
I couldn't make arc land the changes properly, for some reason they all got
squashed. Reverting them now to land cleanly.
Summary: This reverts commit cfb5f89b62.
Reviewers: kcc, thejh
Subscribers:
Summary:
A following commit will split the loop over ToInstrument into two.
To avoid having to duplicate the condition for suppressing instrumentation
sites based on ClDebug{Min,Max}, refactor it out into a new function.
While we're at it, we can also avoid the indirection through
NumInstrumented for setting FunctionModified.
This is patch 1/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: jfb, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77616
X86 matches several 'shift+xor' funnel shift patterns:
fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y)) -> (fshl x0, x1, y)
fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
These patterns are also what we end up with the proposed expansion changes in D77301.
This patch moves these to DAGCombine's generic MatchFunnelPosNeg.
All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.
Reviewed By: @spatel
Differential Revision: https://reviews.llvm.org/D78935
Summary:
This patch implements custom floating-point reduction ISD nodes that
have vector results, which are used to lower the following intrinsics:
* llvm.aarch64.sve.fadda
* llvm.aarch64.sve.faddv
* llvm.aarch64.sve.fmaxv
* llvm.aarch64.sve.fmaxnmv
* llvm.aarch64.sve.fminv
* llvm.aarch64.sve.fminnmv
SVE reduction instructions keep their result within a vector register,
with all other bits set to zero.
Changes in this patch were implemented by Paul Walker and Sander de
Smalen.
Reviewers: sdesmalen, efriedma, rengolin
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78723
Summary:
This patch tries to ensure that we do something sensible when
generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating
on scalable vectors. Previously we always returned 'undef' when
inserting an element into an out-of-bounds lane index, whereas now
we only do this for fixed length vectors. For scalable vectors it
is assumed that the backend will do the right thing in the same way
that we have to deal with variable lane indices.
In this patch I have permitted a few basic combinations for scalable
vector types where it makes sense, but in general avoided most cases
for now as they currently require the use of BUILD_VECTOR nodes.
This patch includes tests for all scalable vector types when inserting
into lane 0, but I've only included one or two vector types for other
cases such as variable lane inserts.
Differential Revision: https://reviews.llvm.org/D78992
Summary: There is no need to create BPI explicitly. It should be requested through AM in a normal way.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79080
Summary:
Change std::vector to SmallVector to prevent re-allocations and to
have small pre-allocated storage.
Reviewers: clayborg, dblaikie
Differential Revision: https://reviews.llvm.org/D79123
The generic implementation is actually specific to x86. It assumes the
offset is relative to the end of the instruction and the immediate is
not scaled (which is false on most RISC).
Summary:
By design 'BranchProbabilityInfo:: getEdgeProbability(const BasicBlock *Src, const BasicBlock *Dst) const' should return sum of probabilities over all edges from Src to Dst. Current implementation is buggy and returns 1/num_of_successors if probabilities are not explicitly set.
Note current implementation of BPI printing has an issue as well and annotates each edge with sum of probabilities over all ages from one basic block to another. That's why 30% probability reported (instead of 10%) in the lit test. This is not urgent issue since only printing is affected.
Note also current implementation assumes that either all or none edges have probabilities set. This is not the only place which uses such assumption. At least we should assert that in verifier. In addition we can think on a more robust API of BPI which would prevent situations.
Reviewers: skatkov, yrouban, taewookoh
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79071
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.
Reviewers: skatkov, taewookoh, yrouban
Reviewed By: skatkov
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78987
Prior to D69446 I had done some NFC cleanup to make landing an iterative
outliner a cleaner more straight-forward patch. Since then, it seems that has
landed but I noticed some ways it could be cleaned up. Specifically:
1) doOutline was meant to be the re-runable function, but instead
runOnceOnModule was created that just calls doOutline.
2) In D69446 we discussed that the flag allowing the re-run of the
outliner should be a flag to tell how many additional times to run
the outliner again, not the total number of times. I don't think it
makes sense to introduce a flag, but print an error if the flag is
set to 0.
This is an NFCi, the i being that I get rid of the way that the
machine-outline-runs flag could be used to tell the outliner to not run
at all, and because I renamed the flag to '-machine-outliner-reruns'.
Differential Revision: https://reviews.llvm.org/D79070
Summary:
Renamed 'CS' to 'CB', and, in one case, to a more specific name to avoid
naming collision with outer scope (a maintainability/readability reason,
not correctness)
Also updated comments.
Reviewers: davidxl, dblaikie, jdoerfert
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79101
This section is the remnant of how this code was structured before
we made v32i16/v64i8 legal types with avx512f when not restricting
to 256 bit vectors. Now that there are just a few items left,
merge them near similar things in the other section.
Summary:
A valid DominatorTree is needed to do PhiTranslation.
Before this patch, a MemoryUse could be optimized to an access outside a loop, while the address it loads from is modified in the loop.
This can lead to a miscompile.
Reviewers: george.burgess.iv
Subscribers: Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79068
The assert checks that every instruction must be annotated by this point while it is not
necessary. If the inlining process was interrupted because the threshold was reached, the rest
of the instructions would not be annotated which triggers the assert.
The added test shows the situation in which it can happen.
Reviewed-By: mtrofin
Diff: https://reviews.llvm.org/D79107
We generate much better code these days than we used to. And we use the same sequence for AVX1 and AVX2 for these
For v4i64->v4i32 we generate:
vextractf128 xmm1, ymm0, 1
vshufps xmm0, xmm0, xmm1, 136 # xmm0 = xmm0[0,2],xmm1[0,2]
And for v8i64->v8i32 we generate:
vperm2f128 ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3]
vinsertf128 ymm0, ymm0, xmm1, 1
vshufps ymm0, ymm0, ymm2, 136 # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]
Differential Revision: https://reviews.llvm.org/D79109
For compatibility with other assemblers on the platform, allow
using just plain integer register numbers in all places where a
register operand is expected.
Bug: llvm.org/PR45582
Remove redundant Group and Regs arguments from parseRegister
and eliminate one of its overloaded versions.
Remove redundant Regs argument from parseAddress.
NFC intended.
This is currently enabled for Intel big cores from Sandy Bridge onward, as well as Atom, Silvermont, and KNL, due to 64-bit division being so slow on these cores. AMD cores can do this in hardware (use 32-bit division based on input operand width), so it's not a win there. But since the majority of x86 CPUs benefit from this optimization, and since the potential upside is significantly greater than the downside, we should enable this for the generic x86-64 target.
Patch By: @atdt
Reviewed By: @craig.topper, @RKSimon
Differential Revision: https://reviews.llvm.org/D75567
Summary:
AArch64's system register ERXTS_EL1 is present in the backend as a
component of the Arm Reliability, Availability and Serviceability (RAS)
extension. However, it has been removed from the specification before
its final release.
This patch removes the register.
Reviewers: SjoerdMeijer, DavidSpickett
Reviewed By: DavidSpickett
Subscribers: DavidSpickett, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79007
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.
This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.
A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
Reviewed By: @craig.topper
Differential Revision: https://reviews.llvm.org/D78216
The crash that caused the original revert has been fixed in
a3c964a278. I also added a reduced version of the crash reproducer.
This reverts the revert commit 2107af9ccf.
Call getNegatedExpression(Cost) and check the Cost to make the code more clear.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D78347
It allows it not to crash and analyze 16 bit subregs if those
appear in the instructions. At the same time it does not attempt
to reassign these. It still can correctly identify register
banks to let larger registers to be reassigned.
More work will be needed here when real instructions will use
these registers and more tests as well.
Differential Revision: https://reviews.llvm.org/D78772
This reverts commit ad38f4b371.
As it broke building the unittests:
.../sources/llvm-project/llvm/unittests/Support/Path.cpp:334:5: error: use of undeclared identifier 'set'
set(Value);
^
1 error generated.
We generate PACK instructions with an undef second source when we are truncating from a 128-bit vector to something narrower and we don't care about the upper bits of the vector register. The register allocation process will always assign untied undef uses to xmm0. This creates a false dependency on xmm0.
By adding these instructions to hasUndefRegUpdate, we can get the BreakFalseDeps pass to reassign the source to match the other input. Normally this interface is used for instructions that might need an xor inserted to break the dependency. But the pass also has a heuristic that tries to use the same register as other sources. That should always be possible for these instructions so we'll never trigger the xor dependency break.
Differential Revision: https://reviews.llvm.org/D79032
These are used in SReg_32 and when we start to use SGPR_LO16
there will be compaints that not all registers in RC support
all subreg indexes. For now it is NFC.
Unused regunits are reserved so that verifier does not complain
about missing phys reg live-ins.
Differential Revision: https://reviews.llvm.org/D78591
Generalize the 16-bit FPR to 32-bit GPR logic to work for all cases where
destination size is bigger than source size.
Also fixed CheckCopy() always returning true instead of the result of
isValidCopy().
Differential Revision: https://reviews.llvm.org/D77530
Patch by tambre (Raul Tambre)
Summary:
This patch adds a function that is similar to `llvm::sys::path::home_directory`, but provides access to the system cache directory.
For Windows, that is %LOCALAPPDATA%, and applications should put their files under %LOCALAPPDATA%\Organization\Product\.
For *nixes, it adheres to the XDG Base Directory Specification, so it first looks at the XDG_CACHE_HOME environment variable and falls back to ~/.cache/.
Subsequently, the Clangd Index storage leverages this new API to put index files somewhere else than the users home directory.
Fixes https://github.com/clangd/clangd/issues/341
Reviewers: sammccall, chandlerc, Bigcheese
Reviewed By: sammccall
Subscribers: hiraditya, ilya-biryukov, MaskRay, jkorous, dexonsmith, arphaman, kadircet, ormris, usaxena95, cfe-commits, llvm-commits
Tags: #clang-tools-extra, #clang, #llvm
Differential Revision: https://reviews.llvm.org/D78501
This allows forward declarations of PointerCheck, which in turn reduce
the number of times LoopAccessAnalysis needs to be included.
Ultimately this helps with moving runtime check generation to
Transforms/Utils/LoopUtils.h, without having to include it there.
Reviewers: anemet, Ayal
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78458
These are used in SReg_32 and when we start to use SGPR_LO16
there will be compaints that not all registers in RC support
all subreg indexes. For now it is NFC.
Unused regunits are reserved so that verifier does not complain
about missing phys reg live-ins.
Differential Revision: https://reviews.llvm.org/D78591
Summary:
Refactored the parameter and return type where they are too generally
typed as Instruction.
Reviewers: dblaikie, wmi, craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79027
* Merge QueueLock and CompletionLock.
* Avoid spurious CompletionCondition.notify_all() when ActiveThreads is greater than 0.
* Use default member initializers.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D78856
Implement passing of ByVal formal arguments when the argument is passed
partly in the argument registers, with the remainder of the argument
passed on the stack.
Differential Revision: https://reviews.llvm.org/D78515
Similar to code in `getAArch64Cmp` in AArch64ISelLowering.
When we get a compare against a constant, sometimes, that constant isn't valid
for selecting an immediate form.
However, sometimes, you can get a valid constant by adding 1 or subtracting 1,
and updating the condition code.
This implements the following transformations when valid:
- x slt c => x sle c - 1
- x sge c => x sgt c - 1
- x ult c => x ule c - 1
- x uge c => x ugt c - 1
- x sle c => x slt c + 1
- x sgt c => s sge c + 1
- x ule c => x ult c + 1
- x ugt c => s uge c + 1
Valid meaning the constant doesn't wrap around when we fudge it, and the result
gives us a compare which can be selected into an immediate form.
This also moves `getImmedFromMO` higher up in the file so we can use it.
Differential Revision: https://reviews.llvm.org/D78769
I've modified isTruncateFree to get an accurate cost for types that need to be split. I'm planning to look into fixing it for all vectors, but need more cost cleanups first.
Differential Revision: https://reviews.llvm.org/D78973
We unconditionally compared the DW_AT_ranges offset to the length of the
.debug_ranges section. For DWARF5 we should look at the debug_rnglists
section instead.
Differential revision: https://reviews.llvm.org/D78971
In `InstCombiner::visitAdd()`, we have
```
// A+B --> A|B iff A and B have no bits set in common.
if (haveNoCommonBitsSet(LHS, RHS, DL, &AC, &I, &DT))
return BinaryOperator::CreateOr(LHS, RHS);
```
so we should handle such `or`'s here, too.
Add support for reserving LR in:
* the driver through `-ffixed-x30`
* cc1 through `-target-feature +reserve-x30`
* the backend through `-mattr=+reserve-x30`
* a subtarget feature `reserve-x30`
the same way we're doing for the other registers.
Summary:
They all match the base implementation in
TargetInstrInfo::isUnpredicatedTerminator.
Follow up to D62749.
Reviewers: echristo, MaskRay, hfinkel
Reviewed By: echristo
Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78976
This changes the logic with lowering fp16 bitcasts to always produce
either a VMOVhr or a VMOVrh, instead of only trying to do it with
certain surrounding nodes. To perform the same optimisations demand bits
and known bits information has been added for them.
Differential Revision: https://reviews.llvm.org/D78587
getTargetStreamer() might return null (e.g. when running inlined-strings.ll test),
downcasting to a reference will be wrong. This is detectable with -fsanitize=null.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D78686
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
Summary:
Changing all mnemonic to match assembly instructions to simplify mnemonic
naming rules. This time update all branch instructions. This also change
to use %s10 register consistently.
Differential Revision: https://reviews.llvm.org/D78889
Summary:
Add simm7fp/mimmfp to represent floating point immediate values.
Also clean multiclasses to define floating point arithmetic instructions
to handle simm7fp/mimmfp operands. Also add several regression tests
for new operands.
Differential Revision: https://reviews.llvm.org/D78887
Currently, on PowerPC target, it uses function scope UnsafeFPMath
option to drive Machine Combiner pass.
This is not accurate in two ways:
1: the scope is not accurate. Machine Combiner pass only requires
instruction-level flags instead of the function scope.
2: the float point flag is not accurate. Machine Combiner pass
only requires float point flags reassoc and nsz.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D78183
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
In the ppc-expand-isel pass, we use stepForward() to update the
liveins, this function is not recommended, because it needs the
accurate kill info.
This patch uses the function computeAndAddLiveIns() to update the
liveins, it's the recommended method and can fix the liveins bug for
ppc-expand-isel pass..
Reviewed By: efriedma, lkail
Differential Revision: https://reviews.llvm.org/D78657
Summary:
While looking into issues with IfConverter, I noticed that
X86InstrInfo::isUnpredicatedTerminator matched its overriden
implementation in TargetInstrInfo::isUnpredicatedTerminator.
Reviewers: craig.topper, hfinkel, MaskRay, echristo
Reviewed By: MaskRay, echristo
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62749
We were passing the AppleObjCSection instead of the AddrSection. Maybe
the API changed and this remained unnoticed because the types are the
same, or maybe it's just a typo.
Add llvm.call.preallocated.{setup,arg} instrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.
I've changed the codepath handling the constant integers to use the
store size for both small(er than u64) and big constants and then add
zero padding right after that.
Differential Revision: https://reviews.llvm.org/D78011
SmallVector currently uses 32bit integers for size and capacity to reduce
sizeof(SmallVector). This limits the number of elements to UINT32_MAX.
For a SmallVector<char>, this limits the SmallVector size to only 4GB.
Buffering bitcode output uses SmallVector<char>, but needs >4GB output.
This changes SmallVector size and capacity to conditionally use word-size
integers if the element type is small (<4 bytes). For larger elements types,
the vector size can reach ~16GB with 32bit size.
Making this conditional on the element type provides both the smaller
sizeof(SmallVector) for larger types which are unlikely to grow so large,
and supports larger capacities for smaller element types.
This recommit fixes the same template being instantiated twice on platforms
where uintptr_t is the same as uint32_t.
We may want to identify sequences that are not
reductions, but still qualify as load-combines
in the back-end, so make most of the body a
helper function.
"unskipableSimplifyCode()" was added to handle unsafe BL8_NOTOC instruction
when TOC was not completely removed. The function is not needed after confirming
TOC pointer is not used in a function that uses PC-Relative addressing.
Differential Revision: https://reviews.llvm.org/D78517
like .cfi_restore"
Insert .cfi_offset/.cfi_register when IncomingCSRSaved of current block
is larger than OutgoingCSRSaved of its previous block.
Original commit message:
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basicblock, the patch tracks which
CSR set have been saved at its CFG predecessors's exits, and compare the CSR
set with the set at its previous basicblock's exit (The previous block is the
block laid before the current block). If the saved CSR set at its previous
basicblock's exit is larger, .cfi_restore will be inserted.
The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.
Differential Revision: https://reviews.llvm.org/D74303
Summary:
Reviewing failures identified in D78586, I was finding the identifiers
for these iterators hard to read.
Reviewers: efriedma, MaskRay, jyknight
Reviewed By: MaskRay
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78849
All avx512 truncate instructions except vXi64->vXi32 are 2 uops
on port 5. So raise their costs to 2. Except when we have an
earlier faster sequence like pshufb for 128 bit input vectors.
Add a lower cost of 3 v16i16->v16i8 with avx512f where we can
extend to v16i32 then truncate. And a cost of 2 for avx512bw with
and without avx512vl. There we can use vpmovwb with either a ymm
or zmm input. Both of these beat masking, splitting, and using
packuswb which is our avx/avx2 codegen.
The tl;dr story is that this causes jumps in the emitted line
tables, even at `-O0`. We could at some point consider more fancy
solutions to preserve locations, but it doesn't seem to be worth
the effort for now.
<rdar://problem/62460788>
Differential Revision: https://reviews.llvm.org/D78947
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.
Differential Revision: https://reviews.llvm.org/D77788
With a fix to unittests/Support/TarWriterTest.cpp
This makes lld's --reproduce output more compatible with tar 1.13 and
before. This is a very old version of tar, but it's the version in
both gnuwin and unxutils, and the cost for supporting them are very
low, so we might as well just do that.
https://bugs.chromium.org/p/chromium/issues/detail?id=1073524#c21
and onward has more details.
Differential Revision: https://reviews.llvm.org/D78945
This makes lld's --reproduce output more compatible with tar 1.13 and
before. This is a very old version of tar, but it's the version in
both gnuwin and unxutils, and the cost for supporting them are very
low, so we might as well just do that.
https://bugs.chromium.org/p/chromium/issues/detail?id=1073524#c21
and onward has more details.
Differential Revision: https://reviews.llvm.org/D78945
D63847 added `MCInstrAnalysis::evaluateMemoryOperandAddress()`. This patch
leverages the feature to print the target addresses for evaluable instructions.
```
-400a: movl 4080(%rip), %eax
+400a: movl 4080(%rip), %eax # 5000 <data1>
```
This patch also deletes `MIA->isCall(Inst) || MIA->isUnconditionalBranch(Inst) || MIA->isConditionalBranch(Inst)`
which is used to guard `MCInstrAnalysis::evaluateBranch()`
Reviewed By: jhenderson, skan
Differential Revision: https://reviews.llvm.org/D78776
Profile and profile summary are usually read only once and then annotated
on IR. The profile summary metadata on IR should include the value of the
newly added partial profile flag, so that compilation phase like thinlto
postlink can get the full set of profile information.
Differential Revision: https://reviews.llvm.org/D78310
Summary:
When generating code for the LLVM IR zeroinitialiser operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D78636
There are some intrinsics like this that currently block tail
predication, but should be fine. This allows fma through, as the one
that I ran into. There may be others that need the same treatment but
I've only done this one here.
Differential Revision: https://reviews.llvm.org/D78385
The insert(truncate/extend(extract(vec0,c0)),vec1,c1) case in rGacbc5ede99 wasn't combining the 'mineltsize' with the src vector elt size which may be smaller due to implicit extension during extraction.
Reduced from test case provided by @mstorsjo
hasNoSchedulingInfo should be used for Pseudo's and other instructions
that are never expected to be scheduled. This removes the flag from new
ARM instructions, instead fixing the A57 schedule by marking the related
architecture features as unsupported.
When compiling for a arm5te cpu from clang, the +dsp attribute is set.
This meant we could try and generate qadd8 instructions where we would
end up having no pattern. I've changed the condition here to be hasV6Ops
&& hasDSP, which is what other parts of ARMISelLowering seem to use for
similar instructions.
Fixed PR45677.
Differential Revision: https://reviews.llvm.org/D78877
This is a NFC patch for D77319. The idea is to hide the getNegatibleCost inside the getNegatedExpression()
to have it return null if the cost is expensive, and add some helper function for easy to use. And
rename the old getNegatedExpression to negateExpression to avoid the semantic conflict.
Reviewed By: RKSimon
Differential revision: https://reviews.llvm.org/D78291
We're currently getting this from the default implementation. But
I don't like how the cost model came to this answer and I might
be making some changes there.
Summary:
Extending the Function::viewCFG prototypes to allow for printing block probability info in form of .dot files during debugging.
Also avoiding an AV when no BFI/BPI available.
Reviewers: wenlei, davidxl, knaumov
Reviewed By: wenlei, davidxl
Subscribers: MaskRay, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77978
When folding tail, branch taken count is computed during initial VPlan execution
and recorded to be used by the compare computing the loop's mask. This recording
should directly set the State, instead of reusing Value2VPValue mapping which
serves original Values present prior to vectorization.
The branch taken count may be a constant Value, which may be used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028
Differential Revision: https://reviews.llvm.org/D78847
This means AttrBuilder will always create a sorted set of attributes and
we can skip the sorting step. Sorting attributes is surprisingly
expensive, and I recently made it worse by making it use array_pod_sort.
Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases.
By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.
Integer ranges can be used for loaded/stored values. Note that widening
can be disabled for loads/stores, as we only rely on instructions that
cause continued increases to ranges to be widened (like binary
operators).
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78433
The change in D78624 had a noticeable negative compile-time impact.
It seems that going through a function call for the MaxUsesToExplore
default is fairly expensive, at least if LLVM is not built with LTO.
This patch makes MaxUsesToExpore default to 0 and assigns the actual
default in the implementation instead. This recovers most of the
regression.
Differential Revision: https://reviews.llvm.org/D78734
Attributes are currently stored as a simple list. Enum attributes
additionally use a bitset to allow quickly determining whether an
attribute is set. String attributes on the other hand require a
full scan of the list. As functions tend to have a lot of string
attributes (at least when clang is used), this is a noticeable
performance issue.
This patch adds an additional name => attribute map to the
AttributeSetNode, which allows querying string attributes quickly.
This results in a 3% reduction in instructions retired on CTMark.
Changes to memory usage seem to be in the noise (attribute sets are
uniqued, and we don't tend to have more than a few dozen or hundred
unique attribute sets, so adding an extra map does not have a
noticeable cost.)
Differential Revision: https://reviews.llvm.org/D78859
Summary:
Specifically make some simple refactorings to get PointerUnion.h and
Twine.h out of the public includes. While here, trim out a lot of
transitive includes as well.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78870
Summary:
This introduces a new SourceMgr::FindLocForLineAndColumn method that
uses the OffsetCache in SourceMgr::SrcBuffer to do do a constant time
lookup for the line number (once the cache is populated).
Use this method in MLIR's SourceMgrDiagnosticHandler::convertLocToSMLoc,
replacing the O(n) scanning logic. This resolves a long standing TODO
in MLIR, and makes one of my usecases go dramatically faster (which is
currently producing many diagnostics in a 40MB SourceBuffer).
NFC, this is just a performance speedup and cleanup.
Reviewers: rriddle!, ftynse!
Subscribers: hiraditya, mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, frgossen, Kayjukh, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78868
Summary:
Missing error mangling is noticed in
https://bugs.llvm.org/show_bug.cgi?id=45636
where inconsistent profiling input caused
llvm/lld to crash as:
```
Program aborted due to an unhandled Error:
linking module flags 'ProfileSummary':
IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```
The change does not change the fact that LLVM crashes
but changes error output to say what was incorrect:
```
LLVM ERROR: Function Import: link error:
linking module flags 'ProfileSummary':
IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```
Actual crash has yet to be fixed.
Reviewers: lattner
Reviewed By: lattner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78676
This is another enhancement to D77895/D78362
to avoid a round-trip from XMM->GPR->XMM.
This time we handle the case of starting/ending with different FP types
but always with signed i32 as the intermediate value.
I think this covers all of the faux vector optimization possibilities
for pre-AVX512.
There is at least 1 other transform mentioned in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
...where we fold an 'fpext' into a preceding 'sitofp'. I think we will
want to handle that earlier (DAGCombiner or instcombine) because that's
a target-independent optimization.
Differential Revision: https://reviews.llvm.org/D78758
(X | MaskC) == C --> (X & ~MaskC) == C ^ MaskC
(X | MaskC) != C --> (X & ~MaskC) != C ^ MaskC
We have more analyis for 'and' patterns and already lean this way
in the existing code, so this should be neutral or better in IR.
If this does not do as well in codegen, the problem already exists
and we should fix that based on target costs/heuristics.
http://volta.cs.utah.edu:8080/z/oP3ecL
define void @src(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
%or = or i8 %x, %OrC
%eq = icmp eq i8 %or, %C
store i1 %eq, i1* %p0
%ne = icmp ne i8 %or, %C
store i1 %ne, i1* %p1
ret void
}
define void @tgt(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
%NotOrC = xor i8 %OrC, -1
%a = and i8 %x, %NotOrC
%NewC = xor i8 %C, %OrC
%eq = icmp eq i8 %a, %NewC
store i1 %eq, i1* %p0
%ne = icmp ne i8 %a, %NewC
store i1 %ne, i1* %p1
ret void
}
Using the existing NumFastStores statistic can be misleading when
comparing the impact of DSE patches.
For example, consider the case where a store gets removed from a
function before it is inlined into another function. A less
powerful DSE might only remove the store from functions it has
been inlined into, which will result in more stores being removed, but
no difference in the actual number of stores after DSE.
The new stat provides the absolute number of stores surviving after
DSE.
Reviewers: dmgreen, bryant, asbirlea, jfb
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D78830
Summary:
This patch helps isGuaranteedNotToBeUndefOrPoison look into more constants and instructions (bitcast/alloca/gep/fcmp).
To deal with bitcast, Depth is added to isGuaranteedNotToBeUndefOrPoison.
This patch is splitted from https://reviews.llvm.org/D75808.
Checked with Alive2
Reviewers: reames, jdoerfert
Reviewed By: jdoerfert
Subscribers: sanwou01, spatel, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76010
Currently an unknown/undef value is marked as overdefined when merged
with an empty range. An empty range can occur in unreachable/dead code.
When merging the new unknown state (= no value known yet) with an empty
range, there still isn't any information about the value yet and we can
stay in unknown.
This gives a few nice improvements on the number of instructions removed
by IPSCCP:
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...rks/FreeBench/mason/mason.test 3.00 6.00 100.0%
test-suite...nchmarks/McCat/18-imp/imp.test 3.00 5.00 66.7%
test-suite...C/CFP2000/179.art/179.art.test 2.00 3.00 50.0%
test-suite...ijndael/security-rijndael.test 2.00 3.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 40.00 58.00 45.0%
test-suite...ce/Applications/Burg/burg.test 26.00 37.00 42.3%
test-suite...cCat/03-testtrie/testtrie.test 3.00 4.00 33.3%
test-suite...Source/Benchmarks/sim/sim.test 29.00 36.00 24.1%
test-suite.../Applications/spiff/spiff.test 9.00 11.00 22.2%
test-suite...s/FreeBench/neural/neural.test 5.00 6.00 20.0%
test-suite...pplications/treecc/treecc.test 66.00 79.00 19.7%
test-suite...langs-C/football/football.test 85.00 101.00 18.8%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 90.00 105.00 16.7%
test-suite...oxyApps-C++/miniFE/miniFE.test 37.00 43.00 16.2%
test-suite...rks/FreeBench/pifft/pifft.test 26.00 30.00 15.4%
test-suite...lications/sqlite3/sqlite3.test 481.00 548.00 13.9%
test-suite...marks/7zip/7zip-benchmark.test 4875.00 5522.00 13.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1117.00 1197.00 7.2%
test-suite...0.perlbench/400.perlbench.test 1618.00 1732.00 7.0%
Reviewers: efriedma, nikic, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78667
The sizes of offsets in the `.debug_str_offsets.dwo` section depend on
the format of compilation or type units referencing them: 4 bytes for
DWARF32 units and 8 bytes for DWARF64 ones. The fix uses parsed units
to determine the actual size of offsets in the corresponding part of
the `.debug_str_offsets.dwo` section.
Differential Revision: https://reviews.llvm.org/D78555
Summary:
refactor assume bulider for the next patch.
the assume builder now generate only one assume per attribute kind and per value they are on. to do this it takes the highest. this is desirable because currently, for all attributes the higest value is the most valuable.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78013
We should only skip `lifetime` and `dbg` intrinsics when searching for users.
Other intrinsics are legit users that can't be ignored.
Without this fix, the testcase would result in an invalid IR. `memcpy`
will have a reference to the, now, external value (local to the
extracted loop function).
Fix PR42194
Differential Revision: https://reviews.llvm.org/D78749
The CallSite and ImmutableCallSite were removed in a previous
commit. So rename the file to match the remaining class and
the name of the cpp that implements it.
SmallVector currently uses 32bit integers for size and capacity to reduce
sizeof(SmallVector). This limits the number of elements to UINT32_MAX.
For a SmallVector<char>, this limits the SmallVector size to only 4GB.
Buffering bitcode output uses SmallVector<char>, but needs >4GB output.
This changes SmallVector size and capacity to conditionally use word-size
integers if the element type is small (<4 bytes). For larger elements types,
the vector size can reach ~16GB with 32bit size.
Making this conditional on the element type provides both the smaller
sizeof(SmallVector) for larger types which are unlikely to grow so large,
and supports larger capacities for smaller element types.
If one of caller/callee has disabled ZMM registers due to
prefer-vector-width=256, we were previously
disabling argument promotion as the ABI might be incompatible since
one side will split 512-bit vectors in this case.
But if we can see that the types are all scalar this shouldn't be
a problem.
This patch assumes that pointer element type reflects the type that
the argument will be promoted to.
Differential Revision: https://reviews.llvm.org/D78770
Summary:
Instead of adding a ".unlikely" or ".eh" suffix for machine basic blocks,
this change updates the behaviour to use an appropriate prefix
instead. This allows lld to group basic block sections together
when -z,keep-text-section-prefix is specified and matches the behaviour
observed in gcc.
Reviewers: tmsriram, mtrofin, efriedma
Reviewed By: tmsriram, efriedma
Subscribers: eli.friedman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78742
Follow-up of D78082 and D78590.
Otherwise, because xray_instr_map is now read-only, the absolute
relocation used for Sled.Function will cause a text relocation.
A previous bug fix for varargs introduced a regression where we would
incorrectly widen some stores to memory when passing i8/i16 parameters on the
stack. This didn't show up seemingly because it only happens when there is
no signext/zeroext parameter attribute, which I think for Darwin clang adds.
Swift however seems to be a different story, and a plain anyext on the parameter
triggered the bug.
To fix this, I've added a new ValueHandler::assignValueToAddress type override
which lets us distiguish between varargs and fixed args (we still need this
widening behaviour for varargs to fix the original bug in 2018).
rdar://61353552
Summary:
Add a check to make sure that MachineInstr::mayAlias returns prematurely if at least one of its instruction parameters does not access memory. This prevents calls to TargetInstrInfo::areMemAccessesTriviallyDisjoint with incompatible instructions.
A side effect of this change is to render the mayAlias helper in the AArch64 load/store optimizer obsolete. We can now directly call the MachineInstr::mayAlias member function.
Reviewers: hfinkel, t.p.northover, mcrosier, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78823
This is to fix performance regressions introduced by
86c944d790.
The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.
Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.
This may also improve compile time by reducing the merge list size.
As reported here: https://reviews.llvm.org/D75153#1987272
Before, each instance of llvm-cov was creating one thread per hardware core, which wasn't needed probably because the number of inputs were small. This was probably causing a thread rlimit issue on large core count systems.
After this patch, the previous behavior is restored (to what was before rG8404aeb5):
If --num-threads is not specified, we create one thread per input, up to num.cores.
When specified, --num-threads indicates any number of threads, with no upper limit.
Differential Revision: https://reviews.llvm.org/D78408
When using reversedInstructionsWithoutDebug to construct a range from a
pair of MachineInstrBundleIterators, the range unexpectedly leaves out an
element. This results in mis-optimization as @mstorsjo points out in
https://reviews.llvm.org/D78157.
The problem is that when we convert a MachineInstrBundleIterator to a
reverse iterator, the result gets incremented:
MachineInstrBundleIterator(++I.getReverse())
The comment there explains that the "resulting iterator will dereference
... to the previous node, which is somewhat unexpected; but converting
the two endpoints in a range will give the same range in reverse". This
makes it hard to understand what reversedInstructionsWithoutDebug will
do: I've removed the helper to prevent similar mistakes in the future.
Summary:
- Whether or not a vector is scalable is a function of its type. Since
all instances of ScalableVectorType will have true for this value and
all instances of FixedVectorType will have false for this value, there
is no need to store it as a class member.
Reviewers: efriedma, fpetrogalli, kmclaughlin
Reviewed By: fpetrogalli
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78601
This patch slightly improves the formatting of the debug output, adds a
few missing outputs and makes some existing outputs more consistent with
the rest.
Summary:
This simplifies testing in scenarios where we want to set up module-wide
analyses for inlining. The patch enables treating inlining and its
function cleanups, as a module pass. The alternative would be for tests
to describe the pipeline, which is tedious and adds maintenance
overhead.
Reviewers: davidxl, dblaikie, jdoerfert, sstefan1
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78512
As discussed in PR45478:
https://bugs.llvm.org/show_bug.cgi?id=45478
...propagating FMF from the outer (second) call is not correct,
so intersect them instead.
I suspect we could do better (see TODO comment), but mismatched
FMF is probably too rare to care about.
Differential Revision: https://reviews.llvm.org/D78631
Summary:
It is important to emit HINT instructions instead of PAC ones when
PAC is disabled. This allows compatibility with other assemblers
(e.g. GAS). This was implemented in commit da33762de8.
Still, developers of assembly code will want to write code that is
compatible with both pre- and post-PAC CPUs. They could use HINT
mnemonics, but the new mnemonics are a lot more readable (e.g.
paciaz instead of hint #24), and they will result in the same
encodings. So, while LLVM should not *emit* the new mnemonics when
PAC is disabled, this patch will at least make LLVM *accept*
assembly code that uses them.
Reviewers: danielkiss, chill, olista01, LukeCheeseman, simon_tatham
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78372
Follow-up of D78082 (x86-64).
This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.
MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.
Tested on AArch64 Linux and ppc64le Linux.
Reviewed By: ianlevesque
Differential Revision: https://reviews.llvm.org/D78590
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch32 and Assembly Parsing
D77872 has already added the MC representations of the instructions so that
they can be used in code gen; this patch fills in the details needed to
make assembly parsing work, and adds tests for asm and disasm
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, simon_tatham
Reviewed By: simon_tatham
Subscribers: simon_tatham, ostannard, kristof.beyls, hiraditya,
danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77874
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 Scalable Vector Instructions (in line
with the Scalable Vector Extension - SVE)
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, rengolin, c-rhodes
Reviewed By: c-rhodes
Subscribers: c-rhodes, ostannard, tschuett, kristof.beyls, hiraditya,
danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77873
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch32
- Intrinsics Support for AArch32 Neon Intrinsics for Matrix
Multiplication
Note: these extensions are optional in the 8.6a architecture and so have
to be enabled by default
No additional IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, miyuki
Reviewed By: miyuki
Subscribers: miyuki, ostannard, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77872
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication)
No IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin
Reviewed By: kmclaughlin
Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77871
Summary: AttrIndex could be removed from DWARFAbbreviationDeclaration::getAttributeValue.
Reviewers: clayborg, dblaikie
Differential Revision: https://reviews.llvm.org/D78672
Currently we have computations of `p_filesz` and `p_memsz` mixed together
with the use of a loop over fragments. After recent changes it is possible to
avoid using a loop for the computation of `p_filesz`, since we know that fragments
are sorted by their file offsets.
The main benefit of this change is that splits the computation of `p_filesz`
and `p_memsz` what is simpler and allows us to fix the computation of the
`p_memsz` independently (D78005 shows the issue that we have currently).
Differential revision: https://reviews.llvm.org/D78628
Summary:
Frontend guarantees that coherent accesses have
corresponding cache policy bits set (glc, dlc).
Therefore there is no need for extra instructions
that invalidate cache.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78800
There's an ABI breakage here if LLVM is compiled in C++14 without
aligned allocation and a user tries to use the result with aligned
allocation. If DenseMap or unique_function is used across that ABI
boundary it will break (PR45413). Moving it out of line is a bit of
a band-aid and LLVM doesn't really give ABI guarantees at this level,
but given the number of complaints I've received over this it still
seems worth fixing.
Summary:
This patch maps IR operations for sdiv & udiv to the
@llvm.aarch64.sve.[s|u]div intrinsics.
A ptrue must be created during lowering as the div instructions
have only a predicated form.
Patch contains changes by Andrzej Warzynski.
Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78569
One of transforms the loop vectorizer makes is LCSSA formation. In some cases it
is the only transform it makes. We should not drop CFG analyzes if only LCSSA was
formed and no actual CFG changes was made.
We should think of expanding this logic to other passes as well, and maybe make
it a part of PM framework.
Reviewed By: Florian Hahn
Differential Revision: https://reviews.llvm.org/D78360
MCELFObjectWriter::setRType## methods are always used altogether to
build complete MIPS N64 ABI "chain" of relocations. Using single
function for this task makes code less verbose.
Summary:
Changing all mnemonic to match assembly instructions to simplify mnemonic
naming rules. This time update all floating-point arithmetic instructions.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D78768
The approach here is to create a new (empty) component, `Extensions', where all
statically compiled extensions dynamically register their dependencies. That way
we're more natively compatible with LLVMBuild and llvm-config.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=44870
Differential Revision: https://reviews.llvm.org/D78192
This reverts commit 9245c7ac13.
This is triggering a segfault in XLA downstream, we'll follow-up with
a reproducer, it is likely influenced by TTI/TLI settings or other
options as a simple `opt -loop-vectorize` invocation on the IR
before the crash does not reproduce immediately.