Summary:
This patch restricts loop fusion to only consider rotated loops as valid candidates.
This simplifies the analysis and transformation and aligns with other loop optimizations.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: ormris, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71025
Summary:
The instructions were originally implemented via builtins and
intrinsics so users would have to explicitly opt-in to using
them. This was useful while were validating whether these instructions
should have been merged into the spec proposal. Now that they have
been, we can use normal codegen patterns, so the intrinsics and
builtins are no longer useful.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71500
Now that the machine verifier will check for cases of register/immediate
MachineOperands and their correspondence to the MC instruction descriptor,
this patch adds the operand types to the descriptors where they were
previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled
to get a known type, except for G_... (global isel) instructions.
Review: Ulrich Weigand
https://reviews.llvm.org/D71494
Summary:
Follow-on to D66428 and D71193, to build the TLI per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
With D71193, the -fno-builtin* flags are converted to function
attributes, so we can now set this information per-function on the TLI.
In this patch, the TLI constructor is changed to take a Function, which
can be used to override the available builtins. The TLI is augmented
with an array that can be used to specify which builtins are not
available for the corresponding function. The available function checks
are changed to consult this override before checking the underlying
module level baseline TLII. New code is added to set this override
array based on the attributes.
I also removed the code that sets availability in the TLII in clang from
the options, which is no longer needed.
I removed a per-Triple caching of TLII objects in the analysis object,
as it is based on the Module's Triple which is the same for all
functions in any case. Is there a case where we would be compiling
multiple Modules with different Triples in one compilation?
Finally, I have changed the legacy analysis wrapper to create and use
the new PM analysis class (TargetLibraryAnalysis) in getTLI. This is
consistent with the behavior of getTTI for the legacy
TargetTransformInfo analysis. This change means that getTLI now creates
a new TLI on each call (although that should be very cheap as we cache
the module level TLII, and computing the per-function
attribute based availability should also be reasonably efficient).
I measured the compile time for a large C++ file with tens of thousands
of functions and as expected there was no increase.
Reviewers: chandlerc, hfinkel, gchatelet
Subscribers: mehdi_amini, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67923
Summary:
Reland after fixing a bug that allowed outlining of SP modifying instructions
that invalidated return address signing.
During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.
This patch introduces the following changes:
1. If not all functions that potentially participate in function outlining agree
on their return address signing scope and their return address signing key,
outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
on their support for v8.3A features, outlining is disabled for these
functions.
3. If an outlining candidate would outline instructions that modify sp in a way
that invalidates return address signing, outlining is disabled for that
particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
support for v8.3 features, the outlined function behaves as if it had the
same scope and key attributes and as if it would provide the same v8.3A
support as the original functions.
Reviewers: ostannard, paquette
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70635
Summary:
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71473
The emission of stack maps in AArch64 binaries has been disabled for all
binary formats except Mach-O since rL206610, probably mistakenly, as far
as I can tell. This patch reverts this to its intended state.
Differential Revision: https://reviews.llvm.org/D70069
Patch by Loic Ottet.
Summary:
In commit d60f34c20a (llvm-svn 317128,
PR35113) MergeBlockIntoPredecessor was changed into
discarding some dbg.value intrinsics referring to
PHI values, post-splice due to loop rotation.
That elimination of dbg.value intrinsics did not
consider which dbg.value to keep depending on the
context (e.g. if the variable is changing its value
several times inside the basic block).
In the past that hasn't been such a big problem since
CodeGenPrepare::placeDbgValues has moved the dbg.value
to be next to the PHI node anyway. But after commit
00e238896c CodeGenPrepare isn't doing that
any longer, so we need to be more careful when avoiding
duplicate dbg.value intrinsics in MergeBlockIntoPredecessor.
This patch replaces the code that tried to avoid duplicate
dbg.values by using the RemoveRedundantDbgInstrs helper.
Reviewers: aprantl, jmorse, vsk
Reviewed By: aprantl, vsk
Subscribers: jholewinski, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71480
Summary:
Add a RemoveRedundantDbgInstrs to BasicBlockUtils with the
goal to remove redundant dbg intrinsics from a basic block.
This can be useful after various transforms, as it might
be simpler to do a filtering of dbg intrinsics after the
transform than during the transform.
One primary use case would be to replace a too aggressive
removal done by MergeBlockIntoPredecessor, seen at loop
rotate (not done in this patch).
The elimination algorithm currently focuses on dbg.value
intrinsics and is doing two iterations over the BB.
First we iterate backward starting at the last instruction
in the BB. Whenever a consecutive sequence of dbg.value
instructions are found we keep the last dbg.value for
each variable found (variable fragments are identified
using the {DILocalVariable, FragmentInfo, inlinedAt}
triple as given by the DebugVariable helper class).
Next we iterate forward starting at the first instruction
in the BB. Whenever we find a dbg.value describing a
DebugVariable (identified by {DILocalVariable, inlinedAt})
we save the {DIValue, DIExpression} that describes that
variables value. But if the variable already was mapped
to the same {DIValue, DIExpression} pair we instead drop
the second dbg.value.
To ease the process of making lit tests for this utility a
new pass is introduced called RedundantDbgInstElimination.
It can be executed by opt using -redundant-dbg-inst-elim.
Reviewers: aprantl, jmorse, vsk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71478
Summary:
At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".
In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.
Reviewers: foad, arsenm
Reviewed By: arsenm
Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Patch by Eugene Kuznetsov.
Differential Revision: https://reviews.llvm.org/D70367
Summary:
Guard against a potential crash observed in https://github.com/JuliaLang/julia/issues/32994#issuecomment-524249628
If two branches are collapsed we can encounter a degenerate conditional branch `TBB==FBB`.
The subsequent code assumes that they differ, so we exit out early.
Reviewers: ributzka, spatel
Subscribers: loladiro, dexonsmith, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66657
In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have
quite a few helper functions all looking at the opcodes of MVE instructions.
This moves all these utility functions to ARMBaseInstrInfo.
Diferential Revision: https://reviews.llvm.org/D71426
JITLink (which underlies ObjectLinkingLayer) is a replacement for RuntimeDyld.
It supports the native code model, and linker plugins that enable a wider range
of features than RuntimeDyld.
Currently only enabled for MachO/x86-64 and MachO/arm64. New architectures will
be added as JITLink support for them is developed.
Summary:
r372285 changed LLVM to use a `TargetConstant` for parameters of intrinsics that are required to be immediates.
Since that commit, use of `%llvm.ppc.altivec.vc{fsx,fux,tsxs,tuxs}` intrinsics has not worked, and resulted in a `LLVM ERROR: Cannot select: intrinsic %llvm.ppc.altivec.vc*` error. The intrinsics' TableGen definitions matched on `imm` instead of `timm`.
This commit updates those definitions to use `timm`.
Fixes: https://llvm.org/PR44239
Reviewers: hfinkel, nemanjai, #powerpc, Jim
Reviewed By: Jim
Subscribers: qiucf, wuzish, Jim, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Patched by vddvss (Colin Samples).
Differential Revision: https://reviews.llvm.org/D71138
This relieves ObjectLinkingLayer clients of the responsibility of holding the
memory manager. This makes it easier to select between RTDyldObjectLinkingLayer
(which already owned its memory manager factory) and ObjectLinkingLayer at
runtime as clients aren't required to hold a jitlink::MemoryManager field just
in case ObjectLinkingLayer is selected.
MCSymbolRefExpr::getVariantKindForName does not return VK_WEAKREF, so this code path is not exercised. Moreoever, .weakref is probably a feature that nobody uses.
shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
This is another missing shuffle fold pattern uncovered by the
shuffle correctness fix from D70246.
The problem was visible in the post-commit thread example, but
we managed to overcome the limitation for that particular case
with D71220.
This is something like the inverse of the previous fix - there
we didn't demand the inserted scalar, and here we are only
demanding an inserted scalar.
Differential Revision: https://reviews.llvm.org/D71488
The commit r369122 may keep LR and FP register (aka. frame record) in
the middle of a frame, thus we must add the offsets to ensure the FP
register always points to innermost frame record on the stack.
According to AAPCS64[1], a conforming code shall construct a linked list
of stack frames that can be traversed with frame records. This commit
is also essential to frame-pointer-based stack unwinder (e.g. the stack
unwinder in linx-perf-tools.)
[1] https://github.com/ARM-software/software-standards/blob/master/abi/aapcs64/aapcs64.rst#the-frame-pointer
Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64/framelayout-frame-record.ll
Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64
Differential Revision: https://reviews.llvm.org/D70800
This refactors the if-statements handling the hashing of various
MachineOperand types into a switch-statement. The purpose is to cover
all the basis for all MachineOperand types while being very deliberate
about which MachineOperand types we are not handling and why (better
added comments). This patch is a NFC redo of https://reviews.llvm.org/D71396.
Much of the changes present in D71396 will come in smaller follow-up patches
that will add support for hashing the MachineOperand types that aren't
covered piece-meal with tests for each new case.
This was part of D70767. When we replace the value of (call/invoke)
instructions we do not want to disturb the old call graph so we will
only replace instruction uses until we get rid of the old PM.
Accepted as part of D70767.
The change allows clang -mno-omit-leaf-frame-pointer to disable frame
pointer elimination. This behavior matches X86 and Mips, and also GCC
AArch64.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71168
Extends the desciptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.
In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:
struct FunctionDescriptor {
void *EntryPoint;
void *TOCAnchor;
void *EnvironmentPointer;
};
An indirect call has several steps of loading the the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:
1) Save the caller's TOC pointer into the TOC save slot in the linkage
area, and then load the callee's TOC pointer into the TOC register
(GPR 2 on AIX).
2) Load the function descriptor's entry point into the count register.
3) Load the environment pointer into the environment pointer register
(GPR 11 on AIX).
4) Perform the call by branching on count register.
5) Restore the caller's TOC pointer after returning from the indirect call.
A couple important caveats to the above:
- There is no way to directly load a value from memory into the count register.
Instead we populate the count register by loading the entry point address into
a gpr and then moving the gpr to the count register.
- The TOC restore has to come immediately after the branch on count register
instruction (i.e., the 1st instruction executed after we return from the
call). This is an implementation limitation. We could, in theory, schedule
the restore elsewhere as long as no uses of the TOC pointer fall in between
the call and the restore; however, to keep it simple, we insert a pseudo
instruction that represents both the indirect branch instruction and the
load instruction that restores the caller's TOC from the linkage area. As
they flow through the compiler as a single pseudo instruction, nothing can be
inserted between them and the caller's TOC is then valid at any use.
Differtential Revision: https://reviews.llvm.org/D70724
Legalization algorithm is complicated by two facts:
1) While regular instructions should be possible to legalize in
an isolated, per-instruction, context-free manner, legalization
artifacts can only be eliminated in pairs, which could be deeply, and
ultimately arbitrary nested: { [ () ] }, where which paranthesis kind
depicts an artifact kind, like extend, unmerge, etc. Such structure
can only be fully eliminated by simple local combines if they are
attempted in a particular order (inside out), or alternatively by
repeated scans each eliminating only one innermost pair, resulting in
O(n^2) complexity.
2) Some artifacts might in fact be regular instructions that could (and
sometimes should) be legalized by the target-specific rules. Which
means failure to eliminate all artifacts on the first iteration is
not a failure, they need to be tried as instructions, which may
produce more artifacts, including the ones that are in fact regular
instructions, resulting in a non-constant number of iterations
required to finish the process.
I trust the recently introduced termination condition (no new artifacts
were created during as-a-regular-instruction-retrial of artifacts not
eliminated on the previous iteration) to be efficient in providing
termination, but only performing the legalization in full if and only if
at each step such chains of artifacts are successfully eliminated in
full as well.
Which is currently not guaranteed, as the artifact combines are applied
only once and in an arbitrary order that has to do with the order of
creation or insertion of artifacts into their worklist, which is a no
particular order.
In this patch I make a small change to the artifact combiner, making it
to re-insert into the worklist immediate (modulo a look-through copies)
artifact users of each vreg that changes its definition due to an
artifact combine.
Here the first scan through the artifacts worklist, while not
being done in any guaranteed order, only needs to find the innermost
pair(s) of artifacts that could be immediately combined out. After that
the process follows def-use chains, making them shorter at each step, thus
combining everything that can be combined in O(n) time.
Reviewers: volkan, aditya_nandakumar, qcolombet, paquette, aemerson, dsanders
Reviewed By: aditya_nandakumar, paquette
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71448
and introducing new unittests/CodeGen/GlobalISel/LegalizerTest.cpp
relying on it to unit test the entire legalizer algorithm (including the
top-level main loop).
See also https://reviews.llvm.org/D71448
Summary:
To find potential opportunities to use getMemBasePlusOffset() I looked at
all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr
in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert
the files in the individual backends too.
The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.
We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.
Reviewers: spatel
Reviewed By: spatel
Subscribers: merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71207
Summary:
This change is preparatory work to use this helper functions in more places.
In order to make this change, getMemBasePlusOffset() has been extended to
also take a SDNodeFlags parameter.
The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.
We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.
Reviewers: spatel
Reviewed By: spatel
Subscribers: merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71206
Summary:
This change is preparatory work to use this helper functions in more places.
Currently the function only allows integer constants offsets, but there
are cases where we can use an existing SDValue parameter.
The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.
We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.
Reviewers: spatel, craig.topper
Reviewed By: spatel, craig.topper
Subscribers: craig.topper, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71205
Summary:
This change is preparatory work to use this helper functions in more places.
Currently the function only allows positive offsets, but there are cases
where we want to subtract an offset from an existing pointer.
The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR).
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.
We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.
Reviewers: spatel
Reviewed By: spatel
Subscribers: merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71204
Summary:
This copy ensures that debug location information is kept for
compressed instructions. There are places where both compressInstruction and
uncompressInstruction are called that were not doing this copy, discarding some
debug info.
This change merely moves the copy into the generated file, so you cannot forget
to copy the location over when compressing or uncompressing.
Reviewers: asb, luismarques
Reviewed By: luismarques
Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67493
This reverts commit 0be81968a2.
The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
The initial attempt (rG89633320) botched the logic by reversing
the source/dest types. Added x86 tests for additional coverage.
The vector tests show a potential improvement (fold vector load
instead of broadcasting), but that's a known/existing problem.
This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm
Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16
Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16
...because D58017 exposes a regression without this fold.
The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for
`VLLDM` and `VLSTM`, which are currently defined with itineraries having a
dynamic count of micro-ops.
Assuming an optimistic case in which these instruction do not actually perform
loads or stores, and with the idea that Armv8-m cores are supposed to use the
new style scheduling models, this patch just sets the itinerary for those two
instructions to `NoItinerary`.
Differential Revision: https://reviews.llvm.org/D71266
Fix PR44284. This is probably not valid assembly but we should not crash.
Reviewed By: luporl, #powerpc, steven.zhang
Differential Revision: https://reviews.llvm.org/D71443
We've been marking VPT incompatible instructions as invalid for tail
predication too, though this may not strictly be true. VPT are
incompatible and, unless its the first predicate def in a loop,
they shouldn't be compatible for tail predication either.
Differential Revision: https://reviews.llvm.org/D71410
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
This fixes the buildbot failures.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH
Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.
Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM
Reviewed By: MarkMurrayARM
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71062
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.
`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.
I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.
(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71458
We have custom lowering for operations converting to/from floating-point types
when we don't have hardware support for those types, and this doesn't interact
well with the target-independent legalization of the strict versions of these
operations. Fix this by adding similar custom lowering of the strict versions.
This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics
test, with the remaining failures due to poor instruction selection.
Differential Revision: https://reviews.llvm.org/D71127
This reverts commit 69fcfb7d35.
As shown in the test I attached to this commit, the change I reverted
causes a problem with "zext(cc1) - zext(cc2)". It commuted
the operands to the sub and used different logic to select the addc/subc
instruction:
sub zext (setcc), x => addcarry 0, x, setcc
sub sext (setcc), x => subcarry 0, x, setcc
... but that is bogus. I believe it is not possible to fold those commuted
patterns into any form of addcarry or subcarry. It may have worked as
intended before "AMDGPU: Change boolean content type to 0 or 1" because
the setcc was considered to be -1 rather than 1.
Differential Revision: https://reviews.llvm.org/D70978
Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
This reverts commit 9468e3334b.
There's a test that doesn't like this change. The RDA analysis
gets invalided by changes in the block, which is not taken into
account. Revert while I work on a fix for this.
Summary:
Better use of multiclass is used, and this helped find some existing
bugs in the predicated VMULL* intrinsics, which are now fixed.
The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.
To assist in testing the multiclassing , and to thwart further gremlins,
the unit tests are improved in scope.
The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.
Reviewers: dmgreen, miyuki, ostannard, simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71421
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.
Original commit message:
Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
Summary:
This is a quickfix for PR44275. An assertion that checks that the
DIExpression is valid failed due to attempting to create an entry value
for an indirect parameter. This started appearing after D69028, as the
indirect parameter started being represented using an DW_OP_deref,
rather than with the DBG_VALUE's second operand, meaning that the
isIndirectDebugValue() check in LiveDebugValues did not exclude such
parameters. A DIExpression that has an entry value operation can
currently not have any other operation, leading to the failed isValid()
check.
This patch simply makes us stop considering emitting entry values
for such parameters. To support such cases I think we at least need
to do the following changes:
* In DIExpression::isValid(): Remove the limitation that a
DW_OP_LLVM_entry_value operation can be the only operation in a
DIExpression.
* In LiveDebugValues::emitEntryValues(): Create an entry value of size
1, so that it only wraps the register operand, and not the whole
pre-existing expression (the DW_OP_deref).
* In LiveDebugValues::removeEntryValue(): Check that the new debug
value has the same debug expression as the original, rather than
checking that the debug expression is empty.
* In DwarfExpression::addMachineRegExpression(): Modify the logic so
that a DW_OP_reg* expression is emitted for the entry value.
That is how GCC emits entry values for indirect parameters. That will
currently not happen to due the DW_OP_deref causing the
!HasComplexExpression to fail. The LocationKind needs to be changed
also, rather than always emitting a DW_OP_stack_value for entry values.
There are probably more things I have missed, but that could hopefully
be a good starting point for emitting such entry values.
Reviewers: djtodoro, aprantl, jmorse, vsk
Reviewed By: aprantl, vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D71416
Currently we have the `Flags` property that allows to
set flags for a section. The problem is that it does not
allow us to set an arbitrary value, because of bit fields
validation under the hood. An arbitrary values can be used
to test specific broken cases.
We probably do not want to relax the validation, so this
patch adds a `ShSize` property that allows to
override the `sh_size`. It is inline with others `Sh*` properties
we have already.
Differential revision: https://reviews.llvm.org/D71411
I believe this is a leftover from when fp128 was softened to fp128
on X86-64. In that case type legalization must have been able to
create a load that was the same as N which would make this
replacement fail or assert. Since we no longer do that, this
check should be unneeded.
This is a rebase of the change over D70376, which fixes an LVI cache
invalidation issue that also affected this patch.
-----
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.
The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.
This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).
Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.
I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.
Differential Revision: https://reviews.llvm.org/D69914
(except for v4 loclists, which are sufficiently different to not fit
well in this generic implementation)
In subsequent patches I intend to refactor the DebugLoc and ranges data
structures to be more similar so I can common more of the implementation
here.
Summary:
Support alloca-referencing dbg.value in hwasan instrumentation.
Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in
loclist format.
Reviewers: pcc
Subscribers: srhines, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70753
Summary:
Add pattern matching for the following instructions:
- add, sub, subr, sqadd, sqsub, uqadd, uqsub
This patch required complex patterns to match the immediate with optinal left shift.
I re-used the Select function from the other SVE repo to implement the complext pattern.
I plan on doing another patch to also match constant vector of the same immediate.
Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71370
When we reason about the pointer argument that is byval we actually
reason about a local copy of the value passed at the call site. This was
not the case before and we wrongly introduced attributes based on the
surrounding function.
AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and
AANoCaptureCallSiteArgument are made aware of byval now. The code
to skip "subsuming positions" for reasoning follows a common pattern and
we should refactor it. A TODO was added.
Discovered by @efriedma as part of D69748.
Removed code duplication in ThreadCmpOverSelect and broke it
into several smaller functions for reusing them.
Differential Revision: https://reviews.llvm.org/D71158
This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm
Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16
Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16
...because D58017 exposes a regression without this fold.
Summary:
This adds support for embedding bitcode in a binary during LTO. The libLTO gains supports the `-lto-embed-bitcode` flag. The option allows users of the LTO library to embed a bitcode section. For example, LLD can pass the option via `ld.lld -mllvm=-lto-embed-bitcode`.
This feature allows doing something comparable to `clang -c -fembed-bitcode`, but on the (LTO) linker level. Having bitcode alongside native code has many use-cases. To give an example, the MacOS linker can create a `-bitcode_bundle` section containing bitcode. Also, having this feature built into LLVM is an alternative to 3rd party tools such as [[ https://github.com/travitch/whole-program-llvm | wllvm ]] or [[ https://github.com/SRI-CSL/gllvm | gllvm ]]. As with these tools, this feature simplifies creating "whole-program" llvm bitcode files, but in contrast to wllvm/gllvm it does not rely on a specific llvm frontend/driver.
Patch by Josef Eisl <josef.eisl@oracle.com>
Reviewers: #llvm, #clang, rsmith, pcc, alexshap, tejohnson
Reviewed By: tejohnson
Subscribers: tejohnson, mehdi_amini, inglorion, hiraditya, aheejin, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits, #llvm, #clang
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68213
This patch renames the LoopInfo::isRotated() method to LoopInfo::isRotatedForm()
to make it clear that the method checks whether the loop is in rotated form, not
whether the loop has been rotated by the LoopRotation pass.
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.
Review: Ulrich Weigand
https://reviews.llvm.org/D70821
This is not quite NFC because I changed the SDLoc to use the more
standard 'N' (the starting node for the fold).
This transform is a special-case of a more general fold that we
do in IR, but it seems like the general fold is needed here too
to avoid a potential regression seen in D58017.
https://rise4fun.com/Alive/3jZm
In order to use assumptions, computeKnownBits needs a context
instruction. We can use the GEP, if it is an instruction. We already
pass the assumption cache, but it cannot be used without a context
instruction.
Reviewers: anemet, asbirlea, hfinkel, spatel
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D71264
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.
Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.
For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.
This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
* Shape propagation to eliminate the embedding/splitting for each
intrinsic.
* Fused & tiled lowering of multiply and other operations.
* Optimization remarks highlighting matrix expressions and costs.
* Generate loops for operations on large matrixes.
* More general block processing for operation on large vectors,
exploiting shape information.
We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
(1) become unwieldy for larger matrixes (even for 16x16 matrixes,
the resulting shufflevector masks would be huge),
(2) risk instcombine making small changes, causing us to fail to
detect the transpose, preventing better lowerings
For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70456
This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can
reimplement findVCMPToFoldIntoVPS with just a few calls to RDA.
Differential Revision: https://reviews.llvm.org/D71330
Recommit e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.
Original commit message:
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.
Differential Revision: https://reviews.llvm.org/D70841
This helps delineate it in the output from later tables or other output.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D71344
Summary: Remove `Worklist` iteration and make use `checkForAllUses`. There is no test chage.
Reviewers: sstefan1, jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71352
During SelectionDAG, if a value which is associated with a DBG_VALUE
needs to be split across multiple registers, the DBG_VALUE will be split
into a set of fragment expressions to recreate the original value.
If one or more of these fragments cannot be created, they would
previously be silently dropped, causing the old debug value to live past
its expiry date. This patch fixes this issue by keeping invalid
fragments while setting their value as Undef.
Differential revision: https://reviews.llvm.org/D70248
This makes TimeTraceProfilerInstance thread local. Added
timeTraceProfilerFinishThread() which moves the thread local instance to
a global vector of instances. timeTraceProfilerWrite() then writes
recorded data from all instances.
Threads are identified based on their thread ids. Totals are reported
with artificial thread ids higher than the real ones.
Replaced raw pointer for TimeTraceProfilerInstance with unique_ptr.
Differential Revision: https://reviews.llvm.org/D71059
In order to properly implement these atomic we need one register more than other
binary atomics. It is used for storing result from comparing values in addition
to the one that is used for actual result of operation.
https://reviews.llvm.org/D71028
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
That patch adds checking into DWARFVerifier that the Skeleton
compilation unit does not have children.
Differential Revision: https://reviews.llvm.org/D71244
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.
Differential Revision: https://reviews.llvm.org/D70841
Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining.
Reviewers: wmi, davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70584
No more hash collisions for memoperands. Now the MIRCanonicalization
pass shouldn't hit hash collisions when dealing with nearly identical
memory accessing instructions when their memoperands are in fact different.
Differential Revision: https://reviews.llvm.org/D71328
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.
Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
This reverts commit 30038da15b. It causes
the stage2 thinLTO bot to fail with:
Assertion failed: (CU.getDIE(CalleeSP) && "Expected declaration subprogram DIE for callee")
rdar://57840415
The fp16 to larger than fp32 inserts an extend that need to
re-legalized if fp16 is promoted. But if we check for fp16
promotion first, then we can avoid emiting the fp_extend all
together.
This is the initial patch for the OpenMP-IR-Builder, as discussed on the
mailing list ([1] and later) and at the US Dev Meeting'19.
The design is similar to D61953 but:
- in a non-WIP status, with proper documentation and working.
- using a OpenMPKinds.def file to manage lists of directives, runtime
functions, types, ..., similar to the current Clang implementation.
- restricted to handle only (simple) barriers, to implement most
`#pragma omp barrier` directives and most implicit barriers.
- properly hooked into Clang to be used if possible (D69922).
- compatible with the remaining code generation.
Parts have been extracted into D69853.
The plan is to have multiple people working on moving logic from Clang
here once the initial scaffolding (=this patch) landed.
[1] http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: mgorny, hiraditya, bollu, guansong, jfb, cfe-commits, llvm-commits, penzn, ppenzin
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69785
This is equivalent to the existing `import_name` and `import_module`
attributes which control the import names in the final wasm binary
produced by lld.
This maps the existing
This attribute currently requires a string rather than using the
symbol name for a couple of reasons:
1. Avoid confusion with static and dynamic linking which is
based on symbol name. Exporting a function from a wasm module using
this directive is orthogonal to both static and dynamic linking.
2. Avoids name mangling.
Differential Revision: https://reviews.llvm.org/D70520
Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.
This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.
We can run into this situation if you have both a uadd.with.overflow
and a manual add + overflow check in the same function (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.
The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.
This does not yet handle the negated version of this pattern.
Differential Revision: https://reviews.llvm.org/D58644
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG.
The patch allows to use more nonnull constraints. Also, it found
one more optimization in some PowerPC test. This is my first llvm
review, I am free to any comments.
Differential Revision: https://reviews.llvm.org/D71177
Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was
originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1.
However, the attribute copying was done in the wrong place (in general
call replacement, not thunk generation) and a proper fix was
implemented in D12581.
Previously this code was just unnecessary but harmless (because
FunctionComparator ensured that the attributes of the two functions
are exactly the same), but since byval was changed to accept a type
this copying is actively wrong and may result in malformed IR.
Differential Revision: https://reviews.llvm.org/D71173
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.
We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().
This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.
Differential Revision: https://reviews.llvm.org/D70975
Summary:
The current da printer shows the dependence without indicating
which instructions are being considered as the src vs dst. It
also silently ignores call instructions, despite the fact that
they create confused dependence edges to other memory
instructions. This patch addresses these two issues plus a
couple of minor non-functional improvements.
Authored By: bmahjour
Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc
Reviewed By: dmgreen, fhahn
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71088
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.
This worked correctly for regular FMA, but was implemented
incorrectly for the strict version. This was not noticed
because we did not have test coverage for this case.
This patch fixes that incorrect expansion and adds the
missing test cases.
Summary:
This patch adds a method to determine if a loop is in rotated form (the latch is
an exiting block). It also modifies the getLoopGuardBranch method to use this
new method. This method can also be used in Loopfusion. Once this patch lands I
will make the corresponding changes there.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65958
This simplifies code where no extra details are required
Also don't write out detail when it is empty.
Differential Revision: https://reviews.llvm.org/D71347
There are a few places that check specific string attributes have
particular values, and assert if they are something else. The verifier
should catch these kinds of cases.
In some cases, we can rename a store operand, in order to enable pairing
of stores. For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.
First, we check if we can rename the given register:
1. The first store register must be killed at the store, which means we
do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
stored register and check all uses in between are renamable. Along
they way, we collect the minimal register classes of the uses for
overlapping (sub/super)registers.
Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be
1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.
We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.
This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:
Metric: aarch64-ldst-opt.NumPairCreated
Program base patch diff
test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0%
test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9%
test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1%
test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5%
test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0%
test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3%
test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0%
test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0%
test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0%
test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0%
test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0%
test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5%
test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9%
test-suite...marks/Olden/health/health.test 16.00 18.00 12.5%
test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2%
test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9%
test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1%
test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7%
test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1%
test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0%
test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9%
test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7%
test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4%
test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2%
test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9%
test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9%
test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6%
test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2%
test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1%
test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1%
test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0%
test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0%
test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8%
test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8%
test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8%
test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5%
test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3%
test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2%
test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8%
test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8%
test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7%
test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7%
Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70450
The Isa register is a uint8_t, but at least on Windows this is
internally an unsigned char, which meant that prior to this patch it got
formatted as an ASCII character, rather than a decimal number. This
patch fixes this by casting it to a uint64_t before printing. I did it
this way instead of using a uint8_t formatter because a) it is simpler,
and b) it allows us to change the internal type of Isa in the future
without this code breaking.
I also took the opportunity to test the printing of the other standard
opcodes.
Reviewed by: probinson
Differential Revision: https://reviews.llvm.org/D71274
Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing).
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71332
Summary:
These allow you to get and set the operator of a dag node, without
affecting its list of arguments.
`!getop` is slightly fiddly because in many contexts you need its
return value to have a static type more specific than 'any record'. It
works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so
I made `!getop` take an optional type suffix itself, so that can be
written as the shorter `!getop<BaseClass>(...)`.
Reviewers: hfinkel, nhaehnle
Reviewed By: nhaehnle
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71191
Summary:
Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71000
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.
To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.
This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:
..
dlstp.32 lr, r2
.LBB0_1:
mov r3, r2
subs r2, #4
vldrh.u32 q2, [r1], #8
vmov q1, q0
vmla.u32 q0, q2, r0
letp lr, .LBB0_1
@ %bb.2:
vctp.32 r3
..
which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.
Differential Revision: https://reviews.llvm.org/D71007
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.
There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen, MarkMurrayARM
Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71065
Summary:
In the function `EarlyIfPredicator::shouldConvertIf()`, we call
`TII->isProfitableToIfCvt()` with `BranchProbability::getUnknown()`, it may
cause the potential assertion error for those hook which use `BranchProbability`
in `isProfitableToIfCvt()`, for example `SystemZ`.
`SystemZ` use `Probability < BranchProbability(1, 8))` in the function
`SystemZInstrInfo::isProfitableToIfCvt()`, if we call this function with
`BranchProbability::getUnknown()`, it will cause assertion error.
This patch is to fix the potential bug.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D71273
This iterator range just includes physical registers and register masks,
which are interesting when dealing with register liveness.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70562
PowerPC has instruction to do the semantics of this piece of code:
vector int foo(vector int m, vector int n) {
return (m + n + 1) >> 1;
}
This patch is adding the match rule to select it.
Differential Revision: https://reviews.llvm.org/D71002
I think this is no longer needed. The system should take care
of legalizing any new nodes that are added. I think this might
have been needed prior to r371709 or r307053.
Now, flags will result in differing hashes for a given MI. In effect, if
you have two instructions with everything identical except for their
flags then you should get two different hashes and fewer collisions.
Differential Revision: https://reviews.llvm.org/D70479
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.
Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70582
This reverts 3e1aee2ba7 in favor
of a different approach.
Scalarizing isn't great codegen, but making the type illegal was
interfering with k constraint in inline assembly.
This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.
We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.
This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.
As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.
rdar://46577651
Differential Revision: https://reviews.llvm.org/D70350
Currently for extern variables with section attribute, those
BTF_KIND_VARs will not be placed in any DataSec. This is
inconvenient as any other generated BTF_KIND_VAR belongs to
one DataSec. This patch put these extern variables into
".extern" section so bpf loader can have a consistent
processing mechanism for all data sections and variables.
This caused non-determinism in the compiler, see command on the Phabricator
code review.
> This patch addresses a performance problem reported in PR43855, and
> present in the reapplication in in 001574938e5. It turns out that
> MachineSink will (often) move instructions to the first block that
> post-dominates the current block, and then try to sink further. This
> means if we have a lot of conditionals, we can needlessly create large
> numbers of DBG_VALUEs, one in each block the sunk instruction passes
> through.
>
> To fix this, rather than immediately sinking DBG_VALUEs, record them in
> a pass structure. When sinking is complete and instructions won't be
> sunk any further, new DBG_VALUEs are added, avoiding lots of
> intermediate DBG_VALUE $noregs being created.
>
> Differential revision: https://reviews.llvm.org/D70676
This adds support for printing improved missing feature error messages
from the assembler, which now indicates which feature caused the parse
to fail.
Differential Revision: https://reviews.llvm.org/D69899
This patch introduced the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]
In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.
The VFISAKind field `ISA` of VFShape have been moved into into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available version (TLI, AVX,
AVX2, ...) the system should prefer, when multiple versions with the
same VFShape are present.
Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.
[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.
Differential Revision: https://reviews.llvm.org/D67572
Summary:
This patch refactors instruction selection of the complex vector
addition, multiplication and multiply-add intrinsics, so that it is
now based on TableGen patterns rather than C++ code.
It also changes the first parameter (halving vs non-halving) of the
arm_mve_vcaddq IR intrinsic to match the corresponding instruction
encoding, hence it requires some changes in the tests.
The patch addresses David's comment in https://reviews.llvm.org/D71190
Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71245
SUMMARY:
Fixed a bug of XCOFFObjectFile.cpp when there is padding at the last csect of a sections.
when there is a tail padding of a section, but the value of CurrentAddressLocation do not be increased by the padding size. it will hit assert assert(CurrentAddressLocation == Section->Address && "We should have no padding between sections.");
Reviewers: daltenty,hubert.reinterpretcast,
Differential Revision: https://reviews.llvm.org/D70859
Extern variable usage in BPF is different from traditional
pure user space application. Recent discussion in linux bpf
mailing list has two use cases where debug info types are
required to use extern variables:
- extern types are required to have a suitable interface
in libbpf (bpf loader) to provide kernel config parameters
to bpf programs.
https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t
- extern types are required so kernel bpf verifier can
verify program which uses external functions more precisely.
This will make later link with actual external function no
need to reverify.
https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed
This patch added clang support to emit debuginfo for extern variables
with a TargetInfo hook to enable it. The debuginfo for the
extern variable is emitted only if that extern variable is
referenced in the current compilation unit.
Currently, only BPF target enables to generate debug info for
extern variables. The emission of such debuginfo is disabled for C++
at this moment since BPF only supports a subset of C language.
Emission with C++ can be enabled later if an appropriate use case
is identified.
-fstandalone-debug permits us to see more debuginfo with the cost
of bloated binary size. This patch did not add emission of extern
variable debug info with -fstandalone-debug. This can be
re-evaluated if there is a real need.
Differential Revision: https://reviews.llvm.org/D70696
This pattern is noted as a regression from:
D70246
...where we removed an over-aggressive shuffle simplification.
SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses,
so I'm proposing to pattern match the minimal sequence directly. This fold does not
conflict with any of our current shuffle undef/poison semantics.
Differential Revision: https://reviews.llvm.org/D71220
basic blocks
Originally applied in 72ce759928.
Fixed a build failure caused by incorrect use of cast instead of
dyn_cast.
This reverts commit 8b0780f795.
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.
Reviewers: sdesmalen, efriedma, fhahn, aemerson
Reviewed By: efriedma
Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70496
This is not a new semantic feature. The syntax `(? 1, 2, 3)` was
disallowed by the parser in a dag //expression//, but there were
already ways to sneak a `?` into the operator field of a dag
//value//, e.g. by initializing it from a class template parameter
which is then set to `?` by the instantiating `def`.
This patch makes `?` in the operator slot syntactically legal, so it's
now easy to construct dags with an unset operator. Also, the semantics
of `!con` are relaxed so that it will allow a combination of set and
unset operator fields in the dag nodes it's concatenating, with the
restriction that all the operators that are //not// unset still have
to agree with each other.
Reviewers: hfinkel, nhaehnle
Reviewed By: hfinkel, nhaehnle
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71195
Summary:
Recognize wide compares where the wide operand is a splat of a scalar
value in the appropriate range and convert to the immediate variant of
the instruction.
Patch by Graham Hunter
Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71009
Summary:
This fixes PR44135.
The special case when we promote a bitcast from a vector to an int
needs special handling when we are on a big-endian target.
Prior to this fix, for the added vec_to_int we see the following in the
SelectionDAG printouts
Type-legalized selection DAG: %bb.1 'foo:bb.1'
SelectionDAG has 9 nodes:
t0: ch = EntryToken
t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
t17: v4i32 = bitcast t2
t23: i32 = extract_vector_elt t17, Constant:i32<3>
t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23
t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1
and I think here the extract_vector_elt is wrong and extracts the value
from the wrong index.
The program program should return the 32 bits made up of the elements at
index 4 and 5 in the vec6 array, but with
t23: i32 = extract_vector_elt t17, Constant:i32<3>
as far as I can tell, we will extract values that originally didn't even
exist in the vec6 vectore.
If we would instead extract the element at index 2 we would get the wanted
values.
With this fix we insert a right shift after the bitcast in
DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us
Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1'
SelectionDAG has 9 nodes:
t0: ch = EntryToken
t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
t23: v4i32 = bitcast t2
t27: i32 = extract_vector_elt t23, Constant:i32<2>
t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27
t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1
So now we get
t27: i32 = extract_vector_elt t23, Constant:i32<2>
which is what we want.
Similarly, the new int_to_vec testcase exposes a bug where we cast the other
direction. Then we instead need to add a left shift before the bitcast on
big-endian targets for the bits in the input integer to end up at the exptected
place in the vector.
Reviewers: bogner, spatel, craig.topper, t.p.northover, dmgreen, efriedma, SjoerdMeijer, samparker
Reviewed By: efriedma
Subscribers: eli.friedman, bjope, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70942
Summary:
The new OpenMPConstants.h is a location for all OpenMP related constants
(and helpers) to live.
This patch moves the directives there (the enum OpenMPDirectiveKind) and
rewires Clang to use the new location.
Initially part of D69785.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: jholewinski, ppenzin, penzn, llvm-commits, cfe-commits, jfb, guansong, bollu, hiraditya, mgorny
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69853
extern variable usage in BPF is different from traditional
pure user space application. Recent discussion in linux bpf
mailing list has two use cases where debug info types are
required to use extern variables:
- extern types are required to have a suitable interface
in libbpf (bpf loader) to provide kernel config parameters
to bpf programs.
https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t
- extern types are required so kernel bpf verifier can
verify program which uses external functions more precisely.
This will make later link with actual external function no
need to reverify.
https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed
This patch added bpf support to consume such info into BTF,
which can then be used by bpf loader. Function processFuncPrototypes()
only adds extern function definitions into BTF. The functions
with actual definition have been added to BTF in some other places.
Differential Revision: https://reviews.llvm.org/D70697
Making some changes to MIRVRegNamerUtils.cpp to use some more modern c++
features as well as some changes to generally make the code more concise
and more understandable.
I make this an NFCi because in one case I drop the whole
"if (!MO->isDef()) MO->setIsKill(false);" thing that was added in the
original implementation, generally because I don't think this is really
semantically sound. I also changed up the implementation of
VRegRenamer::createVirtualRegisterWithLowerName somewhat because I am
now lower-casing the name unconditionally because I confirmed that that
was in fact aditya_nandakumar@apple.com's intent.
In all other cases, behavior should not be changed.
Differential Revision: https://reviews.llvm.org/D71182
D34393 added MCCodePadder as an infrastructure for padding code with
NOP instructions. It lacked tests and was not being worked on since
then.
Intel has now worked on an assembler patch to mitigate performance loss
after applying microcode update for the Jump Conditional Code Erratum.
https://www.intel.com/content/www/us/en/support/articles/000055650/processors.html
This new patch shares similarity with MCCodePadder, but has a concrete
use case in mind and is being actively developed. The infrastructure it
introduces can potentially be used for general performance improvement
via alignment. Delete the unused MCCodePadder so that people can develop
the new feature from a clean state.
Reviewed By: jyknight, skan
Differential Revision: https://reviews.llvm.org/D71106
As discussed in https://reviews.llvm.org/D69998, we miss to create some dependency edges
if chained more than 2 instructions. Adding an assertion here if someone want to chain
more than 2 instructions.
Differential Revision: https://reviews.llvm.org/D71180
Don't try to fold away shuffles which can't be folded. Fix creation of
shufflevector constant expressions.
Differential Revision: https://reviews.llvm.org/D71147
The generated sequence with whilelo is unintuitive, but it's the best
I could come up with given the limited number of SVE instructions that
interact with scalar registers. The other sequence I was considering
was something like dup+cmpne, but an extra scalar instruction seems
better than an extra vector instruction.
Differential Revision: https://reviews.llvm.org/D71160
Summary:
Following on from rG884351547da2, this patch cleans up the logic for `xxpermdi`
peephole optimizations by converting two layers of nested `if`s to early breaks
and simplifying the logic.
Reviewers: hfinkel, nemanjai, jsji, lkail, #powerpc, steven.zhang
Reviewed By: #powerpc, steven.zhang
Subscribers: wuzish, steven.zhang, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71170
Patch by vddvss (Colin Samples).
Summary:
Same as D60846 and D69571 but with a fix for the problem encountered
after them. Both times it was a missing context adjustment in the
handling of PHI nodes.
The reproducers created from the bugs that caused the old commits to be
reverted are included.
Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans
Subscribers: hiraditya, bollu, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71181
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
A second try after reverted D71072.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71149
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.
This patch rename them to be consistent with others.
Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg
Reviewed By: jhibbits
Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70928
Refactor FinishCall to be more easily understandable as a precursor to
implementing indirect calls for AIX. The refactor tries to group similar
code together at the cost of some code duplication. The high level
overview of the refactor:
- Adds a number of helper functions for things like:
* Determining if a call is indirect.
* What the Opcode for a call is.
* Transforming the callee for a direct function call.
* Extracting the Chain operand from a CallSeqStart node.
* Building the operands of the call.
- Adds helpers for building the indirect call DAG nodes
(excluding the call instruction itself which is created in
`FinishCall`).
- Removes PrepareCall, which has been subsumed by the
helpers.
- Rename 'InFlag' to 'Glue'.
- FinishCall has been refactored to:
1) Set TOC pointer usage on the DAG for the TOC based
subtargets.
2) Calculate if a call is indirect.
3) Determine the Opcode to use for the call
instruction.
4) Transform the Callee for direct calls, or build
the DAG nodes for indirect calls.
5) Buildup the call operands.
6) Emit the call instruction.
7) If needed, emit the callSeqEnd Node and
finish lowering by calling `LowerCallResult`
Differential Revision: https://reviews.llvm.org/D70126
I rewrote the isel tablegen for MVE immediate shifts, and accidentally
removed the `let Predicates=[HasMVEInt]` that was wrapping the old
version, which seems to have allowed those rules to cause trouble on
non-MVE targets. That's what I get for only re-running the MVE tests.
Summary:
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.
There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71065
The cycle values in modulo scheduling results can be negative.
The result of ModuloSchedule::getCycle() must be received as an int type.
Patch by Masaki Arai!
Differential Revision: https://reviews.llvm.org/D71122
CodeGenPrepare::placeDebugValues moves variable location intrinsics to be
immediately after the Value they refer to. This makes tracking of locations
very easy; but it changes the order in which assignments appear to the
debugger, from the source programs order to the order in which the
optimised program computes values. This then leads to PR43986 and PR38754,
where variable locations that were in a conditional block are made
unconditional, which is highly misleading.
This patch adjusts placeDbgValues to only re-order variable location
intrinsics if they use a Value before it is defined, significantly reducing
the damage that it does. This is still not 100% safe, but the rest of
CodeGenPrepare needs polishing to correctly update debug info when
optimisations are performed to fully fix this.
This will probably break downstream debuginfo tests -- if the
instruction-stream position of variable location changes isn't the focus of
the test, an easy fix should be to manually apply placeDbgValues' behaviour
to the failing tests, moving dbg.value intrinsics next to SSA variable
definitions thus:
%foo = inst1
%bar = ...
%baz = ...
void call @llvm.dbg.value(metadata i32 %foo, ...
to
%foo = inst1
void call @llvm.dbg.value(metadata i32 %foo, ...
%bar = ...
%baz = ...
This should return your test to exercising whatever it was testing before.
Differential Revision: https://reviews.llvm.org/D58453
Summary:
This patch adds intrinsics for the following MVE instructions:
* VCADD, VHCADD
* VCMUL
* VCMLA
Each of the above 3 groups has a corresponding new LLVM IR intrinsic.
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: MarkMurrayARM
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71190
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.
Differential Revision: https://reviews.llvm.org/D70968
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
This adds some extra cost model tests for shifts, and does some minor
adjustments to some Neon code to make it clear as to what it applies to.
Both NFC.
Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: ormris, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D70431
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
This caused "Too many bits for uint64_t" asserts when building Chromium. See
https://crbug.com/1031978#c2 for a reproducer. I'll follow up on the
llvm-commits thread with a creduced version.
> ARMCodeGenPrepare has already been generalized and renamed to
> TypePromotion. We've had it enabled and tested downstream for a
> while, so enable it by default.
>
> Differential Revision: https://reviews.llvm.org/D70998
This is another transform suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
The backend for some targets already manages to get
this if it converts copysign to bitwise logic.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.
This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.
Differential Revision: https://reviews.llvm.org/D69509
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.
Differential Revision: https://reviews.llvm.org/D71109
Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produce in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.
Fixes PR44020.
Reviewers: hsaito, fhahn, Ayal, dorit
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D71071
My patch 9db13b5a7d seems to have
caused some build bots to fail due to warnings that appear only
when using -Wcovered-switch-default.
This patch is an attempt to fix this by trying to avoid both the warning
"default label in switch which covers all enumeration values"
for the inner switch statements and at the same time the warning
"this statement may fall through"
for the outer switch statement in getVectorComparison
(SystemZISelLowering.cpp).
Generate types for global variables with "weak" attribute.
Keep allocation scope the same for both weak and non-weak
globals as ELF symbol table can determine whether a global
symbol is weak or not.
Differential Revision: https://reviews.llvm.org/D71162
AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG.
Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>
Reviewers: fhahn, lebedev.ri, spatel
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D69963
This adds support for constrained floating-point comparison intrinsics.
Specifically, we add:
declare <ty2>
@llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
declare <ty2>
@llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).
The condition code is implemented as a metadata string. The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).
These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations. Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes. The patch includes support
for the common legalization operations for those nodes.
The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).
Differential Revision: https://reviews.llvm.org/D69281
The file is intended to gather various VPlan transformations, not only
CFG related transforms. Actually, the only transformation there is not
CFG related.
Reviewers: Ayal, gilr, hsaito, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D70732
Summary:
This patch fixes an issue where the PPC MI peephole optimization pass incorrectly remove a vector swap.
Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately feed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered.
This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register.
Here is an example in pseudo-MIR of what happens in the test cased added in this patch:
Before PPC MI peephole optimization:
```
%arg = XVADDDP %0, %1
$f1 = COPY %arg.sub_64
call double rint(double)
%res.first = COPY $f1
%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64
%arg.swapped = XXPERMDI %arg, %arg, 2
$f1 = COPY %arg.swapped.sub_64
call double rint(double)
%res.second = COPY $f1
%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
```
After optimization:
```
; ...
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
; so the pass replaces the swap with a copy:
%vec.res = COPY %vec.res.splat
; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
```
As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat.
Committed for vddvss (Colin Samples). Thanks Colin for the patch!
Differential Revision: https://reviews.llvm.org/D69497
Before this change, the *InstPrinter.cpp files of each target where some
of the slowest objects to compile in all of LLVM. See this snippet produced by
ClangBuildAnalyzer:
https://reviews.llvm.org/P8171$96
Search for "InstPrinter", and see that it shows up in a few places.
Tablegen was emitting a large switch containing a sequence of operand checks,
each of which created many conditions and many BBs. Register allocation and
jump threading both did not scale well with such a large repetitive sequence of
basic blocks.
So, this change essentially turns those control flow structures into
data. The previous structure looked like:
switch (Opc) {
case TGT::ADD:
// check alias 1
if (MI->getOperandCount() == N && // check num opnds
MI->getOperand(0).isReg() && // check opnd 0
...
MI->getOperand(1).isImm() && // check opnd 1
AsmString = "foo";
break;
}
// check alias 2
if (...)
...
return false;
The new structure looks like:
OpToPatterns: Sorted table of opcodes mapping to pattern indices.
\->
Patterns: List of patterns. Previous table points to subrange of
patterns to match.
\->
Conds: The if conditions above encoded as a kind and 32-bit value.
See MCInstPrinter.cpp for the details of how the new data structures are
interpreted.
Here are some before and after metrics.
Time to compile AArch64InstPrinter.cpp:
0m29.062s vs. 0m2.203s
size of the obj:
3.9M vs. 676K
size of clang.exe:
97M vs. 96M
I have not benchmarked disassembly performance, but typically
disassemblers are bottlenecked on IO and string processing, not alias
matching, so I'm not sure it's interesting enough to be worth doing.
Reviewers: RKSimon, andreadb, xbolva00, craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D70650
The xor'ing behaviour is only used for msvc/crt environments, when we're targeting
macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing
when targeting win32-macho to be consistent.
Differential Revision: https://reviews.llvm.org/D71095
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)
Instead, I'd suggest to use the following sequence:
// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).
In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.
There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper
Differential Revision: https://reviews.llvm.org/D67105
Summary:
musttail calls should not require allocating extra stack for arguments.
Updates to arguments passed in memory should happen in place before the
epilogue.
This bug was mostly a missed optimization, unless inalloca was used and
store to push conversion fired.
If a reserved call frame was used for an inalloca musttail call, the
call setup and teardown instructions would be deleted, and SP
adjustments would be inserted in the prologue and epilogue. You can see
these are removed from several test cases in this change.
In the case where the stack frame was not reserved, i.e. call frame
optimization fires and turns argument stores into pushes, then the
imbalanced call frame setup instructions created for inalloca calls
become a problem. They remain in the instruction stream, resulting in a
call setup that allocates zero bytes (expected for inalloca), and a call
teardown that deallocates the inalloca pack. This deallocation was
unbalanced, leading to subsequent crashes.
Reviewers: hans
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71097
Summary:
Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout.
This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold.
Reviewers: wmi, davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70750
GCC says:
.../llvm/lib/DebugInfo/GSYM/FunctionInfo.cpp:195:12:
error: ‘InfoType’ is not a class, namespace, or enumeration
case InfoType::EndOfList:
^
Presumably, GCC thinks InfoType is a variable here. Work around it by
using the name IT as is done above.
This is a follow-up to D70607 where we made any
extract element on SLM more costly than default. But that is
pessimistic for extract from element 0 because that corresponds
to x86 movd/movq instructions. These generally have >1 cycle
latency, but they are probably implemented as single uop
instructions.
Note that no vectorization tests are affected by this change.
Also, no targets besides SLM are affected because those are
falling through to the default cost of 1 anyway. But this will
become visible/important if we add more specializations via cost
tables.
Differential Revision: https://reviews.llvm.org/D71023
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71072
Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead.
To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks:
make sure there is at least one duplication in current work set.
the number of duplication should not exceed the number of successors.
The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain.
Differential Revision: https://reviews.llvm.org/D64376
SUMMARY:
if the size of Csect is zero, the Csect do not need write any data into sections
for example, the TOC Csect has zero size, it do not need invoke a
Asm.writeSectionData(W.OS, Csect.MCCsect, Layout);
Reviewers: daltenty
Subscribers: rupprecht, seiyai,hiraditya
Differential Revision: https://reviews.llvm.org/D71120
Summary:
The immediate forms of the MVE VQSHL instruction have MC names like
`MVE_VSLIimms8` and `MVE_VSLIimmu32`. Those names are confusing,
because VSLI is a completely different shift instruction with no
semantic relation to VQSHL. But it just happens to be defined
immediately before VQSHL in `ARMInstrMVE.td`, so this looks like a
copy-paste error. Renamed the ids to match the instruction name.
Reviewers: ostannard, dmgreen, MarkMurrayARM, miyuki
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71114
Summary:
When trying to calculate the offsets for the jump table entries
we fail to take into account the block alignment, which could be
greater than 4 bytes. This led to cases where the jump table
offset was too big to fit in a byte.
Reviewers: t.p.northover, sdesmalen, ostannard
Reviewed By: ostannard
Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits
Committed on behalf of David Sherwood (david-arm)
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70533
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit moves GEP operand queries controlling how GEPs are widened to a
dedicated recipe and extracts GEP widening code to its own ILV method taking
those recorded decisions as arguments. This reduces ingredient def-use usage by
ILV as a step towards full VPlan-based def-use relations.
Differential revision: https://reviews.llvm.org/D69067