Summary:
This copy ensures that debug location information is kept for
compressed instructions. There are places where both compressInstruction and
uncompressInstruction are called that were not doing this copy, discarding some
debug info.
This change merely moves the copy into the generated file, so you cannot forget
to copy the location over when compressing or uncompressing.
Reviewers: asb, luismarques
Reviewed By: luismarques
Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67493
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
AMDGPU was the last in tree target to use this tablegen mode. I plan to
split up the global intrinsic enum similar to the way that clang
diagnostics are split up today. I don't plan to build on this mode.
Reviewers: arsenm, echristo, efriedma
Reviewed By: echristo
Differential Revision: https://reviews.llvm.org/D71318
Before this change, the *InstPrinter.cpp files of each target where some
of the slowest objects to compile in all of LLVM. See this snippet produced by
ClangBuildAnalyzer:
https://reviews.llvm.org/P8171$96
Search for "InstPrinter", and see that it shows up in a few places.
Tablegen was emitting a large switch containing a sequence of operand checks,
each of which created many conditions and many BBs. Register allocation and
jump threading both did not scale well with such a large repetitive sequence of
basic blocks.
So, this change essentially turns those control flow structures into
data. The previous structure looked like:
switch (Opc) {
case TGT::ADD:
// check alias 1
if (MI->getOperandCount() == N && // check num opnds
MI->getOperand(0).isReg() && // check opnd 0
...
MI->getOperand(1).isImm() && // check opnd 1
AsmString = "foo";
break;
}
// check alias 2
if (...)
...
return false;
The new structure looks like:
OpToPatterns: Sorted table of opcodes mapping to pattern indices.
\->
Patterns: List of patterns. Previous table points to subrange of
patterns to match.
\->
Conds: The if conditions above encoded as a kind and 32-bit value.
See MCInstPrinter.cpp for the details of how the new data structures are
interpreted.
Here are some before and after metrics.
Time to compile AArch64InstPrinter.cpp:
0m29.062s vs. 0m2.203s
size of the obj:
3.9M vs. 676K
size of clang.exe:
97M vs. 96M
I have not benchmarked disassembly performance, but typically
disassemblers are bottlenecked on IO and string processing, not alias
matching, so I'm not sure it's interesting enough to be worth doing.
Reviewers: RKSimon, andreadb, xbolva00, craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D70650
This reverts commit 3f76260dc0.
Breaks at least these tests on Windows:
Clang :: Driver/clang-offload-bundler.c
Clang :: Driver/clang-offload-wrapper.c
For lldb and dsymutil, the command guide is essentially a copy of its
help output generated by libOption. Making sure the two stay in sync is
tedious and error prone. Given that we already generate the help from a
tablegen file, we might as well generate the RST as well.
This adds a tablegen backend for generating Sphinx/RST command guides
from the tablegen file.
Differential revision: https://reviews.llvm.org/D70610
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type
to contain non-scalable types.
* Explicit casts for several places where implicit integer sign
changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types
as different.
Reviewers: greened, cameron.mcinally, sdesmalen, rovka
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D66871
AMDGPU has some atomic instructions that do not return the previous
result, and can only be selected if there are no uses. The source
pattern will only match if the use is empty, so it should be safe to
discard the result.
Summary:
To drive the automaton we used a uint64_t as an action type. This
contained the transition's resource requirements as a conjunction:
(a OR b) AND (b OR c)
We encoded this conjunction as a sequence of four 16-bit bitmasks.
This limited the number of addressable functional units to 16, which
is quite low and has bitten many people in the past.
Instead, the DFAEmitter now generates a lookup table from InstrItinerary
class (index of the ItinData inside the ProcItineraries) to an internal
action index which is essentially a dense embedding of the conjunctive
form. Because we never materialize the conjunctive form, we no longer
have the 16 FU restriction.
In this patch we limit to 64 functional units due to using a uint64_t
bitmask in the DFAEmitter. Now that we've decoupled these representations
we can increase this in future.
Reviewers: ThomasRaoux, kparzysz, majnemer
Reviewed By: ThomasRaoux
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69110
Summary:
Found by PVS Studio
Not familiar with this code; no testcase.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69741
If there is a dag node with a variable number of operands that has at
least N operands (for some non-negative N), and multiple patterns with
that node with different number of operands, we would drop the number of
operands check in patterns with N operands, presumably because it's
guaranteed in such case that none of the per-operand checks will access
the operand list out-of-bounds.
Except semantically the check is about having exactly N operands, not at
least N operands, and a backend might rely on it to disambiguate
different patterns.
In this patch we change the condition on emitting the number of operands
check from "the instruction is not guaranteed to have at least as many
operands as are checked by the pattern being matched" to "the
instruction is not guaranteed to have a specific number of operands".
We're relying (still) on the rest of the CodeGenPatterns mechanics to
validate that the pattern itself doesn't try to access more operands
than there is in the instruction in cases when the instruction does have
fixed number of operands, and on the machine verifier to validate at
runtime that particular MIs like that satisfy the constraint as well.
Reviewers: dsanders, qcolombet
Reviewed By: qcolombet
Subscribers: arsenm, rovka, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69653
D68992 / rL375086 refactored the packetizer and removed a bunch of logic. Unfortunately it creates an Automaton object whenever a DFAPacketizer is required. These objects have no longevity, and in particular on a debug build the population of the Automaton's transition map from the underlying table is very slow (because it is called ~10 times per MachineFunction, in the testcase I'm looking at).
This patch changes Automaton to wrap its underlying constant data in std::shared_ptr, which allows trivial copy construction. The DFAPacketizer creation function now creates a static archetypical Automaton and copies that whenever a new DFAPacketizer is required.
This takes a testcase down from ~20s to ~0.5s in debug mode.
llvm-svn: 375240
Summary:
This is a NFC change that removes the NFA->DFA construction and emission logic from DFAPacketizerEmitter and instead uses the generic DFAEmitter logic. This allows DFAPacketizer to use the Automaton class from Support and remove a bunch of logic there too.
After this patch, DFAPacketizer is mostly logic for grepping Itineraries and collecting functional units, with no state machine logic. This will allow us to modernize by removing the 16-functional-unit limit and supporting non-itinerary functional units. This is all for followup patches.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68992
llvm-svn: 375086
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.
The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.
Depends on D68426
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68438
llvm-svn: 375067
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
llvm-svn: 375057
Assume that, ModelA has scheduling resource for InstA and ModelB has scheduling resource for InstB. This is what the llvm::MCSchedClassDesc looks like:
llvm::MCSchedClassDesc ModelASchedClasses[] = {
...
InstA, 0, ...
InstB, -1,...
};
llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, -1,...
InstB, 0,...
};
The -1 means invalid num of macro ops, while it is valid if it is >=0. This is what we look like now:
llvm::MCSchedClassDesc ModelASchedClasses[] = {
...
InstA, 0, ...
InstB, 0,...
};
llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, 0,...
InstB, 0,...
};
And compiler hit the assertion here because the SCDesc is valid now for both InstA and InstB.
Differential Revision: https://reviews.llvm.org/D67950
llvm-svn: 374524
When an instruction has an encoding definition for only a subset of
the available HwModes, ensure we just avoid generating an encoding
rather than crash.
llvm-svn: 374150
Summary:
While working with DagInit's, it's often the case that you expect the
operator to be a reference to a def. This patch adds a wrapper for this
common case to reduce the amount of boilerplate callers need to duplicate
repeatedly.
getOperatorAsDef() returns the record if the DagInit has an operator that is
a DefInit. Otherwise, it prints a fatal error.
There's only a few pre-existing examples in LLVM at the moment and I've
left a few instances of the code this simplifies as they had more specific
error messages than the generic one this produces. I'm going to be using
this a fair bit in my subsequent patches.
Reviewers: bogner, volkan, nhaehnle
Reviewed By: nhaehnle
Subscribers: nhaehnle, hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68424
llvm-svn: 374101
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937
Summary:
This patch introduces -gen-automata, a backend for generating deterministic finite-state automata.
DFAs are already generated by the -gen-dfa-packetizer backend. This backend is more generic and will
hopefully be used to implement the DFA generation (and determinization) for the packetizer in the
future.
This backend allows not only generation of a DFA from an NFA (nondeterministic finite-state
automaton), it also emits sidetables that allow a path through the DFA under a sequence of inputs to
be analyzed, and the equivalent set of all possible NFA transitions extracted.
This allows a user to not just answer "can my problem be solved?" but also "what is the
solution?". Clearly this analysis is more expensive than just playing a DFA forwards so is
opt-in. The DFAPacketizer has this behaviour already but this is a more compact and generic
representation.
Examples are bundled in unittests/TableGen/Automata.td. Some are trivial, but the BinPacking example
is a stripped-down version of the original target problem I set out to solve, where we pack values
(actually immediates) into bins (an immediate pool in a VLIW bundle) subject to a set of esoteric
constraints.
Reviewers: t.p.northover
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67968
llvm-svn: 373718
Summary:
This will handle expansion of C++ fragments in the declarative combiner
including custom predicates, and escapes into C++ to aid the migration
effort.
Fixed the -DLLVM_LINK_LLVM_DYLIB=ON using DISABLE_LLVM_LINK_LLVM_DYLIB when
creating the library. Apparently it automatically links to libLLVM.dylib
and we don't want that from tablegen.
Reviewers: bogner, volkan
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68288
> llvm-svn: 373551
llvm-svn: 373651
Summary:
This will handle expansion of C++ fragments in the declarative combiner
including custom predicates, and escapes into C++ to aid the migration
effort.
Reviewers: bogner, volkan
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68288
llvm-svn: 373551
Summary:
This is the first of a series of patches extracted from a much bigger WIP
patch. It merely establishes the tblgen pass and the way empty combiner
helpers are declared and integrated into a combiner info.
The tablegen pass takes a -combiners option to select the combiner helper
that will be generated. This can be given multiple values to generate
multiple combiner helpers at once. Doing so helps to minimize parsing
overhead.
The reason for creating a GlobalISel subdirectory in utils/TableGen is that
there will be quite a lot of non-pass files (~15) by the time the patch
series is done.
Reviewers: volkan
Subscribers: mgorny, hiraditya, simoncook, Petar.Avramovic, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68286
llvm-svn: 373527
Summary:
This allows intrinsics such as the following to be defined:
- declare <n x 4 x i32> @llvm.something.nxv4f32(<n x 4 x i32>, <n x 4 x i1>, <n x 4 x float>)
...where <n x 4 x i32> is derived from <n x 4 x float>, but
the element needs bitcasting to int.
Reviewers: c-rhodes, sdesmalen, rovka
Reviewed By: c-rhodes
Subscribers: tschuett, hiraditya, jdoerfert, llvm-commits, cfe-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68021
llvm-svn: 373437
Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD
were mapped to the same VEX instruction. But we should keep
the commutableness when change the opcode.
llvm-svn: 373303
https://reviews.llvm.org/D66773
The OpTypes::OperandType was creating an enum for all records that
inherit from Operand, but in reality there are operands for instructions
that inherit from other types too. In particular, RegisterOperand and
RegisterClass. This commit adds those types to the list of operand types
that are tracked by the OperandType enum.
Patch by: nlguillemot
llvm-svn: 372641
We're now using a lot more TargetConstant nodes in SelectionDAG.
But we were still telling isel to convert some of them
to TargetConstants even though they already are. This is because
isel emits a conversion anytime the output pattern has a an 'imm'.
I guess for patterns in instructions we take the 'timm' from the
'set' pattern, but for Pat patterns with explcicit output we
previously had to say 'imm' since 'timm' wasn't allowed in outputs.
llvm-svn: 372525
Summary:
Both match the type of another intrinsic parameter of a vector type, but where each element is subdivided to form a vector with more elements of a smaller type.
Subdivide2Argument allows intrinsics such as the following to be defined:
- declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 8 x i16>)
Subdivide4Argument allows intrinsics such as:
- declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 16 x i8>)
Tests are included in follow up patches which add intrinsics using these types.
Reviewers: sdesmalen, SjoerdMeijer, greened, rovka
Reviewed By: sdesmalen
Subscribers: rovka, tschuett, jdoerfert, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67549
llvm-svn: 372380
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
Much like ValueTypeByHwMode/RegInfoByHwMode, this patch allows targets
to modify an instruction's encoding based on HwMode. When the
EncodingInfos field is non-empty the Inst and Size fields of the Instruction
are ignored and taken from EncodingInfos instead.
As part of this promote getHwMode() from TargetSubtargetInfo to MCSubtargetInfo.
This is NFC for all existing targets - new code is generated only if targets
use EncodingByHwMode.
llvm-svn: 372320
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
Summary:
Also fixup rL371928 for cases that occur on our out-of-tree backend
There were still quite a few intermediate APInts and this caused the
compile time of MCCodeEmitter for our target to jump from 16s up to
~5m40s. This patch, brings it back down to ~17s by eliminating pretty
much all of them using two new APInt functions (extractBitsAsZExtValue(),
insertBits() but with a uint64_t). The exact conditions for eliminating
them is that the field extracted/inserted must be <=64-bit which is
almost always true.
Note: The two new APInt API's assume that APInt::WordSize is at least
64-bit because that means they touch at most 2 APInt words. They
statically assert that's true. It seems very unlikely that someone
is patching it to be smaller so this should be fine.
Reviewers: jmolloy
Reviewed By: jmolloy
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67686
llvm-svn: 372243
The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't.
llvm-svn: 372146
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
Some VLIW instruction sets are Very Long Indeed. Using uint64_t constricts the Inst encoding to 64 bits (naturally).
This change switches CodeEmitter to a mode that uses APInts when Inst's bitwidth is > 64 bits (NFC for existing targets).
When Inst.BitWidth > 64 the prototype changes to:
void TargetMCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,
SmallVectorImpl<MCFixup> &Fixups,
APInt &Inst,
APInt &Scratch,
const MCSubtargetInfo &STI);
The Inst parameter returns the encoded instruction, the Scratch parameter is used internally for manipulating operands and is exposed so that the underlying storage can be reused between calls to getBinaryCodeForInstr. The goal is to elide any APInt constructions that we can.
Similarly the operand encoding prototype changes to:
getMachineOpValue(const MCInst &MI, const MCOperand &MO, APInt &op, SmallVectorImpl<MCFixup> &Fixups, const MCSubtargetInfo &STI);
That is, the operand is passed by reference as APInt rather than returned as uint64_t.
To reiterate, this APInt mode is enabled only when Inst.BitWidth > 64, so this change is NFC for existing targets.
llvm-svn: 371928
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
The scalar f64 patterns don't work yet because they fail on multiple
results from the unused implicit def of scc in the result bit
operation.
llvm-svn: 371542
Reapply with fix to reduce resources required by the compiler - use
unsigned[2] instead of std::pair. This causes clang and gcc to compile
the generated file multiple times faster, and hopefully will reduce
the resource requirements on Visual Studio also. This fix is a little
ugly but it's clearly the same issue the previous author of
DFAPacketizer faced (the previous tables use unsigned[2] rather uglily
too).
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
llvm-svn: 371399
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
........
Reverted as this is causing "compiler out of heap space" errors on MSVC 2017/19 NDEBUG builds
llvm-svn: 371393
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
llvm-svn: 371198
This was only using the correct register constraints if this was the
final result instruction. If the extract was a sub instruction of the
result, it would attempt to use GIR_ConstrainSelectedInstOperands on a
COPY, which won't work. Move the handling to
createAndImportSubInstructionRenderer so it works correctly.
I don't fully understand why runOnPattern and
createAndImportSubInstructionRenderer both need to handle these
special cases, and constrain them with slightly different methods. If
I remove the runOnPattern handling, it does break the constraint when
the final result instruction is EXTRACT_SUBREG.
llvm-svn: 371150
This partially adds support for patterns with REG_SEQUENCE. The source
patterns are now accepted, but the pattern is still rejected due to
missing support for the instruction renderer.
llvm-svn: 370920
The Hexagon itineraries are cunningly crafted such that functional units between
itineraries do not clash. Because all itineraries are bundled into the same DFA,
a functional unit index clash would cause an incorrect DFA to be generated.
A workaround for this is to ensure all itineraries declare the universe of all
possible functional units, but this isn't ideal for three reasons:
1) We only have a limited number of FUs we can encode in the packetizer, and
using the universe causes us to hit the limit without care.
2) Silent codegen faults are bad, and careful triage of the FU list shouldn't
be required.
3) Smooshing all itineraries into the same automaton allows combinations of
instruction classes that cannot exist, which bloats the table.
A simple solution is to allow "namespacing" packetizers.
Differential Revision: https://reviews.llvm.org/D66940
llvm-svn: 370508
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.
This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.
llvm-svn: 370280
Reuse the logic for INSERT_SUBREG to also import SUBREG_TO_REG patterns.
- Split `inferSuperRegisterClass` into two functions, one which tries to use
an existing TreePatternNode (`inferSuperRegisterClassForNode`), and one that
doesn't. SUBREG_TO_REG doesn't have a node to leverage, which is the cause
for the split.
- Rename GlobalISelEmitterInsertSubreg.td to GlobalISelEmitterSubreg.td and
update it.
- Update impacted tests in the AArch64 and X86 backends.
This is kind of a hit/miss for code size improvements/regressions. E.g. in
add-ext.ll, we now get some identity copies. This isn't really anything the
importer can handle, since it's caused by a later pass introducing the copy for
the sake of correctness.
Differential Revision: https://reviews.llvm.org/D66769
llvm-svn: 370254
I thought `llvm::sort` was stable for some reason but it's not.
Use `llvm::stable_sort` in `CodeGenTarget::getSuperRegForSubReg`.
Original patch: https://reviews.llvm.org/D66498
llvm-svn: 370084
When EXPENSIVE_CHECKS are enabled, GlobalISelEmitterSubreg.td doesn't get
stable output.
Reverting while I debug it.
See: https://reviews.llvm.org/D66498
llvm-svn: 370080
Summary:
This patch adds support for scalable vectors in intrinsics, enabling
intrinsics such as the following to be defined:
declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 4 x i32>)
Support for this is implemented by defining a new type descriptor for
scalable vectors and adding mangling support for scalable vector types
in the name mangling scheme used by 'any' types in intrinsic signatures.
Tests have been added for IRBuilder to test scalable vectors work as
expected when using intrinsics through this interface. This required
implementing an intrinsic that is explicitly defined with scalable
vectors, e.g. LLVMType<nxv4i32>, an SVE floating-point convert
intrinsic was used for this. The behaviour of the overloaded type
LLVMScalarOrSameVectorWidth with scalable vectors is tested using the
existing masked load intrinsic. Also added an .ll test to test the
Verifier catches a bad intrinsic argument when passing a fixed-width
predicate (mask) to the masked.load intrinsic where a scalable is
expected.
Patch by Paul Walker
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65930
llvm-svn: 370053
This teaches the importer to handle INSERT_SUBREG instructions.
We were missing patterns involving INSERT_SUBREG in AArch64. It appears in
AArch64InstrInfo.td 107 times, and 14 times in AArch64InstrFormats.td.
To meaningfully import it, the GlobalISelEmitter needs to know how to infer a
super register class for a given register class.
This patch introduces the following:
- `getSuperRegForSubReg`, a function which finds the largest register class
which supports a value type and subregister index
- `inferSuperRegisterClass`, a function which finds the appropriate super
register class for an INSERT_SUBREG'
- `inferRegClassFromPattern`, a function which allows for some trivial
lookthrough into instructions
- `getRegClassFromLeaf`, a helper function which returns the register class for
a leaf `TreePatternNode`
- Support for subregister index operands in `importExplicitUseRenderer`
It also
- Updates tests in each backend which are impacted by the change
- Adds GlobalISelEmitterSubreg.td to test that we import and skip the expected
patterns
As a result of this patch, INSERT_SUBREG patterns in X86 may use the
LOW32_ADDR_ACCESS_RBP register class instead of GR32. This is correct, since the
register class contains the same registers as GR32 (except with the addition of
RBP). So, this also teaches X86 to handle that register class. This is in line
with X86ISelLowering, which treats this as a GR class.
Differential Revision: https://reviews.llvm.org/D66498
llvm-svn: 369973
This requires std::intializer_list to be a literal type, which it is
starting with C++14. The downside is that std::bitset is still not
constexpr-friendly so this change contains a re-implementation of most
of it.
Shrinks clang by ~60k.
llvm-svn: 369847
Overloaded intrinsics can use iPTRAny in used/input operands.
The GlobalISelEmitter doesn't know that these are pointers, so it treats them
as scalars. As a result, these intrinsics can't be imported.
This teaches the GlobalISelEmitter to recognize these as pointers rather than
scalars.
Differential Revision: https://reviews.llvm.org/D65756
llvm-svn: 369455
AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3 and 4 element vector
versions of these rounded to 64-bits, which is ambiguous. Add these to
help the table disambiguate these.
Assertion change is for the path odd sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.
llvm-svn: 369038
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Summary:
The problem:
When an operand had bits explicitly set to "1" (as in the InitValue.td test case attached), the decoder was ignoring those bits, and the DecoderMethod was receiving an input where the bits were still zero.
The solution:
We added an "InitValue" variable that stores the initial value of the operand based on what bits were explicitly initialized to 1 in TableGen code. The generated decoder code then uses that initial value to initialize the "tmp" variable, then calls fieldFromInstruction to read the values for the remaining bits that were left unknown in TableGen.
This is mainly useful when there are variations of an instruction that differ based on what bits are set in the operands, since this change makes it possible to access those bits in a DecoderMethod. The DecoderMethod can use those bits to know how to handle the input.
Patch by Nicolas Guillemot
Reviewers: craig.topper, dsanders, fhahn
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D63741
llvm-svn: 368458
The upper 4 bits of the immediate byte are used to encode a
register. We need to limit the explicit immediate to fit in the
remaining 4 bits.
Fixes PR42899.
llvm-svn: 368123
This was causing a bug where non-truncating stores would be selected instead of truncating ones.
Differential Revision: https://reviews.llvm.org/D64845
llvm-svn: 367737
AMDGPU uses some custom code predicates for testing alignments.
I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.
llvm-svn: 367373
Empty condition strings are considerde always true. This removes a lot
of clutter from the generated matcher tables.
This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to
6.1M.
llvm-svn: 367326