llvm-project/llvm/lib/CodeGen/MIRParser/MILexer.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

237 lines
5.4 KiB
C
Raw Normal View History

//===- MILexer.h - Lexer for machine instructions ---------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file declares the function that lexes the machine instruction source
// string.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_CODEGEN_MIRPARSER_MILEXER_H
#define LLVM_LIB_CODEGEN_MIRPARSER_MILEXER_H
#include "llvm/ADT/APSInt.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringRef.h"
#include <string>
namespace llvm {
class Twine;
/// A token produced by the machine instruction lexer.
struct MIToken {
enum TokenKind {
// Markers
Eof,
Error,
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
Newline,
// Tokens with no info.
comma,
equal,
underscore,
colon,
coloncolon,
dot,
exclaim,
lparen,
rparen,
lbrace,
rbrace,
plus,
minus,
less,
greater,
// Keywords
kw_implicit,
kw_implicit_define,
kw_def,
kw_dead,
kw_dereferenceable,
kw_killed,
kw_undef,
kw_internal,
kw_early_clobber,
kw_debug_use,
kw_renamable,
kw_tied_def,
kw_frame_setup,
kw_frame_destroy,
kw_nnan,
kw_ninf,
kw_nsz,
kw_arcp,
kw_contract,
kw_afn,
kw_reassoc,
kw_nuw,
kw_nsw,
kw_exact,
Allow target to handle STRICT floating-point nodes The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is *both* marked as mayRaiseFPException *and* FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663
2019-06-06 06:33:10 +08:00
kw_fpexcept,
kw_debug_location,
kw_cfi_same_value,
kw_cfi_offset,
kw_cfi_rel_offset,
kw_cfi_def_cfa_register,
kw_cfi_def_cfa_offset,
kw_cfi_adjust_cfa_offset,
kw_cfi_escape,
kw_cfi_def_cfa,
kw_cfi_register,
kw_cfi_remember_state,
kw_cfi_restore,
kw_cfi_restore_state,
kw_cfi_undefined,
kw_cfi_window_save,
kw_cfi_aarch64_negate_ra_sign_state,
kw_blockaddress,
kw_intrinsic,
kw_target_index,
kw_half,
kw_float,
kw_double,
kw_x86_fp80,
kw_fp128,
kw_ppc_fp128,
kw_target_flags,
kw_volatile,
kw_non_temporal,
kw_invariant,
kw_align,
kw_addrspace,
kw_stack,
kw_got,
kw_jump_table,
kw_constant_pool,
kw_call_entry,
kw_liveout,
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
kw_address_taken,
kw_landing_pad,
kw_liveins,
kw_successors,
kw_floatpred,
kw_intpred,
kw_shufflemask,
[x86/MIR] Implement support for pre- and post-instruction symbols, as well as MIR parsing support for `MCSymbol` `MachineOperand`s. The only real way to test pre- and post-instruction symbol support is to use them in operands, so I ended up implementing that within the patch as well. I can split out the operand support if folks really want but it doesn't really seem worth it. The functional implementation of pre- and post-instruction symbols is now *completely trivial*. Two tiny bits of code in the (misnamed) AsmPrinter. It should be completely target independent as well. We emit these exactly the same way as we emit basic block labels. Most of the code here is to give full dumping, MIR printing, and MIR parsing support so that we can write useful tests. The MIR parsing of MC symbol operands still isn't 100%, as it forces the symbols to be non-temporary and non-local symbols with names. However, those names often can encode most (if not all) of the special semantics desired, and unnamed symbols seem especially annoying to serialize and de-serialize. While this isn't perfect or full support, it seems plenty to write tests that exercise usage of these kinds of operands. The MIR support for pre-and post-instruction symbols was quite straightforward. I chose to print them out in an as-if-operand syntax similar to debug locations as this seemed the cleanest way and let me use nice introducer tokens rather than inventing more magic punctuation like we use for memoperands. However, supporting MIR-based parsing of these symbols caused me to change the design of the symbol support to allow setting arbitrary symbols. Without this, I don't see any reasonable way to test things with MIR. Differential Revision: https://reviews.llvm.org/D50833 llvm-svn: 339962
2018-08-17 07:11:05 +08:00
kw_pre_instr_symbol,
kw_post_instr_symbol,
kw_unknown_size,
// Named metadata keywords
md_tbaa,
md_alias_scope,
md_noalias,
md_range,
md_diexpr,
md_dilocation,
// Identifier tokens
Identifier,
NamedRegister,
NamedVirtualRegister,
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
MachineBasicBlockLabel,
MachineBasicBlock,
StackObject,
FixedStackObject,
NamedGlobalValue,
GlobalValue,
ExternalSymbol,
[x86/MIR] Implement support for pre- and post-instruction symbols, as well as MIR parsing support for `MCSymbol` `MachineOperand`s. The only real way to test pre- and post-instruction symbol support is to use them in operands, so I ended up implementing that within the patch as well. I can split out the operand support if folks really want but it doesn't really seem worth it. The functional implementation of pre- and post-instruction symbols is now *completely trivial*. Two tiny bits of code in the (misnamed) AsmPrinter. It should be completely target independent as well. We emit these exactly the same way as we emit basic block labels. Most of the code here is to give full dumping, MIR printing, and MIR parsing support so that we can write useful tests. The MIR parsing of MC symbol operands still isn't 100%, as it forces the symbols to be non-temporary and non-local symbols with names. However, those names often can encode most (if not all) of the special semantics desired, and unnamed symbols seem especially annoying to serialize and de-serialize. While this isn't perfect or full support, it seems plenty to write tests that exercise usage of these kinds of operands. The MIR support for pre-and post-instruction symbols was quite straightforward. I chose to print them out in an as-if-operand syntax similar to debug locations as this seemed the cleanest way and let me use nice introducer tokens rather than inventing more magic punctuation like we use for memoperands. However, supporting MIR-based parsing of these symbols caused me to change the design of the symbol support to allow setting arbitrary symbols. Without this, I don't see any reasonable way to test things with MIR. Differential Revision: https://reviews.llvm.org/D50833 llvm-svn: 339962
2018-08-17 07:11:05 +08:00
MCSymbol,
// Other tokens
IntegerLiteral,
FloatingPointLiteral,
HexLiteral,
VectorLiteral,
VirtualRegister,
ConstantPoolItem,
JumpTableIndex,
NamedIRBlock,
IRBlock,
NamedIRValue,
IRValue,
QuotedIRValue, // `<constant value>`
SubRegisterIndex,
StringConstant
};
private:
TokenKind Kind = Error;
StringRef Range;
StringRef StringValue;
std::string StringValueStorage;
APSInt IntVal;
public:
MIToken() = default;
MIToken &reset(TokenKind Kind, StringRef Range);
MIToken &setStringValue(StringRef StrVal);
MIToken &setOwnedStringValue(std::string StrVal);
MIToken &setIntegerValue(APSInt IntVal);
TokenKind kind() const { return Kind; }
bool isError() const { return Kind == Error; }
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
bool isNewlineOrEOF() const { return Kind == Newline || Kind == Eof; }
bool isErrorOrEOF() const { return Kind == Error || Kind == Eof; }
bool isRegister() const {
return Kind == NamedRegister || Kind == underscore ||
Kind == NamedVirtualRegister || Kind == VirtualRegister;
}
bool isRegisterFlag() const {
return Kind == kw_implicit || Kind == kw_implicit_define ||
Kind == kw_def || Kind == kw_dead || Kind == kw_killed ||
Kind == kw_undef || Kind == kw_internal ||
Kind == kw_early_clobber || Kind == kw_debug_use ||
Kind == kw_renamable;
}
bool isMemoryOperandFlag() const {
return Kind == kw_volatile || Kind == kw_non_temporal ||
Kind == kw_dereferenceable || Kind == kw_invariant ||
Kind == StringConstant;
}
bool is(TokenKind K) const { return Kind == K; }
bool isNot(TokenKind K) const { return Kind != K; }
StringRef::iterator location() const { return Range.begin(); }
StringRef range() const { return Range; }
/// Return the token's string value.
StringRef stringValue() const { return StringValue; }
const APSInt &integerValue() const { return IntVal; }
bool hasIntegerValue() const {
return Kind == IntegerLiteral || Kind == MachineBasicBlock ||
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
Kind == MachineBasicBlockLabel || Kind == StackObject ||
Kind == FixedStackObject || Kind == GlobalValue ||
Kind == VirtualRegister || Kind == ConstantPoolItem ||
Kind == JumpTableIndex || Kind == IRBlock || Kind == IRValue;
}
};
/// Consume a single machine instruction token in the given source and return
/// the remaining source string.
StringRef lexMIToken(
StringRef Source, MIToken &Token,
function_ref<void(StringRef::iterator, const Twine &)> ErrorCallback);
} // end namespace llvm
#endif // LLVM_LIB_CODEGEN_MIRPARSER_MILEXER_H