llvm-project/llvm/lib/CodeGen/MIRParser/MILexer.h

233 lines
5.3 KiB
C
Raw Normal View History

//===- MILexer.h - Lexer for machine instructions ---------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file declares the function that lexes the machine instruction source
// string.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_CODEGEN_MIRPARSER_MILEXER_H
#define LLVM_LIB_CODEGEN_MIRPARSER_MILEXER_H
#include "llvm/ADT/APSInt.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringRef.h"
#include <string>
namespace llvm {
class Twine;
/// A token produced by the machine instruction lexer.
struct MIToken {
enum TokenKind {
// Markers
Eof,
Error,
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
Newline,
// Tokens with no info.
comma,
equal,
underscore,
colon,
coloncolon,
dot,
exclaim,
lparen,
rparen,
lbrace,
rbrace,
plus,
minus,
less,
greater,
// Keywords
kw_implicit,
kw_implicit_define,
kw_def,
kw_dead,
kw_dereferenceable,
kw_killed,
kw_undef,
kw_internal,
kw_early_clobber,
kw_debug_use,
kw_renamable,
kw_tied_def,
kw_frame_setup,
kw_frame_destroy,
kw_nnan,
kw_ninf,
kw_nsz,
kw_arcp,
kw_contract,
kw_afn,
kw_reassoc,
kw_nuw,
kw_nsw,
kw_exact,
kw_debug_location,
kw_cfi_same_value,
kw_cfi_offset,
kw_cfi_rel_offset,
kw_cfi_def_cfa_register,
kw_cfi_def_cfa_offset,
kw_cfi_adjust_cfa_offset,
kw_cfi_escape,
kw_cfi_def_cfa,
kw_cfi_register,
kw_cfi_remember_state,
kw_cfi_restore,
kw_cfi_restore_state,
kw_cfi_undefined,
kw_cfi_window_save,
kw_blockaddress,
kw_intrinsic,
kw_target_index,
kw_half,
kw_float,
kw_double,
kw_x86_fp80,
kw_fp128,
kw_ppc_fp128,
kw_target_flags,
kw_volatile,
kw_non_temporal,
kw_invariant,
kw_align,
kw_addrspace,
kw_stack,
kw_got,
kw_jump_table,
kw_constant_pool,
kw_call_entry,
kw_liveout,
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
kw_address_taken,
kw_landing_pad,
kw_liveins,
kw_successors,
kw_floatpred,
kw_intpred,
[x86/MIR] Implement support for pre- and post-instruction symbols, as well as MIR parsing support for `MCSymbol` `MachineOperand`s. The only real way to test pre- and post-instruction symbol support is to use them in operands, so I ended up implementing that within the patch as well. I can split out the operand support if folks really want but it doesn't really seem worth it. The functional implementation of pre- and post-instruction symbols is now *completely trivial*. Two tiny bits of code in the (misnamed) AsmPrinter. It should be completely target independent as well. We emit these exactly the same way as we emit basic block labels. Most of the code here is to give full dumping, MIR printing, and MIR parsing support so that we can write useful tests. The MIR parsing of MC symbol operands still isn't 100%, as it forces the symbols to be non-temporary and non-local symbols with names. However, those names often can encode most (if not all) of the special semantics desired, and unnamed symbols seem especially annoying to serialize and de-serialize. While this isn't perfect or full support, it seems plenty to write tests that exercise usage of these kinds of operands. The MIR support for pre-and post-instruction symbols was quite straightforward. I chose to print them out in an as-if-operand syntax similar to debug locations as this seemed the cleanest way and let me use nice introducer tokens rather than inventing more magic punctuation like we use for memoperands. However, supporting MIR-based parsing of these symbols caused me to change the design of the symbol support to allow setting arbitrary symbols. Without this, I don't see any reasonable way to test things with MIR. Differential Revision: https://reviews.llvm.org/D50833 llvm-svn: 339962
2018-08-17 07:11:05 +08:00
kw_pre_instr_symbol,
kw_post_instr_symbol,
kw_unknown_size,
// Named metadata keywords
md_tbaa,
md_alias_scope,
md_noalias,
md_range,
md_diexpr,
// Identifier tokens
Identifier,
NamedRegister,
NamedVirtualRegister,
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
MachineBasicBlockLabel,
MachineBasicBlock,
StackObject,
FixedStackObject,
NamedGlobalValue,
GlobalValue,
ExternalSymbol,
[x86/MIR] Implement support for pre- and post-instruction symbols, as well as MIR parsing support for `MCSymbol` `MachineOperand`s. The only real way to test pre- and post-instruction symbol support is to use them in operands, so I ended up implementing that within the patch as well. I can split out the operand support if folks really want but it doesn't really seem worth it. The functional implementation of pre- and post-instruction symbols is now *completely trivial*. Two tiny bits of code in the (misnamed) AsmPrinter. It should be completely target independent as well. We emit these exactly the same way as we emit basic block labels. Most of the code here is to give full dumping, MIR printing, and MIR parsing support so that we can write useful tests. The MIR parsing of MC symbol operands still isn't 100%, as it forces the symbols to be non-temporary and non-local symbols with names. However, those names often can encode most (if not all) of the special semantics desired, and unnamed symbols seem especially annoying to serialize and de-serialize. While this isn't perfect or full support, it seems plenty to write tests that exercise usage of these kinds of operands. The MIR support for pre-and post-instruction symbols was quite straightforward. I chose to print them out in an as-if-operand syntax similar to debug locations as this seemed the cleanest way and let me use nice introducer tokens rather than inventing more magic punctuation like we use for memoperands. However, supporting MIR-based parsing of these symbols caused me to change the design of the symbol support to allow setting arbitrary symbols. Without this, I don't see any reasonable way to test things with MIR. Differential Revision: https://reviews.llvm.org/D50833 llvm-svn: 339962
2018-08-17 07:11:05 +08:00
MCSymbol,
// Other tokens
IntegerLiteral,
FloatingPointLiteral,
HexLiteral,
VirtualRegister,
ConstantPoolItem,
JumpTableIndex,
NamedIRBlock,
IRBlock,
NamedIRValue,
IRValue,
QuotedIRValue, // `<constant value>`
SubRegisterIndex,
StringConstant
};
private:
TokenKind Kind = Error;
StringRef Range;
StringRef StringValue;
std::string StringValueStorage;
APSInt IntVal;
public:
MIToken() = default;
MIToken &reset(TokenKind Kind, StringRef Range);
MIToken &setStringValue(StringRef StrVal);
MIToken &setOwnedStringValue(std::string StrVal);
MIToken &setIntegerValue(APSInt IntVal);
TokenKind kind() const { return Kind; }
bool isError() const { return Kind == Error; }
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
bool isNewlineOrEOF() const { return Kind == Newline || Kind == Eof; }
bool isErrorOrEOF() const { return Kind == Error || Kind == Eof; }
bool isRegister() const {
return Kind == NamedRegister || Kind == underscore ||
Kind == NamedVirtualRegister || Kind == VirtualRegister;
}
bool isRegisterFlag() const {
return Kind == kw_implicit || Kind == kw_implicit_define ||
Kind == kw_def || Kind == kw_dead || Kind == kw_killed ||
Kind == kw_undef || Kind == kw_internal ||
Kind == kw_early_clobber || Kind == kw_debug_use ||
Kind == kw_renamable;
}
bool isMemoryOperandFlag() const {
return Kind == kw_volatile || Kind == kw_non_temporal ||
Kind == kw_dereferenceable || Kind == kw_invariant ||
Kind == StringConstant;
}
bool is(TokenKind K) const { return Kind == K; }
bool isNot(TokenKind K) const { return Kind != K; }
StringRef::iterator location() const { return Range.begin(); }
StringRef range() const { return Range; }
/// Return the token's string value.
StringRef stringValue() const { return StringValue; }
const APSInt &integerValue() const { return IntVal; }
bool hasIntegerValue() const {
return Kind == IntegerLiteral || Kind == MachineBasicBlock ||
MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: | bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982
2015-08-14 07:10:16 +08:00
Kind == MachineBasicBlockLabel || Kind == StackObject ||
Kind == FixedStackObject || Kind == GlobalValue ||
Kind == VirtualRegister || Kind == ConstantPoolItem ||
Kind == JumpTableIndex || Kind == IRBlock || Kind == IRValue;
}
};
/// Consume a single machine instruction token in the given source and return
/// the remaining source string.
StringRef lexMIToken(
StringRef Source, MIToken &Token,
function_ref<void(StringRef::iterator, const Twine &)> ErrorCallback);
} // end namespace llvm
#endif // LLVM_LIB_CODEGEN_MIRPARSER_MILEXER_H