Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983
The AsmLexer currently has an issue with lexing line comments in files
with CRLF line endings, in which it reads the carriage return as being
part of the line comment. This causes an error for certain valid comment
layouts; this patch fixes this by excluding the carriage return from the
line comment.
Differential Revision: https://reviews.llvm.org/D90234
- As per the HLASM support we are providing, i.e. support only for the first parameter of the inline asm block, only pertaining to Z machine instructions defined in LLVM, character literals and string literals are not supported (see Figure 4 - https://www-01.ibm.com/servers/resourcelink/svc00100.nsf/pages/zOSV2R3sc264940/$file/asmr1023.pdf for more information)
- This patch explicitly rejects the usage of char literals and string literals (for example "abc 'a'") when the relevant field is set
- This is achieved by introducing a field called `LexHLASMStrings` in MCAsmLexer similar to `LexMasmStrings`
Reviewed By: abhina.sreeskantharajan, Kai
Differential Revision: https://reviews.llvm.org/D101660
- Previously, https://reviews.llvm.org/D99889 changed the framework in the AsmLexer to treat special tokens, if they occur at the start of the string, as Identifiers.
- These are used by the MASM Parser implementation in LLVM, and we can extend some of the changes made in the previous patch to SystemZ.
- In SystemZ, the special "tokens" referred to here are "_", "$", "@", "#". [_|$|@|#] are already supported as "part" of an Identifier.
- The changes in this patch ensure that these special tokens, when they occur at the start of the Identifier, are treated as Identifiers.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D100959
- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour:
```
/// This is true if the assembler allows $ @ ? characters at the start of
/// symbol names. Defaults to false.
```
- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.
```
default:
if (MAI.doesAllowSymbolAtNameStart()) {
// Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
if (!isDigit(CurChar) &&
isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
AllowHashInIdentifier))
return LexIdentifier();
}
```
1. The Dollar and At tokens, when occurring at the start of the string, are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed). The `AllowAtInName` does have an additional important meaning, but in the context of AsmLexer, shouldn't mean anything different compared to `AllowAtInIdentifier`
My proposal is the following:
- Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names")
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar|At]TokenAtStartOfString` is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.
z/OS (HLASM) will also make use of some of these fields in follow up patches.
completely accurate* - This was based on the comments and the intended behaviour the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)
Depends on https://reviews.llvm.org/D99374
Reviewed By: Jonathan.Crowther
Differential Revision: https://reviews.llvm.org/D99889
- Add support for HLASM style integers. These are the decimal integers [0-9].
- HLASM does not support the additional prefixed integers like, `0b`, `0x`, octal integers and Masm style integers.
- To achieve this, a field `LexHLASMStyleIntegers` (similar to the `LexMasmStyleIntegers` field) is introduced in `MCAsmLexer.h` as well as a corresponding setter.
Note: This field could also go into MCAsmInfo.h. I used the previous precedent set by the `LexMasmIntegers` field.
Depends on https://reviews.llvm.org/D99286
Reviewed By: epastor
Differential Revision: https://reviews.llvm.org/D99374
- Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute.
- However, AsmLexer also supports a few additional comment syntaxes, in addition to what's specified as a CommentString attribute. This includes regular C-style block comments (/* ... */), regular C-style line comments (// .... ) and #. While I'm not sure as to why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference)
For example:
Consider a target which sets the CommentString attribute to '*'.
The following strings are all lexed as comments.
```
"# abc" -> comment
"// abc" -> comment
"/* abc */ -> comment
"* abc" -> comment
```
- In HLASM however, only "*" is accepted as a comment string, and nothing else.
- To achieve this, an additional attribute (`AllowAdditionalComments`) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#".
Depends on https://reviews.llvm.org/D99277
Reviewed By: abhina.sreeskantharajan, myiwanch
Differential Revision: https://reviews.llvm.org/D99286
The GNU AS manual states the following about single-character constants enclosed within single quotes:
> Some backslash escapes apply to characters, \b, \f, \n, \r, \t, and \" with the same meaning as for strings, plus \' for a single quote.
Add two more characters to the switch handling this case to match GAS behaviour, plus a test to make sure nothing regresses.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99609
These look like $00A0cf for hex and %001010101 for binary. They are used in Motorola assembly syntax.
Differential Revision: https://reviews.llvm.org/D98519
- This patch adds in support to accept the "#" character as part of an Identifier.
- This support is needed especially for the HLASM dialect since "#" is treated as part of the valid "Alphabet" range
- The way this is done is by making use of the previous precedent set by the `AllowAtInIdentifier` field in `MCAsmLexer.h`. A new field called `AllowHashInIdentifier` is introduced.
- The static function `IsIdentifierChar` is also updated to accept the `#` character if the `AllowHashInIdentifier` field is set to true.
Note: The field introduced in `MCAsmLexer.h` could very well be moved to `MCAsmInfo.h`. I'm not opposed to it. I decided to put it in `MCAsmLexer` since there seems to be some sort of precedent already with `AllowAtInIdentifier`.
Reviewed By: abhina.sreeskantharajan, nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D99277
- https://reviews.llvm.org/rGb605cfb336989705f391d255b7628062d3dfe9c3 was reverted due to sanitizer bugs in the introduced unit-test (specifically in the Address sanitizer https://lab.llvm.org/buildbot/#/builders/5/builds/5697)
- This patch attempts to rectify that, as well as re-factor parts of the test
- The issue was previously, within the `setupCallToAsmParser` function in the unit-test, `SrcMgr` was declared as a local variable. `SrcMgr` owns a unique pointer. Since the variable goes out of scope at the end of the function, the unique pointer is released.
- This patch, moves the declaration of the `SrcMgr` variable to a class field, since the scope will remain until the class's destructor is invoked (which in this case is at the end of the unit test)
- Furthermore, this patch also moves the `MCContext Ctx` declaration from a local variable instance inside a function, to a unique pointer class field. This ensures the instantiation of the MCContext remains until the tear down of the test.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D99004
- Previously, https://reviews.llvm.org/D97703 was [[ https://reviews.llvm.org/D98543 | reverted ]] as it broke when building the unit tests when shared libs on.
- This patch reverts the "revert" and makes two minor changes
- The first is it also links in the MCParser lib when building the unittest. This should resolve the issue when building with with shared libs on and off
- The second renames the name of the unit test from `SystemZAsmLexer` to `SystemZAsmLexerTests` since the convention for unittest binaries is to suffix the name of the unit test with "Tests"
Reviewed By: Kai
Differential Revision: https://reviews.llvm.org/D98666
- This patch adds in support for the ordinary HLASM comment syntax asm
statements (Reference - Chapter 7, Comment Statements, Ordinary Comment
Statements)
- In brief, the ordinary comment syntax if used, must begin with the "*"
character
- To achieve this, this patch makes use of the CommentString attribute
provided in the base MCAsmInfo class
- In the SystemZMCAsmInfo class, the CommentString attribute was set to
"*" based on the assembler dialect
- Furthermore, a new attribute RestrictCommentString, is provided to only
treat a string as a comment if it appears at the start of the asm
statement. Example: "jo *-4" is valid in HLASM (jump back 4 bytes from
current point - similar to jo -4 in gnu asm) and we don't want "*-4" to
be treated as a comment.
- RFC for HLASM Parser support implementation: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html
Reviewed By: scott.linder, Kai
Differential Revision: https://reviews.llvm.org/D97703
There is an explicit option for the lexer to support this, but we crash
when `-preserve-comments` is enabled because it checks for
`getTok().getString().empty()` to detect the case. This doesn't
work currently because the lexer reports this case as a string of length
1, containing a null byte.
Change the lexer to instead report this case via an empty string, as the
null terminator isn't logically a part of the textual input, and the
check for `.empty()` seems natural and obvious in the calling code.
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D92681
Allow single-quoted strings and double-quoted character values, as well as doubled-quote escaping.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D89731
Add support for .radix directive, and radix specifiers [yY] (binary), [oOqQ] (octal), and [tT] (decimal).
Also, when lexing MASM integers, require radix specifier; MASM requires that all literals without a radix specifier be treated as in the default radix. (e.g., 0100 = 100)
Relanding D87400, now with fewer ms-inline-asm tests broken!
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D88337
Add support for .radix directive, and radix specifiers [yY] (binary), [oOqQ] (octal), and [tT] (decimal).
Also, when lexing MASM integers, require radix specifier; MASM requires that all literals without a radix specifier be treated as in the default radix. (e.g., 0100 = 100)
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D87400
Summary:
Many directives are unavailable, and support for others may be limited.
This first draft has preliminary support for:
- conditional directives (including errors),
- data allocation (unsigned types up to 8 bytes, and ALIGN),
- equates/variables (numeric and text),
- and procedure directives (without parameters),
as well as COMMENT, ECHO, INCLUDE, INCLUDELIB, PUBLIC, and EXTERN. Text variables (aka text macros) are expanded in-place wherever the identifier occurs.
We deliberately ignore all ml.exe processor directives.
Prominent features not yet supported:
- structs
- macros (both procedures and functions)
- procedures (with specified parameters)
- substitution & expansion operators
Conditional directives are complicated by the fact that "ifdef rax" is a valid way to check if a file is being assembled for a 64-bit x86 processor; we add support for "ifdef <register>" in general, which requires adding a tryParseRegister method to all MCTargetAsmParsers. (Some targets require backtracking in the non-register case.)
Reviewers: rnk, thakis
Reviewed By: thakis
Subscribers: kerbowa, merge_guards_bot, wuzish, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, mgorny, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72680
This patch has three related fixes to improve float literal lexing:
1. Make AsmLexer::LexDigit handle floats without a decimal point more
consistently.
2. Make AsmLexer::LexFloatLiteral print an error for floats which are
apparently missing an "e".
3. Make APFloat::convertFromString use binutils-compatible exponent
parsing.
Together, this fixes some cases where a float would be incorrectly
rejected, fixes some cases where the compiler would crash, and improves
diagnostics in some cases.
Patch by Brandon Jones.
Differential Revision: https://reviews.llvm.org/D57321
llvm-svn: 357214
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
of this directive anymore.
- Actually constructs the signature in the assembler, and make the
assembler own it.
- Refactor the use of MVT vs ValType in the streamer and assembler
to require less conversions overall.
- Changed 700 or so tests to use it.
Reviewers: sbc100, dschuff
Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D54652
llvm-svn: 347228
Summary:
This renames the IsParsingMSInlineAsm member variable of AsmLexer to
LexMasmIntegers and moves it up to MCAsmLexer. This is the only behavior
controlled by that variable. I added a public setter, so that it can be
set from outside or from the llvm-mc command line. We may need to
arrange things so that users can get this behavior from clang, but
that's future work.
I also put additional hex literal lexing functionality under this flag
to fix PR32973. It appears that this hex literal parsing wasn't intended
to be enabled in non-masm-style blocks.
Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang,
but 0b label references work when using .intel_syntax in standalone .s
files.
However, 0b label references will *not* work from __asm blocks in clang.
They will work from GCC inline asm blocks, which it sounds like is
important for Crypto++ as mentioned in PR36144.
Essentially, we only lex masm literals for inline asm blobs that use
intel syntax. If the .intel_syntax directive is used inside a gnu-style
inline asm statement, masm literals will not be lexed, which is
compatible with gas and llvm-mc standalone .s assembly.
This fixes PR36144 and PR32973.
Reviewers: Gerolf, avt77
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D53535
llvm-svn: 345189
Differential Revision: https://reviews.llvm.org/D39737
This is the second attempt to commit this. The test was broken on Linux in the first attempt.
llvm-svn: 318560
This will prevent doubling of line endings when parsing assembly and
emitting assembly.
Otherwise we'd parse the directive, consume the end of statement, hit
the next end of statement, and emit a fresh newline.
llvm-svn: 315943
I found that llvm-mc does not like non-english characters even in comments,
which it tries to tokenize.
Problem happens because of functions like isdigit(), isalnum() which takes
int argument and expects it is not negative.
But at the same time MCParser uses char* to store input buffer poiner, char has signed value,
so it is possible to pass negative value to one of functions from above and
that triggers an assert.
Testcase for demonstration is provided.
To fix the issue helper functions were introduced in StringExtras.h
Differential revision: https://reviews.llvm.org/D38461
llvm-svn: 314883
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
This allows clients to register an AsmCommentConsumer with the MCAsmLexer,
which receives a callback each time a comment is parsed.
Differential Revision: https://reviews.llvm.org/D27511
llvm-svn: 289036
Retrying after buildbot reset.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.
This fixes PR28921.
Reviewers: rnk, loladiro
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24839
llvm-svn: 283111
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.
This fixes PR28921.
Reviewers: rnk, loladiro
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24839
llvm-svn: 282992
1. 0xNN and NNh are accepted as valid hexadecimal numbers, but 0xNNh is not.
0xNN and NNh may come with optional U or L suffix.
2. NNb is accepted as a valid binary (base-2) number, but 0bNN is not.
NNb may come with optional U or L suffix.
Differential Revision: https://reviews.llvm.org/D22112
llvm-svn: 280555
Summary:
They are now lexed as a single token on targets where
MCAsmInfo::HasMipsExpressions is true and then parsed in a similar way to
the '~' operator as part of MCExpr::parseExpression.
As a result:
* expressions and immediates no longer have different parsing rules. The
difference is now solely down to whether evaluateAsAbsolute() succeeds.
* %hi(%neg(%gp_rel(x))) are no longer parsed as a single operator and
decomposed into the three MipsMCExpr nodes. They are parsed directly as
three MipsMCExpr nodes.
* parseMemOperand no longer needs to eat all the surrounding parenthesis
to get at the outermost operator to make this work
* %hi(%neg(%gp_rel(x))) and %lo(%neg(%gp_rel(x))) are no longer the only
3-in-1 relocs that parse for N64. They're still the only combinations that
are permitted in relocatable expressions though. Fixing that should be a
later patch.
* We no longer need to list all the tokens that can occur as the first token of
an expression or immediate.
test/MC/Mips/expr1.s:
This change also prevents the incorrect lowering of %lo(2*4)+foo to
%lo(8+foo) which is not an equivalent expression (the difference is
whether foo is truncated to 16-bit or not) and the test has been
updated to account for the macro expansion the correct expression requires.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D23110
llvm-svn: 277988
Recommiting after fixing non-atomic insert to front of SmallVector in
MCAsmLexer.h
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 273007
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 272953
Do not issue lexing errors found during the parsing of macro body
definitions and parseIdentifier function in AsmParser. This changes the
Parser to not issue a lexing error when we reach an error, but rather
when it is consumed allowing us time to examine and recover from an
error.
As a result, of this, we stop issuing a both lexing error and a parsing
error in floating-literals test. Minor tweak to parseDirectiveRealValue
to favor more meaningful lexing error over less helpful parse error.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20535
llvm-svn: 271542