r200064 depends on r200051.
r200051 is broken: I tries to replace .mips_hack_elf_flags, which is a good
thing, but what it replaces it with is even worse.
The new emitMipsELFFlags it adds corresponds to no assembly directive, is not
marked as a hack and is not even printed to the .s file.
The patch also introduces more uses of hasRawTextSupport.
The correct way to remove .mips_hack_elf_flags is to have the mips target
streamer handle the default flags (and command line options). That way the
same code path is used for asm and obj. The streamer interface should *really*
correspond to what is printed in the .s file.
llvm-svn: 200078
This patch uses a common MipsTargetSteamer interface for both
MipsAsmPrinter and MipsAsmParser for recording default and commandline
driven directives that affect ELF header flags.
It has been noted that the .ll tests affected by this patch belong in
test/Codegen/Mips. I will move them in a separate patch.
Also, a number of directives do not get expressed by AsmPrinter in the
resultant .s assembly such as setting the correct ASI. I have noted this
in the tests and they will be addressed in later patches.
llvm-svn: 200051
scale factors in memory addresses. As it does for .att_syntax.
It was producing:
Assertion failed: (((Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8)) && "Invalid scale!"), function CreateMem, file /Volumes/SandBox/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp, line 1133.
rdar://14967214
llvm-svn: 199942
This patch updates .set mips16 support which
affects the ELF ABI and its flags. In addition the patch uses
a common interface for both the MipsTargetSteamer and
MipsObjectStreamer that the assembler uses for
both ELF and ASCII output for these directives.
llvm-svn: 199851
Add support to llvm-readobj to decode the actual opcodes. The ARM EHABI opcodes
are a variable length instruction set that describe the operations required for
properly unwinding stack frames.
The primary motivation for this change is to ease the creation of tests for the
ARM EHABI object emission as well as the unwinding directive handling in the ARM
IAS.
Thanks to Logan Chien for an extra test case!
llvm-svn: 199708
This implements the unwind_raw directive for the ARM IAS. The unwind_raw
directive takes the form of a stack offset value followed by one or more bytes
representing the opcodes to be emitted. The opcode emitted will interpreted as
if it were assembled by the opcode assembler via the standard unwinding
directives.
Thanks to Logan Chien for an extra test!
llvm-svn: 199707
The .personalityindex directive is equivalent to the .personality directive with
the ARM EABI personality with the specific index (0, 1, 2). Both of these
directives indicate personality routines, so enhance the personality directive
handling to take into account personalityindex.
Bonus fix: flush the UnwindContext at the beginning of a new function.
Thanks to Logan Chien for additional tests!
llvm-svn: 199706
The addition of IC_OPSIZE_ADSIZE in r198759 wasn't quite complete. It
also turns out to have been unnecessary. The disassembler handles the
AdSize prefix for itself, and doesn't care about the difference between
(e.g.) MOV8ao8 and MOB8ao8_16 definitions. So just let them coexist and
don't worry about it.
llvm-svn: 199654
The disassembler has a special case for 'L' vs. 'W' in its heuristic for
checking for 32-bit and 16-bit equivalents. We could expand the heuristic,
but better just to be consistent in using the 'L' suffix.
llvm-svn: 199652
When disassembling in 16-bit mode the meaning of the OpSize bit is
inverted. Instructions found in the IC_OPSIZE context will actually
*not* have the 0x66 prefix, and instructions in the IC context will
have the 0x66 prefix. Make use of the existing special-case handling
for the 0x66 prefix being in the wrong place, to cope with this.
llvm-svn: 199650
For FCMEQ, FCMGE, FCMGT, FCMLE and FCMLT, floating point zero will be
printed as #0.0 instead of #0. To support the history codes using #0,
we consider to let asm parser accept both #0.0 and #0.
llvm-svn: 199621
Ensure that the tag types are reflected on a replacement. This is particularly
important for the compatibility tag which has multiple representations where the
last definition wins.
llvm-svn: 199577
Fix MLA defs to use register class GPRnopc.
Add encoding tests for multiply instructions.
(Alias for MUL/SMLAL/UMLAL added by r199026.)
Patch by Zhaoshi.
llvm-svn: 199491
ARM assembly syntax uses @ for a comment, execpt for the second
parameter of the .symver directive which requires @ as part of the
symbol name. This commit fixes the parsing of this directive by
adding a special case for ARM for this one argumnet.
To make the change we had to move the AllowAtInIdentifier variable
to the MCAsmLexer interface (from AsmLexer) and expose a setter for
the value. The ELFAsmParser then toggles this value when parsing
the second argument to the .symver directive for a target that
uses @ as a comment symbol
llvm-svn: 199339
MSVC on x64 requires that we create image relative symbol
references to refer to RTTI data. Seeing as how there is no way to
explicitly make reference to a given relocation type in LLVM IR, pattern
match expressions of the form &foo - &__ImageBase.
Differential Revision: http://llvm-reviews.chandlerc.com/D2523
llvm-svn: 199312
The GNU as behavior is a bit different and very strange. It will mark any
label that contains an instruction. We can implement that, but using the
type looks more natural since gas will not mark a function if a .word is
used to output the instructions!
llvm-svn: 199287
This finishes the job started in r198756, and creates separate opcodes for
64-bit vs. 32-bit versions of the rest of the RET instructions too.
LRETL/LRETQ are interesting... I can't see any justification for their
existence in the SDM. There should be no 'LRETL' in 64-bit mode, and no
need for a REX.W prefix for LRETQ. But this is what GAS does, and my
Sandybridge CPU and an Opteron 6376 concur when tested as follows:
asm __volatile__("pushq $0x1234\nmovq $0x33,%rax\nsalq $32,%rax\norq $1f,%rax\npushq %rax\nlretl $8\n1:");
asm __volatile__("pushq $1234\npushq $0x33\npushq $1f\nlretq $8\n1:");
asm __volatile__("pushq $0x33\npushq $1f\nlretq\n1:");
asm __volatile__("pushq $0x1234\npushq $0x33\npushq $1f\nlretq $8\n1:");
cf. PR8592 and commit r118903, which added LRETQ. I only added LRETIQ to
match it.
I don't quite understand how the Intel syntax parsing for ret
instructions is working, despite r154468 allegedly fixing it. Aren't the
explicitly sized 'retw', 'retd' and 'retq' supposed to work? I have at
least made the 'lretq' work with (and indeed *require*) the 'q'.
llvm-svn: 199106
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics. This change simply changes the target specific
routines to conform to the semantis of the ParseDirective correctly.
Conformance to the semantics improves diagnostics emitted for the invalid
directives. X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.
llvm-svn: 199068
An improper qualifier would result in a superfluous error due to the parser not
consuming the remainder of the statement. Simply consume the remainder of the
statement to avoid the error.
llvm-svn: 199035
The implicit immediate 0 forms are assembly aliases, not distinct instruction
encodings. Fix the initial implementation introduced in r198914 to an alias to
avoid two separate instruction definitions for the same encoding.
An InstAlias is insufficient in this case as the necessary due to the need to
add a new additional operand for the implicit zero. By using the AsmPsuedoInst,
fall back to the C++ code to transform the instruction to the equivalent
_POST_IMM form, inserting the additional implicit immediate 0.
llvm-svn: 199032
A 32-bit immediate value can be formed from a constant expression and loaded
into a register. Add support to emit this into an object file. Because this
value is a constant, a relocation must *not* be produced for it.
llvm-svn: 199023
The disassembler would no longer be able to disambiguage between the two
variants (explicit immediate #0 vs implicit, omitted #0) for the ldrt, strt,
ldrbt, strbt mnemonics as both versions indicated the disassembler routine.
llvm-svn: 198944
The GNU assembler supports prefixing the expression with a '#' to indiciate that
the value that is being moved is infact a constant. This improves the
compatibility of the integrated assembler's parser for this.
llvm-svn: 198916
The GNU assembler has an extension that allows for the elision of the paired
register (dt2) for the LDRD and STRD mnemonics. Add support for this in the
assembly parser. Canonicalise the usage during the instruction parsing from
the specified version.
llvm-svn: 198915
The ARM ARM indicates the mnemonics as follows:
ldrbt{<c>}{<q>} <Rt>, [<Rn>], {, #+/-<imm>}
ldrt{<c>}{<q>} <Rt>, [<Rn>] {, #+/-<imm>}
strbt{<c>}{<q>} <Rt>, [<Rn>] {, #<imm>}
strt{<c>}{<q>} <Rt>, [<Rn>] {, #+/-<imm>}
This improves the parser to deal with the implicit immediate 0 for the mnemonics
as per the specification.
Thanks to Joerg Sonnenberger for the tests!
llvm-svn: 198914
We can't do a perfect job here. We *have* to allow (%dx) even in 64-bit
mode, for example, because it might be used for an unofficial form of
the in/out instructions. We actually want to do a better job of validation
*later*. Perhaps *instead* of doing it where we are at the moment.
But for now, doing what validation we *can* do in the place that the code
already has its validation, is an improvement.
llvm-svn: 198760
It seems there is no separate instruction class for having AdSize *and*
OpSize bits set, which is required in order to disambiguate between all
these instructions. So add that to the disassembler.
Hm, perhaps we do need an AdSize16 bit after all?
llvm-svn: 198759
Where "where possible" means that it's an immediate value and it's below
0x10000. In fact GAS will either truncate or error with larger values,
and will insist on using the addr32 prefix to get 32-bit addressing. So
perhaps we should do that, in a later patch.
llvm-svn: 198758
JCXZ should have the 0x67 prefix only if we're in 32-bit mode, so make that
appropriately conditional. And JECXZ needs the prefix instead.
llvm-svn: 198757
I couldn't see how to do this sanely without splitting RETQ from RETL.
Eric says: "sad about the inability to roundtrip them now, but...".
I have no idea what that means, but perhaps it wants preserving in the
commit comment.
llvm-svn: 198756
This fixes the bulk of 16-bit output, and the corresponding test case
x86-16.s now looks mostly like the x86-32.s test case that it was
originally based on. A few irrelevant instructions have been dropped,
and there are still some corner cases to be fixed in subsequent patches.
llvm-svn: 198752
Operands which involved label arithemetic would previously fail to parse. This
corrects that by adding the additional case for the shift operand validation.
llvm-svn: 198735
take type from the new symbol but merge them so that the type
is never "downgraded".
This is probably quite rare, except for IFUNC symbols which
we used to misassemble, losing the IFUNC type.
Fixes#18372.
llvm-svn: 198706
This commit adds the pre-UAL aliases of fconsts and fconstd for
vmov.f32 and vmov.f64. They use an InstAlias rather than a
MnemonicAlias to properly support the predicate operand.
We need to support encoded 8-bit constants in order to implement the
pre-UAL fconsts/fconstd aliases for vmov.f32/vmov.f64, so this
commit also fixes parsing of encoded floating point constants used
in vmov.f32/vmov.f64 instructions. Now we can support assembly code
like this:
fconsts s0, #0x70
which is equivalent to vmov.f32 s0, #1.0.
Most of the code was already in place to support this feature.
Previously the code was trying to accept encoded 8-bit float
constants for the vmov.f32/vmov.f64 instructions. It looks like the
support for parsing encoded floats was lost in a refactoring in
commit r148556 and we did not have any tests in place to catch it.
The change in this commit is to keep the parsed value as a 32-bit
float instead of a 64-bit double because that is what the isFPImm()
function expects to find. There is no loss of precision by using a
32-bit float here because we are still limited to an 8-bit encoded
value in the end.
Additionally, we explicitly reject encoded 8-bit floats for
vmovf.32/64. This is the same as the current behavior, but we now do
it explicitly rather than accidently.
llvm-svn: 198697
Move the unwinding context for the ARM IAS into a helper class. This is purely
a structural refactoring. A follow up change allows for recording additional
depth to improve diagnostics.
llvm-svn: 198664
Parse tag names as well as expressions. The former is part of the
specification, the latter is for improved compatibility with the GNU assembler.
Fix attribute value handling to be comformant to the specification.
llvm-svn: 198662
Introduce a new virtual method Note into the AsmParser. This completements the
existing Warning and Error methods. Use the new method to clean up the output
of the unwind routines in the ARM AsmParser.
llvm-svn: 198661
This patch adds .abicalls and .set pic0 support which
affects the ELF ABI and its flags. In addition the patch uses
a common interface for both the MipsTargetSteamer and
MipsObjectStreamer that both the integrated and standalone
assemblers will use for the output for these directives.
llvm-svn: 198646
The 0x66 prefix toggles between 16-bit and 32-bit addressing mode.
So in 32-bit mode it is used to switch to 16-bit addressing mode for the
following instruction, while in 16-bit mode it's the other way round — it's
used to switch to 32-bit mode instead.
Thus, emit the 0x66 prefix byte for OpSize only in 32-bit (and 64-bit) mode,
and introduce a new OpSize16 bit which is used in 16-bit mode instead.
This is just the basic infrastructure for that change; a subsequent patch
will add the new OpSize16 bit to the 32-bit instructions that need it.
Patch from David Woodhouse.
llvm-svn: 198586
This is not really expected to work right yet. Mostly because we will
still emit the OpSize (0x66) prefix in all the wrong places, along with
a number of other corner cases. Those will all be fixed in the subsequent
commits.
Patch from David Woodhouse.
llvm-svn: 198584
Add some tests to validate correct register selection, including a fix
to an existing test which was requiring the *wrong* output.
Patch from David Woodhouse.
llvm-svn: 198566
Checking the trailing letter of the mnemonic is insufficient. Be more thorough
in the scanning of the instruction to ensure that we correctly work with the
predicated mnemonics.
llvm-svn: 198235
The GNU assembler supports .rep as an alias for .rept. This simply creates the
alias for it and introduces a test for both .rept and .rep.
llvm-svn: 198097
The bkpt mnemonic has an implicit immediate constant of 0 unless otherwise
specified. Add an instruction alias for the unvalued breakpoint mnemonic to
treat it as a 0. This improves compatibility with GNU AS.
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 197913
this commit as the only one on the Blamelist so I quickly reverted this.
However it was actually Nick's change who has since fixed that issue.
Original commit message:
Changed the X86 assembler for intel syntax to work with directional labels.
The X86 assembler as a separate code to parser the intel assembly syntax
in X86AsmParser::ParseIntelOperand(). This did not parse directional labels.
And if something like 1f was used as a branch target it would get an
"Unexpected token" error.
The fix starts in X86AsmParser::ParseIntelExpression() in the case for
AsmToken::Integer, it needs to grab the IntVal from the current token
then look for a 'b' or 'f' following an Integer. Then it basically needs to
do what is done in AsmParser::parsePrimaryExpr() for directional
labels. It saves the MCExpr it creates in the IntelExprStateMachine
in the Sym field.
When it returns to X86AsmParser::ParseIntelOperand() it looks
for a non-zero Sym field in the IntelExprStateMachine and if
set it creates a memory operand not an immediate operand
it would normally do for the Integer.
rdar://14961158
llvm-svn: 197744
The X86 assembler has a separate code to parser the intel assembly syntax
in X86AsmParser::ParseIntelOperand(). This did not parse directional labels.
And if something like 1f was used as a branch target it would get an
"Unexpected token" error.
The fix starts in X86AsmParser::ParseIntelExpression() in the case for
AsmToken::Integer, it needs to grab the IntVal from the current token
then look for a 'b' or 'f' following the Integer. Then it basically needs to
do what is done in AsmParser::parsePrimaryExpr() for directional
labels. It saves the MCExpr it creates in the IntelExprStateMachine
in the Sym field.
When it returns to X86AsmParser::ParseIntelOperand() it looks
for a non-zero Sym field in the IntelExprStateMachine and if
set it creates a memory operand not an immediate operand
it would normally do for the Integer.
rdar://14961158
llvm-svn: 197728
This directive will write out the assembler-maintained constant
pool for the current section. These constant pools are created to
support the ldr-pseudo instruction (e.g. ldr r0, =val).
The directive can be used by the programmer to place the constant
pool in a location that can be reached by a pc-relative offset in
the ldr instruction.
llvm-svn: 197711
The ldr-pseudo opcode is a convenience for loading 32-bit constants.
It is converted into a pc-relative load from a constant pool. For
example,
ldr r0, =0x10001
ldr r1, =bar
will generate this output in the final assembly
ldr r0, .Ltmp0
ldr r1, .Ltmp1
...
.Ltmp0: .long 0x10001
.Ltmp1: .long bar
Sketch of the LDR pseudo implementation:
Keep a map from Section => ConstantPool
When parsing ldr r0, =val
parse val as an MCExpr
get ConstantPool for current Section
Label = CreateTempSymbol()
remember val in ConstantPool at next free slot
add operand to ldr that is MCSymbolRef of Label
On finishParse() callback
Write out all non-empty constant pools
for each Entry in ConstantPool
Emit Entry.Label
Emit Entry.Value
Possible improvements to be added in a later patch:
1. Does not convert load of small constants to mov
(e.g. ldr r0, =0x1 => mov r0, 0x1)
2. Does reuse constant pool entries for same constant
The implementation was tested for ARM, Thumb1, and Thumb2 targets on
linux and darwin.
llvm-svn: 197708
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.
Future work includes:
- Generating more-informative mnemonics when possible (this may also be done
in the printer).
- Remove the dependence on positional "numbered" operand-to-variable mapping
(for both encoding and decoding).
- Internally using 64-bit instruction variants in 64-bit mode (if this turns
out to matter).
llvm-svn: 197693
This adds support for the .inst directive. This is an ARM specific directive to
indicate an instruction encoded as a constant expression. The major difference
between .word, .short, or .byte and .inst is that the latter will be
disassembled as an instruction since it does not get flagged as data.
llvm-svn: 197657
The .end directive indicates the end of the file. No further instructions are
processed after a .end directive is encountered.
One potential (glaringly obvious) optimisation that could be pursued here is to
extend MCAsmParser with a DiscardRemainder method to avoid processing lexemes to
the end of the file. It was unclear at this point if that would be worth
adding, and could easily be added in a follow on change.
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 197547
Without this, assembling clang's disassembly would produce an object
file with the IMAGE_SCN_CNT_INITIALIZED_DATA section characteristic
rather than the uninitialized one. link.exe would warn when merging
comdats with different flags.
llvm-svn: 197529
were falling into the cases for 24-bit branch kinds which are not 24-bit
branches. The routine is to return false for fixups are expected to always
be resolvable at assembly time. Which these three fixups are as they have
limited displacement and are for local references within a function.
rdar://15586725
llvm-svn: 197282
branch instructions for mips and micromips instruction sets thus avoiding
the situation of generating branches to undesired locations if offsets
cannot be encoded.
This patch also checks if a fixup cannot be applied and returns a fatal error
if that's the case.
llvm-svn: 197223
As we can't make a complete solution now it has been decided to enable .set directive to handle long jump expressions. This will cause parser to report errors when parsing integer based register assignments, for example:
.set r3, will be reported as error. Still, the need for expressions is higher priority as the integer based register assignments are Mips specific and can be avoided using register names.
llvm-svn: 196773
The integrated assembler fails to properly lex arm comments when
they are adjacent to an identifier in the input stream. The reason
is that the arm comment symbol '@' is also used as symbol variant in
other assembly languages so when lexing an identifier it allows the
'@' symbol as part of the identifier.
Example:
$ cat comment.s
foo:
add r0, r0@got to parse this as a comment
$ llvm-mc -triple armv7 comment.s
comment.s:4:18: error: unexpected token in argument list
add r0, r0@got to parse this as a comment
^
This should be parsed as correctly as `add r0, r0`.
This commit modifes the assembly lexer to not include the '@' symbol
in identifiers when lexing for targets that use '@' for comments.
llvm-svn: 196607
not being correctly encoded/decoded.
In more detail, immediate fields of LD/ST instructions should be
divided/multiplied by the size of the data format before encoding and
after decoding, respectively.
llvm-svn: 196494
ELF_Other_Weakref and ELF_Other_ThumbFunc seems to be LLVM
internal ELF symbol flags. These should not be emitted to
object file.
This commit defines ELF_STO_Shift for the target-defined
flags for st_other, and increase the value of
ELF_Other_Shift to 16.
llvm-svn: 196440
Where it would use a scattered relocation entry but falls back to a
normal relocation entry because the FixupOffset is more than 24-bits.
The bug is in the X86MachObjectWriter::RecordScatteredRelocation() where
it changes reference parameter FixedValue but then returns false to indicate
it did not create a scattered relocation entry. The fix is simply to save the
original value of the parameter FixedValue at the start of the method and
restore it if we are returning false in that case.
rdar://15526046
llvm-svn: 196432
ARM symbol variants are written with parens instead of @ like this:
.word __GLOBAL_I_a(target1)
This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.
The MCAsmInfo parens flag is enabled only for ARM on ELF.
By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.
To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.
Updated Tests:
Changed case of symbol variant to match the generic kind.
test/CodeGen/ARM/tls-models.ll
test/CodeGen/ARM/tls1.ll
test/CodeGen/ARM/tls2.ll
test/CodeGen/Thumb2/tls1.ll
test/CodeGen/Thumb2/tls2.ll
PR18080
llvm-svn: 196424