The 16 bank LDS case is complicated due to using multiple
instructions. If I attempt to write a pattern for it, the generated
selector incorrectly places the copy to m0 after the first
instruction, so that needs to be separately addressed.
Also fix not gluing the copy to m0 to the second operation in the
second half of the 16 bank lowering.
This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults.
(*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach.
Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The later will use all of the suppression mechanism and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler w/a raw .s file may result in broken code. Use at your own risk.
Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off.
Differential Revision: https://reviews.llvm.org/D72738
Summary:
This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).
Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb
Reviewed By: efriedma
Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70680
Summary:
This adds checks that the expected error was actually reported against
the correct instruction, and fixes a couple of problems that that showed
up: one incorrect W32-ERR:
v_cmp_class_f16_sdwa vcc, v1, v2 src0_sel:DWORD src1_sel:DWORD
// W64: encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x00,0x06,0x06]
-// W32-ERR: error: invalid operand for instruction
+// W32-ERR: error: {{instruction not supported on this GPU|invalid operand for instruction}}
and one missing W32-ERR:
v_cmp_class_f16_sdwa s[6:7], v1, v2 src0_sel:DWORD src1_sel:DWORD
// W64: encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x86,0x06,0x06]
+// W32-ERR: error: invalid operand for instruction
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72611
Summary:
This adds assembler tests for cases that were previously only in the
disassembler tests, and vice versa.
Reviewers: rampitec, arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72592
Summary:
The Pointer Authentication Extension (PAC) was added in Armv8.3-A. Some
instructions are implemented in the HINT space to allow compiling code
common to CPUs regardless of whether they feature PAC or not, and still
benefit from PAC protection in the PAC-enabled CPUs.
The 8.3-specific mnemonics were currently enabled in any architecture, and
LLVM was emitting them in assembly files when PAC code generation was
enabled. This was ok for compilations where both LLVM codegen and the
integrated assembler were used. However, the LLVM codegen was not
compatible with other assemblers (e.g. GAS). Given the fact that the
approach from these assemblers (i.e. to disallow Armv8.3-A mnemonics if
compiling for Armv8.2-A or lower) is entirely reasonable, this patch makes
LLVM to emit HINT when building for Armv8.2-A and below, instead of
PACIASP, AUTIASP and friends. Then, LLVM assembly should be compatible
with other assemblers.
Reviewers: samparker, chill, LukeCheeseman
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71658
When compiling position-independent executables, we now use
DW_EH_PE_pcrel | DW_EH_PE_sdata4. However, the MIPS ABI does not define a
64-bit PC-relative ELF relocation so we cannot use sdata8 for the large
code model case. When using the large code model, we fall back to the
previous behaviour of generating absolute relocations.
With this change clang-generated .o files can be linked by LLD without
having to pass -Wl,-z,notext (which creates text relocations).
This is simpler than the approach used by ld.bfd, which rewrites the
.eh_frame section to convert absolute relocations into relative references.
I saw in D13104 that apparently ld.bfd did not accept pc-relative relocations
for MIPS ouput at some point. However, I also checked that recent ld.bfd
can process the clang-generated .o files so this no longer seems true.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D72228
Summary:
AMO memory operands use a custom parser in order to accept both (reg)
and 0(reg). However, the validation predicate used for these operands
was only checking that they were registers, and not the register class,
so non-GPRs (such as FPRs) were also accepted. Thus, fix this by making
the predicate check that they are GPRs.
Reviewers: asb, lenary
Reviewed By: asb, lenary
Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72471
For a target symbol defined in the same section, currently we don't emit
a relocation if VariantKind is VK_None (with few exceptions like RISC-V
relaxation), while GNU as emits one. This causes program behavior
differences with and without -ffunction-sections, and can break intended
symbol interposition in a -shared link.
```
.globl foo
foo:
call foo # no relocation. On other targets, may be written as b foo, etc
call bar # a relocation if bar is in another section (e.g. -ffunction-sections)
call foo@plt # a relocation
```
Unify these cases by always emitting a relocation. If we ever want to
optimize `call foo` in -shared links, we should emit a STB_LOCAL alias
and call via the alias.
ARM/thumb2-beq-fixup.s: we now emit a relocation to global_thumb_fn as GNU as does.
X86/Inputs/align-branch-64-2.s: we now emit R_X86_64_PLT32 to foo as GNU does.
ELF/relax.s: rewrite the test as target-in-same-section.s .
We omitted relocations to `global` and now emit R_X86_64_PLT32.
Note, GNU as does not emit a relocation for `jmp global` (maybe its own
bug). Our new behavior is compatible except `jmp global`.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D72197
Summary:
This adds assembler tests for cases that were previously only in the
disassembler tests, and vice versa.
Reviewers: rampitec, arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72561
The primary motivation of this change is to bring the code more closely in sync behavior wise with the assembler's version of nop emission. I'd like to eventually factor them into one, but that's hard to do when one has features the other doesn't.
The longest encodeable nop on x86 is 15 bytes, but many processors - for instance all intel chips - can't decode the 15 byte form efficiently. On those processors, it's better to use either a 10 byte or 11 byte sequence depending.
Summary:
This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).
Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma
Reviewed By: efriedma
Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70680
The patch gives out the details of the znver2 scheduler model.
There are few improvements with respect to execution units, latencies and
throughput when compared with znver1.
The tests that were present for znver1 for llvm-mca tool were replicated.
The latencies, execution units, timeline and throughput information are updated for znver2.
Reviewers: craig.topper, Simon Pilgrim
Differential Revision: https://reviews.llvm.org/D66088
Summary:
This is analogous to D58943, which correctly finds the corresponding
fixup. However, when linker relaxations are disabled and we evaluate the
fixup, we need to also ensure we use an offset of 0 rather than the size
of the previous fragment.
Reviewers: asb, efriedma, lenary
Reviewed By: efriedma
Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71978
Summary:
Amend MS offset operator implementation, to more closely fit with its MS counterpart:
1. InlineAsm: evaluate non-local source entities to their (address) location
2. Provide a mean with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous
3. address PR32530
Based on http://llvm.org/D37461
Fix broken test where the break appears unrelated.
- Set up appropriate memory-input rewrites for variable references.
- Intel-dialect assembly printing now correctly handles addresses by adding "offset".
- Pass offsets as immediate operands (using "r" constraint for offsets of locals).
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71436
If we have references to the same extern_weak in multiple objects,
all of them would generate external symbols with the same name. Make
them static to avoid duplicate definitions; nothing should need to
refer to this symbol outside of the current object.
GCC/binutils seems to handle the same by not using a fixed string
for the ".default" suffix, but instead using the name of some other
defined external symbol from the same object (which is supposed to
be unique among objects unless there's other duplicate definitions).
Differential Revision: https://reviews.llvm.org/D71711
This is in the context of the automatic padding work for the jcc erratum mitigation. These are example cases we need to *not* pad for correctness. Exact mechanism to suppress is still TBD, but saving the tests which have come up.
WARNING: If you're looking at this patch because you're looking for a full
performace mitigation of the Intel JCC Erratum, this is not it!
This is a preliminary patch on the patch towards mitigating the performance
regressions caused by Intel's microcode update for Jump Conditional Code
Erratum. For context, see:
https://www.intel.com/content/www/us/en/support/articles/000055650.html
The patch adds the required assembler infrastructure and command line options
needed to exercise the logic for INTERNAL TESTING. These are NOT public flags,
and should not be used for anything other than LLVM's own testing/debugging
purposes. They are likely to change both in spelling and meaning.
WARNING: This patch is knowingly incorrect in some cornercases. We need, and
do not yet provide, a mechanism to selective enable/disable the padding.
Conversation on this will continue in parellel with work on extending this
infrastructure to support prefix padding.
The goal here is to have the assembler align specific instructions such that
they neither cross or end at a 32 byte boundary. The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.
The new options for llvm-mc are:
-x86-align-branch-boundary=NUM aligns branches within NUM byte boundary.
-x86-align-branch=TYPE[+TYPE...] specifies types of branches to align.
A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit
NOP to align the fused/unfused branch.
alignBranchesBegin inserts MCBoundaryAlignFragment before instructions,
alignBranchesEnd marks the end of the branch to be aligned,
relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch.
Nop padding is disabled when the instruction may be rewritten by the linker,
such as TLS Call.
Process Note: I am landing a patch by skan as it has been LGTMed, and
continuing to iterate on the review is simply slowing us down at this point.
We can and will continue to iterate in tree.
Patch By: skan
Differential Revision: https://reviews.llvm.org/D70157
Summary: Instead of crashing due to the `llvm_unreachable`, provide a proper
error when invalid fixups/relocations are encountered.
Reviewers: asb, lenary
Reviewed By: asb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71536
(This commit restores the original branch (4272372c57) and applies an
additional change dropped from the original in a bad merge. This change
should address the previous bot failures. Both changes reviewed by pete.)
Summary:
This commit builds upon Derek Schuff's 2014 commit for attaching labels to
existing fragments ( Diff Revision: http://reviews.llvm.org/D5915 )
When temporary labels appear ahead of a fragment, MCObjectStreamer will
track the temporary label symbol in a "Pending Labels" list. Labels are
associated with fragments when a real fragment arrives; otherwise, an empty
data fragment will be created if the streamer's section changes or if the
stream finishes.
This commit moves the "Pending Labels" list into each MCStream, so that
this label-fragment matching process is resilient to section changes. If
the streamer emits a label in a new section, switches to another section to
do other work, then switches back to the first section and emits a
fragment, that initial label will be associated with this new fragment.
Labels will only receive empty data fragments in the case where no other
fragment exists for that section.
The downstream effects of this can be seen in Mach-O relocations. The
previous approach could produce local section relocations and external
symbol relocations for the same data in an object file, and this mix of
relocation types resulted in problems in the ld64 Mach-O linker. This
commit ensures relocations triggered by temporary labels are consistent.
Reviewers: pete, ab, dschuff
Reviewed By: pete, dschuff
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71368
GNU uses `l` for SHF_X86_64_LARGE and `y` for SHF_ARM_PURECODE.
Lets follow.
To do this I had to refactor and refine how we print the help flags description.
It was too generic and inconsistent with GNU readelf.
Differential revision: https://reviews.llvm.org/D71464
Summary:
These instructions were added to the spec proposal in
https://github.com/WebAssembly/simd/pull/126. Their semantics are
equivalent to `(a + b + 1) / 2`. The opcode for the experimental
i32x4.dot_i16x8_s is also bumped due to a collision with the
i8x16.avgr_u opcode.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71628
Now that our `.s` format is stable(ish) and useable we should really
convert all our MC and lld tests over to .s format to match other
targets.
This is a test PR that just converts 2 of our MC tests to see what
it might look like.
Differential Revision: https://reviews.llvm.org/D71506
Summary:
This commit builds upon Derek Schuff's 2014 commit for attaching labels to
existing fragments ( Diff Revision: http://reviews.llvm.org/D5915 )
When temporary labels appear ahead of a fragment, MCObjectStreamer will
track the temporary label symbol in a "Pending Labels" list. Labels are
associated with fragments when a real fragment arrives; otherwise, an empty
data fragment will be created if the streamer's section changes or if the
stream finishes.
This commit moves the "Pending Labels" list into each MCStream, so that
this label-fragment matching process is resilient to section changes. If
the streamer emits a label in a new section, switches to another section to
do other work, then switches back to the first section and emits a
fragment, that initial label will be associated with this new fragment.
Labels will only receive empty data fragments in the case where no other
fragment exists for that section.
The downstream effects of this can be seen in Mach-O relocations. The
previous approach could produce local section relocations and external
symbol relocations for the same data in an object file, and this mix of
relocation types resulted in problems in the ld64 Mach-O linker. This
commit ensures relocations triggered by temporary labels are consistent.
Reviewers: pete, ab, dschuff
Reviewed By: pete, dschuff
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71368
Summary:
This copy ensures that debug location information is kept for
compressed instructions. There are places where both compressInstruction and
uncompressInstruction are called that were not doing this copy, discarding some
debug info.
This change merely moves the copy into the generated file, so you cannot forget
to copy the location over when compressing or uncompressing.
Reviewers: asb, luismarques
Reviewed By: luismarques
Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67493
Fix PR44284. This is probably not valid assembly but we should not crash.
Reviewed By: luporl, #powerpc, steven.zhang
Differential Revision: https://reviews.llvm.org/D71443
I've noticed that when we have all regular flags set, we print "WAEXMSILoGTx"
instead of "WAXMSILOGTCE" printed by GNU readelf.
It happens because:
1) We print SHF_EXCLUDE at the wrong place.
2) We do not recognize SHF_COMPRESSED, we print "x" instead of "C".
3) We print "o" instead of "O" for SHF_OS_NONCONFORMING.
This patch fixes differences and adds test cases.
Differential revision: https://reviews.llvm.org/D71418
This is equivalent to the existing `import_name` and `import_module`
attributes which control the import names in the final wasm binary
produced by lld.
This maps the existing
This attribute currently requires a string rather than using the
symbol name for a couple of reasons:
1. Avoid confusion with static and dynamic linking which is
based on symbol name. Exporting a function from a wasm module using
this directive is orthogonal to both static and dynamic linking.
2. Avoids name mangling.
Differential Revision: https://reviews.llvm.org/D70520
This adds support for printing improved missing feature error messages
from the assembler, which now indicates which feature caused the parse
to fail.
Differential Revision: https://reviews.llvm.org/D69899
Summary:
In rG643ac6c0420b, the syntax `ldraa x1, [x0]!` was added as an alias
for `ldraa x1, [x0, #0]!`. That syntax is less obvious in meaning, and
also will not be accepted by assemblers that haven't been updated yet.
So it would be better not to emit it as the preferred disassembly for
that instruction.
This change lowers the EmitPriority of the new alias so that the more
explicit syntax `[x0, #0]!` is preferred by the disassembler. The new
syntax is still accepted by the assembler.
Reviewers: ab, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70813
The third parameter to Streamer.EmitSymbolValue() is "bool
IsSectionRelative = false".
For ELF, these debug sections are mapped to address zero, so a normal,
absolute address relocation works just fine, but COFF needs a section
relative relocation, and COFF is the only target where
needsDwarfSectionOffsetDirective() returns true. This matches how
EmitSymbolValue is called elsewhere in the same source file.
Differential Revision: https://reviews.llvm.org/D70661
There are a couple of bugs with the sc, scs, ll, lld instructions expanding:
1. On R6 these instruction pack immediate offset into a 9-bit field. Now
if an immediate exceeds 9-bits assembler does not perform expansion and
just rejects such instruction.
2. On 64-bit non-PIC code if an operand is a symbol assembler generates
incorrect sequence of instructions. It uses R_MIPS_HI16 and R_MIPS_LO16
relocations and skips R_MIPS_HIGHEST and R_MIPS_HIGHER ones.
To solve these problems this patch:
- Introduces `mem_simm9_exp` to mark 9-bit memory immediate operands
which require expansion. Probably later all `mem_simm9` operands will be
able to migrate on `mem_simm9_exp` and we rename it to `mem_simm9`.
- Adds new `OPERAND_MEM_SIMM9` operand type and assigns it to the
`mem_simm9_exp`. That allows to know operand size in the `processInstruction`
method and decide whether we need to expand instruction.
- Adds `expandMem9Inst` method to expand instructions with 9-bit memory
immediate operand. This method just load immediate into a "base"
register used by origibal instruction:
sc $2, 256($sp) => addiu $1, $sp, 256
sc $2, 0($1)
- Fix `expandMem16Inst` to support a correct set of relocations for
symbol loading in case of 64-bit non-PIC code.
ll $12, symbol => lui $12, 0
R_MIPS_HIGHEST symbol
daddiu $12, $12, 0
R_MIPS_HIGHER symbol
dsll $12, $12, 16
daddiu $12, $12, 0
R_MIPS_HI16 symbol
dsll $12, $12, 16
ll $12, 0($12)
R_MIPS_LO16 symbol
- Fix `expandMem16Inst` to unify handling of 3 and 4 operands
instructions.
- Delete unused now `MipsTargetStreamer::emitSCWithSymOffset` method.
Task for next patches - implement expanding for other instructions use
`mem_simm9` operand and other `mem_simm##` operands.
Differential Revision: https://reviews.llvm.org/D70648
Fix for PR24072:
X86 instructions jrcxz/jecxz/jcxz performs short jumps if rcx/ecx/cx register is 0
The maximum relative offset for a forward short jump is 127 Bytes (0x7F).
The maximum relative offset for a backward short jump is 128 Bytes (0x80).
Gnu assembler warns when the distance of the jump exceeds the maximum but llvm-as does not.
Patch by Konstantin Belochapka and Alexey Lapshin
Differential Revision: https://reviews.llvm.org/D70652