scalarizeBinop currently folds
vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index))
->
inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index)
This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense:
vec_bo((inselt VecC0, V0, Index), VecC1)
->
inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index)
Fixes PR42174
Differential Revision: https://reviews.llvm.org/D80885
Summary:
It is important to emit HINT instructions instead of BTI ones when
BTI is disabled. This allows compatibility with other assemblers
(e.g. GAS).
Still, developers of assembly code will want to write code that is
compatible with both pre- and post-BTI CPUs. They could use HINT
mnemonics, but the new mnemonics are a lot more readable (e.g.
bti c instead of hint #34), and they will result in the same
encodings. So, while LLVM should not *emit* the new mnemonics when
BTI is disabled, this patch will at least make LLVM *accept*
assembly code that uses them.
Reviewers: pbarrio, tamas.petz, ostannard
Reviewed By: pbarrio, ostannard
Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81257
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize
G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer
Add select-trn.mir to test selection.
Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the
existing arm64-trn test.
Note that both of these tests contain things we currently don't legalize.
I figured it would be easier to test these now rather than later, since once
we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update
the tests.
Differential Revision: https://reviews.llvm.org/D81182
Summary:
As specified in https://github.com/WebAssembly/simd/pull/232. These
instructions are implemented as LLVM intrinsics for now rather than
normal ISel patterns to make these instructions opt-in. Once the
instructions are merged to the spec proposal, the intrinsics will be
replaced with proper ISel patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81222
This patch enables yaml2elf emit the .debug_line section.
Test cases for emitting the dwarf64 .debug_line section and opcodes will be added later.
Known issues:
- We should replace `InitialLength` with `Format` and `Length`
- Currently implementation of the .debug_line section only fully supports DWARFv2, some header fields in DWARFv4 and DWARFv5 is missing, e.g., `header_length` in DWARFv4, `address_size` and `segment_selector_size` in DWARFv5.
- Some opcodes relies on the .debug_info section, we should warn user about it.
These issues will be addressed in a follow-up patch.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D81450
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))
The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.
This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.
'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.
Differential Revision: https://reviews.llvm.org/D80801
The new line printing for debug line verbose output was inconsistent.
For new rows in the matrix, a blank line followed, whilst the
DW_LNS_copy opcode actually resulted in two blank lines. There was also
potential inconsistency in the blank lines at the end of the table. This
patch mostly resolves these issues - no blank lines appear in the output
except for a single line after the prologue and at table end to separate
it from any subsquent table, plus some instances after error messages.
Also add a unit test for verbose output to test the fine details of new
line placement and other aspects of verbose output.
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D81102
Verbose and non-verbose parsing of .debug_line produced their output at
different points in the program. The most obvious impact of this was
that error messages were produced at different times, but it also
potentially reduced what clients could do by customising the stream or
warning/error handlers.
This change makes the two variants consistent by printing non-verbose
output inline, the same as verbose output.
Testing of the error messages has been modified to check the messages
always appear in the same location to illustrate the behaviour.
Reviewed by: JDevlieghere, dblaikie, MaskRay, labath
Differential Revision: https://reviews.llvm.org/D80989
The flushes previously existed to help ensure consistent error message
output when stdout and stderr were passed to the same location. This is
no longer necessary as errs() is now tied to outs().
Reviewed by: dblaikie, MaskRay, JDevlieghere, labath
Differential Revision: https://reviews.llvm.org/D80803
Summary:
Add DLD/DLDU/DLDL/PFCH/TS1AM/TS2AM/TS3AM/ATMAM/CAS instructions newly.
Add regression tests for them to asmparser, mccodeemitter, and disassembler.
In order to add those instructions, change asmparser to support UImm0to2 and
UImm1 operands, add new decode functions to disassembler, and add new print
functions to instprinter.
Differential Revision: https://reviews.llvm.org/D81454
This patch helps make yaml2obj emit an error message when we try to assign an invalid offset to the entry of the 'debug_ranges' section.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D81357
Summary:
The naked function attribute is meant to suppress all function
prologue/epilogue instructions.
On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.
Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().
Checking in ARMFrameLowering::determineCalleeSaves() is too late.
A test case is included.
Reviewers: llvm-commits, olista01, danielkiss
Reviewed By: danielkiss
Subscribers: kristof.beyls, hiraditya, danielkiss
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80715
Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
Previously, if an extended opcode was truncated, it would manifest as an
"unexpected line op length error" which wasn't quite accurate. This
change checks for errors any time data is read whilst parsing an
extended opcode, and reports any errors detected.
Reviewed by: MaskRay, labath, aprantl
Differential Revision: https://reviews.llvm.org/D80797
Summary:
This patch adds initial support for the following instrinsics:
* llvm.aarch64.sve.ld2
* llvm.aarch64.sve.ld3
* llvm.aarch64.sve.ld4
For loading two, three and four vectors worth of data. Basic codegen is
implemented with reg+reg and reg+imm addressing modes being addressed
in a later patch.
The types returned by these intrinsics have a number of elements that is a
multiple of the elements in a 128-bit vector for a given type and N, where N is
the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements
the types are:
LD2 : <vscale x 8 x i32>
LD3 : <vscale x 12 x i32>
LD4 : <vscale x 16 x i32>
This is implemented with target-specific intrinsics for each variant that take
the same operands as the IR intrinsic but return N values, where the type of
each value is a full vector, i.e. <vscale x 4 x i32> in the above example.
These values are then concatenated using the standard concat_vector intrinsic
to maintain type legality with the IR.
These intrinsics are intended for use in the Arm C Language
Extension (ACLE).
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75751
Summary:
Add regression tests of asmparser, mccodeemitter, and disassembler for
transfer control instructions. Add FENCEI/FENCEM/FENCEC/SVOB instructions
also. Add new instruction format to represent FENCE* instructions too.
Differential Revision: https://reviews.llvm.org/D81440
Summary:
Make use of both the - (1) clustered bytes and (2) cluster length, to decide on
the max number of mem ops that can be clustered. On an average, when loads
are dword or smaller, consider `5` as max threshold, otherwise `4`. This heuristic
is purely based on different experimentation conducted, and there is no analytical
logic here.
Reviewers: foad, rampitec, arsenm, vpykhtin
Reviewed By: foad, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81085
Summary:
Remove an old test for an explicit return in a naked function from test/CodeGen/AVR/return.ll
clang no longer allows a C return in a naked function.
This test is causing failure of the patch https://reviews.llvm.org/D80715
Reviewers: llvm-commits, dylanmckay
Reviewed By: dylanmckay
Subscribers: Jim
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81099
Change-Id: Id218027e520247ae480b92e7801a660fbe0cf29b
Summary:
Currently, MachineVerifier will attempt to verify that tied operands
satisfy register constraints as soon as the function is no longer in
SSA form. However, PHIElimination will take the function out of SSA
form while TwoAddressInstructionPass will actually rewrite tied operands
to match the constraints. PHIElimination runs first in the pipeline.
Therefore, whenever the MachineVerifier is run after PHIElimination,
it will encounter verification errors on any tied operands.
This patch adds a function property called TiedOpsRewritten that will be
set by TwoAddressInstructionPass and will control when the verifier checks
tied operands.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D80538
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.
In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.
I added a test to
CodeGen/AArch64/build-one-lane.ll
that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.
Differential Revision: https://reviews.llvm.org/D80370
The previous implementation used "asm parser only" pseudo-instructions in their
output patterns. Those are not meant to emit code and will caused crashes when
built with -filetype=obj.
Differential Revision: https://reviews.llvm.org/D80151
The function emitRLDICWhenLoweringJumpTables in PPCMIPeephole.cpp
was supposed to convert a pair of RLDICL and RLDICR to a single RLDIC,
but it was leaving out the RLDICL instruction. This PR fixes the bug.
Differential Revision: https://reviews.llvm.org/D78063
Fix the incorrect PC Relative relocations for Big Endian for 34 bit offsets.
The offset should be zero for both BE and LE in this situation.
Differential Revision: https://reviews.llvm.org/D81033
Summary:
This patch adds support of using the result of an expression as an
immediate value. For example,
0:
.skip 4
1:
mov x0, 1b - 0b
is assembled to
mov x0, #4
Currently it does not support expressions requiring relocation unless
explicitly specified. This fixes PR#45781.
Reviewers: peter.smith, ostannard, efriedma
Reviewed By: efriedma
Subscribers: nickdesaulniers, llozano, manojgupta, efriedma, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80028
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object. This
metadata prevents discarding of the global object in linker GC unless
the referenced object is also discarded.
Furthermore, when a function symbol is discarded by the linker, setting
up !associated metadata allows linker to discard counters, data and
values associated with that function symbol. This is not possible today
because there's metadata to guide the linker. This approach is also used
by other instrumentations like sanitizers.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
Slice::operator<() has a non-deterministic behavior. If we have
identical slices comparison will depend on the order or operands.
Normally that does not result in unstable compilation results
because the order in which slices are inserted into the vector
is deterministic and llvm::sort() normally behaves as a stable
sort, although that is not guaranteed.
However, there is test option -sroa-random-shuffle-slices which
is used to check exactly this aspect. The vector is first randomly
shuffled and then sorted. The same shuffling happens without this
option under expensive llvm checks.
I have managed to write a test which has hit this problem.
There are no fields in the Slice class to resolve the instability.
We only have offsets, IsSplittable and Use, but neither Use nor
User have anything suitable for predictable comparison.
I have switched to stable_sort which has to be sufficient and
removed that randon shuffle option.
Differential Revision: https://reviews.llvm.org/D81310
Commit d77ae1552f ("[DebugInfo] Support to emit debugInfo
for extern variables") added support to emit debuginfo
for extern variables. Currently, only BPF target enables to
emit debuginfo for extern variables.
But if the extern variable has "void" type, the compilation will
fail.
-bash-4.4$ cat t.c
extern void bla;
void *test() {
void *x = &bla;
return x;
}
-bash-4.4$ clang -target bpf -g -O2 -S t.c
missing global variable type
!1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
isLocal: false, isDefinition: false)
...
fatal error: error in backend: Broken module found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace,
preprocessed source, and associated run script.
Stack dump:
...
The IR requires a DIGlobalVariable must have a valid type and the
"void" type does not generate any type, hence the above fatal error.
Note that if the extern variable is defined as "const void", the
compilation will succeed.
-bash-4.4$ cat t.c
extern const void bla;
const void *test() {
const void *x = &bla;
return x;
}
-bash-4.4$ clang -target bpf -g -O2 -S t.c
-bash-4.4$ cat t.ll
...
!1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
type: !6, isLocal: false, isDefinition: false)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: null)
...
Since currently, "const void extern_var" is supported by the
debug info, it is natural that "void extern_var" should also
be supported. This patch disabled assertion of "void extern_var"
in IR verifier and add proper guarding when emiting potential
null debug info type to dwarf types.
Differential Revision: https://reviews.llvm.org/D81131
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object. This
metadata prevents discarding of the global object in linker GC unless
the referenced object is also discarded.
Furthermore, when a function symbol is discarded by the linker, setting
up !associated metadata allows linker to discard counters, data and
values associated with that function symbol. This is not possible today
because there's metadata to guide the linker. This approach is also used
by other instrumentations like sanitizers.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
If there are more than 65534 relocation entries in a single section,
we should generate an overflow section.
Since we don't support overflow section for now, we should generate
an error.
Differential revision: https://reviews.llvm.org/D81104
Currently aarch64-ldst-opt will incorrectly rename registers with
multiple disjunct subregisters (e.g. result of LD3). This patch updates
the canRenameUpToDef to bail out if it encounters such a register class
that contains the register to rename.
Fixes PR46105.
Reviewers: efriedma, dmgreen, paquette, t.p.northover
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81108
LowerSELECT sees the CMP with 0 and wants to use a trick with SUB
and SBB. But we can use the flags from the BSF/TZCNT.
Fixes PR46203.
Differential Revision: https://reviews.llvm.org/D81312
This patch adds a test to check that we do not use an undef renamable
register for renaming the other operand in a LDP instruction, as
suggested in D81108.
The update_*test_checks scripts miss new stuff added at the end of
lines. Regenerate checks so the new mode register operands don't show
up in the diff of a future patch.