Summary:
When parsing 64-bit MASM, treat memory operands with an unspecified base register as RIP-based.
Documented in several places, including https://software.intel.com/en-us/articles/introduction-to-x64-assembly: "Unfortunately, MASM does not allow this form of opcode, but other assemblers like FASM and YASM do. Instead, MASM embeds RIP-relative addressing implicitly."
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D73227
This prevents the outlined functions from pulling in a lot of unnecessary code
in our downstream libraries/linker, which stops outlining from making code size
worse in C++ code with no-exceptions.
Differential Revision: https://reviews.llvm.org/D57254
The abbreviations for a given compilation unit end with an entry
consisting of a 0 byte for the abbreviation code.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D82933
The pass to split atomic and non-atomic RISC-V pseudo-instructions was itself
split into two passes in D79635 / commit rG2cb0644f90b7, with the splitting of
non-atomic instructions being moved to the PreSched2 phase. A comment was
added to D79635 detailing a case where this caused problems, so this commit
moves the non-atomic split pass back to the PreEmitPass2 phase. This allows
the bulk of the changes from D79635 to remain committed, while addressing
the reported problem (the pass split is now almost NFC). Once the root problem
is fixed we can move the (non-atomic) instruction splitting pass back to
earlier in the pipeline.
Summary: Support symbol with offset value as a VEMCExpr.
Reviewers: simoll, k-ishizaka
Reviewed By: simoll
Subscribers: hiraditya, llvm-commits
Tags: #llvm, #ve
Differential Revision: https://reviews.llvm.org/D82734
As per the documentation of `hasPairLoad`:
"`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load."
In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned.
There is only one implementor of this interface.
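A minimal standalone sketch of why the switch is safe (not taken from the patch): llvm::Align cannot represent 0, and a default-constructed Align is already the "no constraint" value of 1, so the old RequiredAlignment values 0 and 1 collapse to the same meaning.

#include "llvm/Support/Alignment.h"
#include <cassert>

void alignmentSketch() {
  llvm::Align RequiredAlignment;          // default-constructs to Align(1)
  assert(RequiredAlignment.value() == 1); // 0 was never a distinct constraint
}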
Differential Revision: https://reviews.llvm.org/D82958
It's perfectly valid to do certain DAG combines where we extract
subvectors from a concat vector when we have scalable vector types.
However, we can do this in a way that avoids generating compiler
warnings by replacing calls to getVectorNumElements() with
getVectorMinNumElements(). Due to the way subvector extracts are
designed to work with scalable vector types, this is OK.
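A minimal sketch of the substitution, assuming an EVT coming from a DAG combine (the helper name is made up and the surrounding combine is elided):

#include "llvm/CodeGen/ValueTypes.h"

// Valid for both fixed and scalable vectors; for scalable types this returns
// the known-minimum element count, which is what subvector-extract index math
// is defined on.
static unsigned halfVectorElements(llvm::EVT VT) {
  return VT.getVectorMinNumElements() / 2;
}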
This eliminates some warnings from existing tests in this file:
llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D82655
The situation where the caller uses a TOC and the callee does not
but is marked as clobbering the TOC (st_other=1) was not being compiled
correctly if both functions were in the same object file.
The call site where we had `callee` was missing a nop after the call.
This is because it was assumed that since the two functions were in
the same DSO they would be sharing a TOC. This is not the case if the
callee uses PC Relative because in that case it may clobber the TOC.
This patch makes sure that we add the nop correctly so that the
linker has a place to restore the TOC.
Reviewers: sfertile, NeHuang, saghir
Differential Revision: https://reviews.llvm.org/D81126
D82257/rG3521ecf1f8a3 was incorrectly sign-extending a constant vector from the lsb; this is fine if all the constant elements are 'allsignbits' in the active bits, but if only some of the elements are, then we are corrupting the constant values for those elements.
This fix ensures we sign extend from the msb of the active/demanded bits instead.
Summary:
This implements two hooks that attempt to avoid control flow for RISC-V. RISC-V
will lower SELECTs into control flow, which is not a great idea.
The hook `hasMultipleConditionRegisters()` turns off the following
DAGCombiner folds:
select(C0|C1, x, y) <=> select(C0, x, select(C1, x, y))
select(C0&C1, x, y) <=> select(C0, select(C1, x, y), y)
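As a small standalone C++ illustration of the first fold (plain booleans standing in for the DAG nodes, names made up): both spellings compute the same value, but the combined-condition form does the boolean logic up front with a single select, while the expanded form needs two.

#include <cassert>

static int selCombined(bool C0, bool C1, int X, int Y) { return (C0 | C1) ? X : Y; }
static int selExpanded(bool C0, bool C1, int X, int Y) { return C0 ? X : (C1 ? X : Y); }

int main() {
  // Exhaustively check the equivalence over both conditions.
  for (int C0 = 0; C0 < 2; ++C0)
    for (int C1 = 0; C1 < 2; ++C1)
      assert(selCombined(C0, C1, 1, 2) == selExpanded(C0, C1, 1, 2));
}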
The second hook `setJumpIsExpensive` controls a flag that has a similar purpose
and is used in CodeGenPrepare and the SelectionDAGBuilder.
Both of these have the effect of doing more of the logic up front in exchange for fewer jumps.
Note: with the `B` extension, we may be able to lower select into a conditional
move instruction, so at some point these hooks will need to be guarded based on
enabled extensions.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D79268
Replace imm with timm in the pattern for SI_INIT_EXEC_LO and
remove regbank mappings for non-register operands.
Differential Revision: https://reviews.llvm.org/D82885
Summary:
This is a fix for PR45009.
When working on D67492 I made DwarfExpression emit a single
DW_OP_entry_value operation covering the whole composite location
description that is produced if a register does not have a valid DWARF
number, and is instead composed of multiple register pieces. Looking
closer at the standard, this appears to not be valid DWARF. A
DW_OP_entry_value operation's block can only be a DWARF expression or a
register location description, so it appears to not be valid for it to
hold a composite location description like that.
See DWARFv5 sec. 2.5.1.7:
"The DW_OP_entry_value operation pushes the value that the described
location held upon entering the current subprogram. It has two
operands: an unsigned LEB128 length, followed by a block containing a
DWARF expression or a register location description (see Section
2.6.1.1.3 on page 39)."
Here is a dwarf-discuss mail thread regarding this:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2020-March/004610.html
There was not a strong consensus reached there, but people seem to lean
towards the view that operations specified under 2.6 (e.g. DW_OP_piece) may not
be part of a DWARF expression, and thus the DW_OP_entry_value operation
can't contain those.
Perhaps we instead want to emit an entry value operation for each
DW_OP_reg* operation, e.g.:
- DW_OP_entry_value(DW_OP_regx sub_reg0),
DW_OP_stack_value,
DW_OP_piece 8,
- DW_OP_entry_value(DW_OP_regx sub_reg1),
DW_OP_stack_value,
DW_OP_piece 8,
[...]
The question then becomes how the call site should look; should a
composite location description be emitted there, and we then leave it up
to the debugger to match those two composite location descriptions?
Another alternative could be to emit a call site parameter entry for
each sub-register, but firstly I'm unsure if that is even valid DWARF,
and secondly it seems like that would complicate the collection of call
site values quite a bit. As far as I can tell GCC does not emit any
entry values / call sites in these cases, so we do not have something to
compare with, but the former seems like the more reasonable approach.
Currently when trying to emit a call site entry for a parameter composed
of multiple DWARF registers a (DwarfRegs.size() == 1) assert is
triggered in addMachineRegExpression(). Until the call site
representation is figured out, and until there is use for these entry
values in practice, this commit simply stops the invalid DWARF from
being emitted.
Reviewers: djtodoro, vsk, aprantl
Reviewed By: djtodoro, vsk
Subscribers: jyknight, hiraditya, fedor.sergeev, jrtc27, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D75270
We currently lower SDIV to SDIV_MERGE_OP1. This forces the value
for inactive lanes in a way that can hamper register allocation;
however, the lowering has no requirement for inactive lanes.
Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus
freeing the register allocator. Once done, the only user of
SDIV_MERGE_OP1 is intrinsic lowering, so I've removed the node
and now perform ISel on the intrinsic directly. This also allows
us to implement MOVPRFX based zeroing in the same manner as SUB.
This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for
the same reason but in the ADD cases the ISel code is already
as required.
Differential Revision: https://reviews.llvm.org/D82783
clang-cl passes -x86-asm-syntax=intel to the cc1 invocation so that
assembly listings produced by the /FA flag are printed in Intel dialect.
That flag however should not affect the *parsing* of inline assembly in
the program. (See r322652)
When compiling normally, AsmPrinter::emitInlineAsm is used for
assembling and defaults to the AT&T dialect. However, when compiling for
ThinLTO, the code that parses module-level inline asm to find symbols
for the symbol table was failing to set the dialect. This patch fixes
that. (See the bug for more details.)
Differential revision: https://reviews.llvm.org/D82862
In the case of more than wavesize CSR SGPR spills, lanes of the reserved VGPR were
getting overwritten due to wraparound.
Reserve a VGPR (when NumVGPRSpillLanes = 0, WaveSize, 2*WaveSize, ..) and when one
of the two conditions is true:
1. One reserved VGPR being tracked by VGPRReservedForSGPRSpill is not yet reserved.
2. All spill lanes of reserved VGPR(s) are full and another spill lane is required.
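A worked example of the wraparound with illustrative numbers (WaveSize = 64, names hypothetical): the 65th CSR SGPR spill would land back in lane 64 % 64 == 0 of the same VGPR and clobber the first spill, which is why a fresh VGPR has to be reserved at each multiple of the wave size.

#include <cassert>

int main() {
  const unsigned WaveSize = 64;
  unsigned NumVGPRSpillLanes = 64;      // spills 0..63 already placed
  // All lanes of the current VGPR are full, so another VGPR is needed
  // (roughly condition 2 above).
  bool NeedAnotherVGPR = (NumVGPRSpillLanes % WaveSize) == 0;
  assert(NeedAnotherVGPR);
}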
Reviewed By: arsenm, kerbowa
Differential Revision: https://reviews.llvm.org/D82463
While validating live-out values, record instructions that look like
a reduction. This will consist of a vector op (for now only vadd),
a vorr (vmov) which stores the previous value of the vadd, and then a vpsel
in the exit block which is predicated upon a vctp. This vctp will
combine the last two iterations using the vmov and vadd into a vector
which can then be consumed by a vaddv.
Once we have determined that it's safe to perform tail-predication,
we need to change this sequence of instructions so that the
predication doesn't produce incorrect code. This involves changing
the register allocation of the vadd so it updates itself and the
predication on the final iteration will not update the falsely
predicated lanes. This mimics what the vmov, vctp and vpsel do and
so we then don't need any of those instructions.
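For reference, a hedged source-level sketch (not taken from the patch) of the kind of loop whose vectorised form contains the vadd/vorr/vpsel/vctp/vaddv sequence described above: an integer sum reduction whose trip count need not be a multiple of the vector width.

// Sum reduction; the tail iterations are what the vctp/vpsel pair handles.
int sumArray(const int *Data, int N) {
  int Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += Data[I];
  return Sum;
}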
Differential Revision: https://reviews.llvm.org/D75533
Condition `secondReg` is checked both in an outer and in an inner `if`
statement in static function `canCompareBeNewValueJump()` in file
`HexagonNewValueJump.cpp`. This patch removes the redundant inner check.
The issue was found using `clang-tidy` check under review
`misc-redundant-condition`. See https://reviews.llvm.org/D81272.
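A minimal standalone sketch of the flagged pattern (not the actual Hexagon code): the inner test repeats the outer one and can never be false.

static int sketch(bool SecondReg, int X) {
  if (SecondReg) {
    if (SecondReg) // redundant: already known to be true here
      return X + 1;
  }
  return X;
}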
Differential Revision: https://reviews.llvm.org/D82556
Condition `LiteralCount` is checked both in an outer and in an inner
`if` statement in `SIInstrInfo::verifyInstruction()`. This patch removes
the redundant inner check.
The issue was found using `clang-tidy` check under review
`misc-redundant-condition`. See https://reviews.llvm.org/D81272.
Differential Revision: https://reviews.llvm.org/D82555
Andrii discovered a problem where a simple case similar to below
will generate the wrong relocation kind:
enum { FIELD_EXISTENCE = 2, };
struct s1 { int a1; };
int test() {
  struct s1 *v = 0;
  return __builtin_preserve_field_info(v[0], FIELD_EXISTENCE);
}
The expected relocation kind should be FIELD_EXISTENCE, but the
recorded reloc kind in the final object file is FIELD_BYTE_OFFSET,
which is incorrect.
This exposed a bug in generating access strings from intrinsics.
The current access string generation has two steps:
step 1: find the base struct/union type,
step 2: traverse members in the base type.
The current implementation relies on at least one member access
in step 2 to get the correct relocation kind, which is true
in typical cases. But if there are no member accesses, the current
implementation falls back to the default info kind FIELD_BYTE_OFFSET.
This is incorrect; we should still record the reloc kind
based on the user input. This patch fixes this issue by properly
recording the reloc kind in such cases.
Differential Revision: https://reviews.llvm.org/D82932
It's pretty silly to diagnose on a scalar copy, but the build does that:
loop variable 'SibReg' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis]
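A minimal sketch of a warning-free spelling (the container and helper are hypothetical): llvm::Register is a thin wrapper around an unsigned, so taking the loop variable by value, without const, is cheap and keeps -Wrange-loop-analysis quiet.

#include "llvm/CodeGen/Register.h"
#include <vector>

static unsigned sumRegIds(const std::vector<llvm::Register> &SibRegs) {
  unsigned Sum = 0;
  for (llvm::Register SibReg : SibRegs) // by value, no const copy to diagnose
    Sum += SibReg.id();
  return Sum;
}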
With an undef operand, it's possible for getVRegDef to fail and return
null. This is an edge case very little code bothered to
consider. Proper gMIR should use G_IMPLICIT_DEF instead.
I initially tried to apply this restriction to all SSA MIR, so then
getVRegDef would never fail anywhere. However, ProcessImplicitDefs
does technically run while the function is in SSA. ProcessImplicitDefs
and DetectDeadLanes would need to either move, or a new pseudo-SSA
type of function property would need to be introduced.
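A minimal defensive sketch (the helper name is made up) of how a caller has to treat the result while undef operands remain possible:

#include "llvm/CodeGen/MachineRegisterInfo.h"

static bool hasDefiningInstr(const llvm::MachineRegisterInfo &MRI,
                             llvm::Register Reg) {
  // With an undef operand the vreg may have no defining instruction at all.
  return MRI.getVRegDef(Reg) != nullptr;
}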
Basically an NFC, but allows subclasses access to the entire PeelingModuloScheduleExpander
class. We are doing this to allow backends, particularly ones that are not necessarily
upstreamed, to inherit from PeelingModuloScheduleExpander and access its basic structures.
Renames Info to LoopInfo for consistency in PeelingModuloScheduleExpander.
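A hedged sketch of the intended use (the derived class and its contents are hypothetical): a downstream target inheriting from PeelingModuloScheduleExpander and reusing its structures.

#include "llvm/CodeGen/ModuloSchedule.h"

class DownstreamPeelingExpander : public llvm::PeelingModuloScheduleExpander {
public:
  using PeelingModuloScheduleExpander::PeelingModuloScheduleExpander;
  // Target-specific peeling heuristics can live here, reusing LoopInfo and the
  // other members this change makes visible to subclasses.
};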
Differential Revision: https://reviews.llvm.org/D82673
There are two uses of TM (instance of TargetMachine) when checking options.
These will not work once we enable GlobalISel. This patch replaces those uses of
TM with Subtarget->getTargetMachine().
This function picks the X86 opcode name based on type, masking,
and whether or not a load or broadcast has been folded, using multiple
switch statements. The contents of the switches mostly just vary in
a few characters in the instruction name. So use some macros to
build the instruction names to reduce the repetitiveness.
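A self-contained sketch of the technique (the enum values and macro are made up, not the real X86 opcode names): token-paste the few characters that differ so each switch case collapses to one macro use.

enum Opc { ADDrr, ADDrm, ADDrrk, ADDrmk };

// Splice the suffix that varies (register/memory form, masking) onto the base name.
#define PICK_OPC(Base, FoldedLoad, Masked)                                   \
  ((FoldedLoad) ? ((Masked) ? Base##rmk : Base##rm)                          \
                : ((Masked) ? Base##rrk : Base##rr))

static Opc pickAddOpcode(bool FoldedLoad, bool Masked) {
  return PICK_OPC(ADD, FoldedLoad, Masked);
}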
This patch adds the td definitions and asm/disasm tests for the
following instructions:
XXSPLTIW
XXSPLTIDP
XXSPLTI32DX
XXPERMX
XXBLENDVB
XXBLENDVH
XXBLENDVW
XXBLENDVD
VSLDBI
VSRDBI
Differential Revision: https://reviews.llvm.org/D82896
It's messy to pattern-match, and completely unnecessary: scalar indexes
work equally well.
See also discussion on D81620 and D82061.
Differential Revision: https://reviews.llvm.org/D82430
The indexing was messed up, so the result was completely broken.
Shuffle constant exprs are rare in practice; without vscale types,
constant folding generally eliminates them, so it's sort of hard to trip over.
Fixes regression from D72467.
(Recommitting after fix for memory leak.)
Differential Revision: https://reviews.llvm.org/D80330