MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer.
Prefering it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it.
Differential Revision: https://reviews.llvm.org/D55565
llvm-svn: 348975
A future patch may stop using MULX by default so use MIR to ensure we're always testing MULX.
Add the 32-bit case that we couldn't do in the 64-bit mode IR test due to it being promoted to a 64-bit mul.
llvm-svn: 348972
Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute.
Differential Revision: https://reviews.llvm.org/D50200
llvm-svn: 348971
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in Messagepack in the code object.
Differential Revision: https://reviews.llvm.org/D48179
llvm-svn: 348963
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.
I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.
Differential Revision: https://reviews.llvm.org/D55414
llvm-svn: 348959
If a module has function references, but no functions
themselves, we may end up never calling runOnMachineFunction
and therefore would never initialize nvptxSubtarget field
which would eventually cause a crash.
Instead of relying on nvptxSubtarget being initialized by
one of the methods, retrieve subtarget info directly.
Differential Revision: https://reviews.llvm.org/D55580
llvm-svn: 348952
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds.
That allows us to combine add ops with register moves and save some instructions. This is
another step towards allowing add truncation in generic DAGCombiner (see D54640).
Differential Revision: https://reviews.llvm.org/D55494
llvm-svn: 348946
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.
Differential Revision: https://reviews.llvm.org/D54042
llvm-svn: 348937
If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef).
Differential Revision: https://reviews.llvm.org/D55558
llvm-svn: 348926
Summary:
This patch provides a means to set Metadata section kind
for a global variable, if its explicit section name is
prefixed with ".AMDGPU.metadata."
This could be useful to make the global variable go to
an ELF section without any section flags set.
Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye
Reviewed By: dstuttard, kzhuravl
Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye
Differential Revision: https://reviews.llvm.org/D55267
llvm-svn: 348922
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.
Also mark them as legal and extract their legalizer test cases to a new
test file.
llvm-svn: 348920
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D54719
llvm-svn: 348912
Summary:
When doing X86CondBrFolding::analyzeCompare, it will meet the SUB32ri instruction as below to use the global address for its operand,
%733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
JNE_1 %bb.41, implicit $eflags
so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct and change the assert to if make X86CondBrFolding::analyzeCompare return false as not finding the compare for this
Patch by Jianping Chen
Reviewers: smaslov, LuoYuanke, liutianle, Jianping
Reviewed By: Jianping
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D54250
llvm-svn: 348853
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 348843
Summary:
This patch supports `.eventtype` directive printing and parsing in the
same syntax with `.functype`.
Reviewers: aardappel, sbc100
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55353
llvm-svn: 348818
- Check if an operand is an immediate before calling getImm. Some operands
that take constant values can actually have global symbols or other
constant expressions.
- When a load-constant instruction can be folded into users, make sure to
only delete it when all users have been successfully converted.
llvm-svn: 348802
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node.
Zero constant folding will be handled in a future patch - its a little trickier as we often have bitcasted zero values.
Differential Revision: https://reviews.llvm.org/D55511
llvm-svn: 348784
This commit changes which l1 flush instruction is used for AMDPAL and
MESA3d workloads to flush the entire l1 cache instead of just the
volatile lines.
Differential Revision: https://reviews.llvm.org/D55367
llvm-svn: 348771
Record the stack protector index in MachineFrameInfo when translating
Intrinsic::stackprotector similarly as is done by SelectionDAG when
processing the same intrinsic.
Setting this index allows the Prologue/Epilogue Insertion to recognize
that the stack protection is enabled. The pass can then make sure that
the stack protector comes before local variables on the stack and
assigns potentially vulnerable objects first so they are close to the
stack protector slot.
Differential Revision: https://reviews.llvm.org/D55418
llvm-svn: 348761
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.
With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
0x7ff19d000b50: f64 = Register %1
In function: d
Differential Revision: https://reviews.llvm.org/D55507
llvm-svn: 348759
A new pass to manage the Mode register.
Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.
The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.
llvm-svn: 348754
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.
The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
* The corresponding SDNode is now invalid
* This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
* The SDNode has been invalidated and we should emit "DBG_VALUE undef"
* The SDNode has been invalidated but the debug data was salvaged, don't
emit anything for this SDDbgValue
* This SDDbgValue has been emitted
This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.
Differential Revision: https://reviews.llvm.org/D55372
llvm-svn: 348751
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926.
The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.
Differential Revision: https://reviews.llvm.org/D55485
llvm-svn: 348750
Both intrinsics do the exact same thing so we really only need one.
Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency.
llvm-svn: 348737
Summary:
`llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods
defined on these classes, such as `addAttribute`, return a new immutable
object with the attribute added. In https://reviews.llvm.org/D55217 I attempted
to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since
calling these methods has no side-effects, and so ignoring the result
that is returned is almost certainly a programmer error.
However, committing the change resulted in new warnings in the AMDGPU target.
The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436
attempts to add the readonly and nounwind attributes to simplified
library functions, but instead calls the `addAttribute` methods and
ignores the result.
Modify the simplify libcalls pass to actually add the nounwind and
readonly attributes. Also update the simplify libcalls test to assert
that these attributes are actually being set.
Reviewers: rampitec, vpykhtin, rnk
Reviewed By: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55435
llvm-svn: 348732
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know its 0 then we don't need to bother.
This should go a long way towards fixing PR24545.
llvm-svn: 348727
The existing code tries to handle an undef operand while transforming an add to an LEA,
but it's incomplete because we will crash on the i16 test with the debug output shown below.
It's better to just give up instead. Really, GlobalIsel should have folded these before we
could get into trouble.
# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected
bb.0 (%ir-block.0):
liveins: $edi
%1:gr32 = COPY killed $edi
%0:gr16 = COPY %1.sub_16bit:gr32
%5:gr64_nosp = IMPLICIT_DEF
%5.sub_16bit:gr64_nosp = COPY %0:gr16
%6:gr64_nosp = IMPLICIT_DEF
%6.sub_16bit:gr64_nosp = COPY %2:gr16
%4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
%3:gr16 = COPY killed %4.sub_16bit:gr32
$ax = COPY killed %3:gr16
RET 0, implicit killed $ax
# End machine code for function add_undef_i16.
*** Bad machine code: Reading virtual register without a def ***
- function: add_undef_i16
- basic block: %bb.0 (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1: %2:gr16
LLVM ERROR: Found 1 machine code errors.
Differential Revision: https://reviews.llvm.org/D54710
llvm-svn: 348722
The test file shows a case where the avoid store forwarding block
pass misses to copy a range (-1..1) when the load displacement
changes sign.
Baseline test for D55485.
llvm-svn: 348712
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.
The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)
Original commit message for r347917:
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.
Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.
As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.
llvm-svn: 348706
It seems to be unexpectedly passing on some bots probably because it requires asserts to fail, but doesn't say that. But we already have a patch in review to make it not xfail so I'd rather just focus on getting it passing rather than trying to figure out an unexpected pass.
llvm-svn: 348661