This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
llvm-svn: 349664
The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI.
This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect.
Differential Revision: https://reviews.llvm.org/D55870
llvm-svn: 349661
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743
The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases that result in assert.
This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed.
Patch by Pengfei Wang
Differential Revision: https://reviews.llvm.org/D55642
llvm-svn: 349660
Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence.
I'm intending to deal with the signed saturation equivalents as well.
Clang counterpart: https://reviews.llvm.org/D55879
Differential Revision: https://reviews.llvm.org/D55855
llvm-svn: 349630
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349629
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.
SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Thanks to @dmgreen for catching this.
Differential Revision: https://reviews.llvm.org/D55883
llvm-svn: 349625
These are due to be upgraded soon, but good to replace them with generic llvm sadd_sat/ssub_sat intrinsics now.
The avx512 masked cases need doing as well but require a bit of tidyup first.
llvm-svn: 349621
Summary:
Using HI here makes no logical sense, since the dword is only
32 bits to begin with.
Current Mesa master does not look at the relocation type at all,
so this change is fine. Future Mesa will rely on this, however.
Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55369
llvm-svn: 349620
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow).
The avx512 masked cases need doing as well but require a bit of tidyup first.
llvm-svn: 349615
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.
Irreducible loop test has been extended based on a CTS failure detected for GFX9.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D55602
llvm-svn: 349611
All we have to do is mark it as legal.
This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.
llvm-svn: 349610
For type v4i32/v8ii16/v16i8, do following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Differential Revision: https://reviews.llvm.org/D55812
llvm-svn: 349599
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
llvm-svn: 349552
We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there.
This addresses one issue from PR40090.
llvm-svn: 349531
Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and
AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert.
Author: FarhanaAleen
llvm-svn: 349529
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.
This is a followup to D55787 and part of PR40056.
Differential Revision: https://reviews.llvm.org/D55833
llvm-svn: 349520
InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant.
This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling.
Fixes some of PR40053
Differential Revision: https://reviews.llvm.org/D55780
llvm-svn: 349519
Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
Reviewers: aditya_nandakumar, bogner
Reviewed By: bogner
Subscribers: rovka, kristof.beyls, javed.absar
Differential Revision: https://reviews.llvm.org/D55668
llvm-svn: 349514
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.
Differential Revision: https://reviews.llvm.org/D55651
llvm-svn: 349499
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.
This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.
Differential Revision: https://reviews.llvm.org/D55787
llvm-svn: 349481
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND G_OR and G_XOR for types other then s32
with clampScalar on MIPS32.
Differential Revision: https://reviews.llvm.org/D55362
llvm-svn: 349475
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
string printing later on
Differential Revision: https://reviews.llvm.org/D55774
llvm-svn: 349472
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.
llvm-svn: 349466
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (its guaranteed to stay the same).
Test change is merely a rescheduling.
llvm-svn: 349459
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.
At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
in registers.
- using intrinsics to mask of data in registers as indicated by the
programmer (see https://lwn.net/Articles/759423/).
For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
relative abundance of registers, one register is reserved (X16) to be
the taint register. X16 is expected to not clash with other register
reservation mechanisms with very high probability because:
. The AArch64 ABI doesn't guarantee X16 to be retained across any call.
. The only way to request X16 to be used as a programmer is through
inline assembly. In the rare case a function explicitly demands to
use X16/W16, this pass falls back to hardening against speculation
by inserting a DSB SYS/ISB barrier pair which will prevent control
flow speculation.
- It is easy to insert mask operations at this late stage as we have
mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
and contains all-zeros when miss-speculation is detected. Therefore, when
masking, an AND instruction (which only changes the register to be masked,
no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
select instruction (CSEL) to evaluate the flags that were also used to
make conditional branch direction decisions. Speculation of the CSEL
instruction can be limited with a CSDB instruction - so the combination of
CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
aren't speculated. When conditional branch direction gets miss-speculated,
the semantics of the inserted CSEL instruction is such that the taint
register will contain all zero bits.
One key requirement for this to work is that the conditional branch is
followed by an execution of the CSEL instruction, where the CSEL
instruction needs to use the same flags status as the conditional branch.
This means that the conditional branches must not be implemented as one
of the AArch64 conditional branches that do not use the flags as input
(CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
selectors to not produce these instructions when speculation hardening
is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
the taint register X16 to be encoded in the SP register as value 0.
Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
x86-slh-lfence option. This may be more useful for the intrinsics-based
approach than for the SLH approach to masking.
Note that this pass already inserts the full speculation barriers if the
function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
could be done for some indirect branches, such as switch jump tables.
Differential Revision: https://reviews.llvm.org/D54896
llvm-svn: 349456
The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.
Differential Revision: https://reviews.llvm.org/D55748
llvm-svn: 349451
Power9 VABSDU* instructions can be exploited for some special vselect sequences.
Check in the orignal test case here, later the exploitation patch will update this
and reviewers can check the differences easily.
llvm-svn: 349446
Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.
Differential Revision: https://reviews.llvm.org/D54783
llvm-svn: 349437
Convert VSRAI to VSRLI is the sign bit is known zero and improve KnownBits output for all shift instruction.
Fixes the poor codegen comments in D55768.
llvm-svn: 349407
Summary:
We use `variable_ops` in the tablegen defs to denote the list of
branch targets in `br_table`, but unlike other uses of `variable_ops`
(e.g. call) the these branch targets need to actually be encoded in the
instruction. The existing tables for `variable_ops` cause not operands
to be accepted by the assembly matcher.
Following the example of ARM:
2cc0a7da87/lib/Target/ARM/ARMInstrInfo.td (L550-L555)
we introduce a new operand type to capture this list, and we use the
same {} syntax as ARM as well to differentiate them from regular
integer operands.
Also removed definition and use of TSFlags in tablegen defs, since
`br_table` now has a non-variable_ops immediate operand, so the
previous logic of only the variable_ops arguments being labels didn't
make sense anymore.
Reviewers: dschuff, aheejin, sunfish
Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55401
llvm-svn: 349405
This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift.
This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users.
This appears to fix the case in PR39968.
llvm-svn: 349385
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.
I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.
Differential Revision: https://reviews.llvm.org/D55768
llvm-svn: 349374
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointint at dangling
memory.
llvm-svn: 349365
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.
llvm-svn: 349355
The Load/Store Optimizer runs before Machine Block Placement. At O3 the
Tail Duplication Threshold is set to 4 instructions and this can create
new opportunities for the Load/Store Optimizer. It seems worthwhile to
run it once again.
llvm-svn: 349338
Appended options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to get the
more descriptive output. Also removed useless function attributes.
llvm-svn: 349329
With some patch adopted for Power9 vabsd* insns, some CHECKs can't get the expected results.
But it's false alarm, we should update the case more robust.
llvm-svn: 349325
I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that.
The test-shrink-bug.ll changie is an improvement because we are no longer interfering with test shrink handling in isel.
The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.
llvm-svn: 349315
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)
There are a bunch of other checks that should prevent doing this when
it might be harmful.
We already do this transform for scalars in this spot. The vector
limitation was shared with a check for the case when the operands are
extended. I'm not sure if that limit is needed either, but that would
be a separate patch.
Differential Revision: https://reviews.llvm.org/D55448
llvm-svn: 349303
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).
llvm-svn: 349298
Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard......
llvm-svn: 349285
Summary:
Make machine PHIs optimization to work for single value register taken from
several different copies. This is the first step to fix PR38917. This change
allows to get rid of redundant PHIs (see opt_phis2.mir test) to make
the subsequent optimizations (like CSE) possible and simpler.
For instance, before this patch the code like this:
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b
could be optimized to:
%a = %b
but the code like this:
%c = COPY %z
...
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b; %bb3, %c
would remain unchanged.
With this patch the latter case will be optimized:
%a = %z```.
Committed on behalf of: Anton Afanasyev anton.a.afanasyev@gmail.com
Reviewers: RKSimon, MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54839
llvm-svn: 349271
Add an original test case for setb before the exploitation actually takes effect, later we can check the difference.
Differential Revision: https://reviews.llvm.org/D55696
llvm-svn: 349251
The change is an effort to split and refactor abandoned
D34708 into smaller parts.
Here the behaviour of unsupported instructions is changed
to match the behaviour of explicit intrinsics calls.
Currently LLVM crashes with:
> Assertion getInstruction() && "Not a call or invoke instruction!" failed.
With this patch LLVM produces a more sensible error message:
> Cannot select: ... i32 = ExternalSymbol'__foobar'
Author: Denys Zariaiev <denys.zariaiev@gmail.com>
Differential Revision: https://reviews.llvm.org/D55145
llvm-svn: 349213
Summary:
This allows us to register it with the MachineFunction delegate and be
notified automatically about erasure and creation of instructions. However,
we still need explicit notification for modifications such as those caused
by setReg() or replaceRegWith().
There is a catch with this though. The notification for creation is
delivered before any operands can be added. While appropriate for
scheduling combiner work. This is unfortunate for debug output since an
opcode by itself doesn't provide sufficient information on what happened.
As a result, the work list remembers the instructions (when debug output is
requested) and emits a more complete dump later.
Another nit is that the MachineFunction::Delegate provides const pointers
which is inconvenient since we want to use it to schedule future
modification. To resolve this GISelWorkList now has an optional pointer to
the MachineFunction which describes the scope of the work it is permitted
to schedule. If a given MachineInstr* is in this function then it is
permitted to schedule work to be performed on the MachineInstr's. An
alternative to this would be to remove the const from the
MachineFunction::Delegate interface, however delegates are not permitted
to modify the MachineInstr's they receive.
In addition to this, the observer has three interface changes.
* erasedInstr() is now erasingInstr() to indicate it is about to be erased
but still exists at the moment.
* changingInstr() and changedInstr() have been added to report changes
before and after they are made. This allows us to trace the changes
in the debug output.
* As a convenience changingAllUsesOfReg() and
finishedChangingAllUsesOfReg() will report changingInstr() and
changedInstr() for each use of a given register. This is primarily useful
for changes caused by MachineRegisterInfo::replaceRegWith()
With this in place, both combine rules have been updated to report their
changes to the observer.
Finally, make some cosmetic changes to the debug output and make Combiner
and CombinerHelp
Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: mgorny, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D52947
llvm-svn: 349167
Implement options in clang to enable recording the driver command-line
in an ELF section.
Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.
This differs from the GCC implementation in some key ways:
* In GCC there is only one command-line possible per compilation-unit,
in LLVM it mirrors llvm.ident and multiple are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
command-lines are separated by NULL bytes. The advantage of the GCC
approach is to clearly delineate options in the face of embedded
spaces. The advantage of the LLVM approach is to support merging
multiple command-lines unambiguously, while handling embedded spaces
with escaping.
Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489
llvm-svn: 349155
It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's
generated is a KILL of the value), so when creating split constraints if the
live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead
of PrefReg.
Differential Revision: https://reviews.llvm.org/D55652
llvm-svn: 349151
Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM
and Thumb2.
Extract the legalizer tests for these opcodes into another file.
Add tests for the instruction selector.
llvm-svn: 349142
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.
Differential Revision: https://reviews.llvm.org/D55655
llvm-svn: 349058
There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.
llvm-svn: 349052
Summary:
Constraining an integer value to a floating point register using "f"
causes an llvm_unreachable to trigger. This patch allows i32 integers
to be placed in a single precision float register and i64 integers to
be placed in a double precision float register. This matches the behavior
of GCC.
For other types the llvm_unreachable is removed to instead trigger an
error message that points out the offending line.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D51614
llvm-svn: 349045
When computing register allocation hints for a GRX32Bit register, make sure
that any of the hinted registers that are also copy hints are returned first
in the list.
Review: Ulrich Weigand.
llvm-svn: 349037
Summary:
Sometimes MIR-level passes create DILocations that were not present in the
LLVM-IR. For example, it may merge two DILocations together to produce a
DILocation that points to line 0.
Previously, the address of these DILocations were printed which prevented the
MIR from being read back into LLVM. With this patch, DILocations will use
metadata references where possible and fall back on serializing them inline like so:
MOV32mr %stack.0.x.addr, 1, _, 0, _, %0, debug-location !DILocation(line: 1, scope: !15)
Reviewers: aprantl, vsk, arphaman
Reviewed By: aprantl
Subscribers: probinson, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55243
llvm-svn: 349035
Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for
them in the instruction selector. This uses handwritten code again
because the patterns that are generated with TableGen are tuned for what
the DAG combiner would produce and not for simple sext/zext nodes.
Luckily, we only need to update the opcodes to use the Thumb2 variants,
everything else can be reused from ARM.
llvm-svn: 349026
Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd).
The criteria for choosing whether a fused add or subtract is used, as well as
whether the product is negated or not, is whether some of the arguments to the
llvm.fma.* intrinsic are negated or not. In the tests, extraneous fadd
instructions were added to avoid the negation being performed using a xor
trick, which prevented the proper FMA forms from being selected and thus
tested.
The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 -
rs3), but they should be correct. The misleading names were inherited from
MIPS, where the negation happens after computing the sum.
The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions,
as that depends on TargetLowering::isFMAFasterthanFMulAndFAdd.
Some comments in the test files about what type of instructions are there
tested were updated, to better reflect the current content of those test
files.
Differential Revision: https://reviews.llvm.org/D54205
Patch by Luís Marques.
llvm-svn: 349023
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 349016
MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer.
Prefering it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it.
Differential Revision: https://reviews.llvm.org/D55565
llvm-svn: 348975
A future patch may stop using MULX by default so use MIR to ensure we're always testing MULX.
Add the 32-bit case that we couldn't do in the 64-bit mode IR test due to it being promoted to a 64-bit mul.
llvm-svn: 348972
Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute.
Differential Revision: https://reviews.llvm.org/D50200
llvm-svn: 348971
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in Messagepack in the code object.
Differential Revision: https://reviews.llvm.org/D48179
llvm-svn: 348963
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.
I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.
Differential Revision: https://reviews.llvm.org/D55414
llvm-svn: 348959
If a module has function references, but no functions
themselves, we may end up never calling runOnMachineFunction
and therefore would never initialize nvptxSubtarget field
which would eventually cause a crash.
Instead of relying on nvptxSubtarget being initialized by
one of the methods, retrieve subtarget info directly.
Differential Revision: https://reviews.llvm.org/D55580
llvm-svn: 348952
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds.
That allows us to combine add ops with register moves and save some instructions. This is
another step towards allowing add truncation in generic DAGCombiner (see D54640).
Differential Revision: https://reviews.llvm.org/D55494
llvm-svn: 348946
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.
Differential Revision: https://reviews.llvm.org/D54042
llvm-svn: 348937
If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef).
Differential Revision: https://reviews.llvm.org/D55558
llvm-svn: 348926
Summary:
This patch provides a means to set Metadata section kind
for a global variable, if its explicit section name is
prefixed with ".AMDGPU.metadata."
This could be useful to make the global variable go to
an ELF section without any section flags set.
Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye
Reviewed By: dstuttard, kzhuravl
Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye
Differential Revision: https://reviews.llvm.org/D55267
llvm-svn: 348922
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.
Also mark them as legal and extract their legalizer test cases to a new
test file.
llvm-svn: 348920
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D54719
llvm-svn: 348912
Summary:
When doing X86CondBrFolding::analyzeCompare, it will meet the SUB32ri instruction as below to use the global address for its operand,
%733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
JNE_1 %bb.41, implicit $eflags
so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct and change the assert to if make X86CondBrFolding::analyzeCompare return false as not finding the compare for this
Patch by Jianping Chen
Reviewers: smaslov, LuoYuanke, liutianle, Jianping
Reviewed By: Jianping
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D54250
llvm-svn: 348853
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 348843
Summary:
This patch supports `.eventtype` directive printing and parsing in the
same syntax with `.functype`.
Reviewers: aardappel, sbc100
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55353
llvm-svn: 348818
- Check if an operand is an immediate before calling getImm. Some operands
that take constant values can actually have global symbols or other
constant expressions.
- When a load-constant instruction can be folded into users, make sure to
only delete it when all users have been successfully converted.
llvm-svn: 348802
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788