This fixes a couple of missing BtVer2 instructions that weren't being handled in the override.
NOTE: There are still a lot of overrides that need cleaning up!
llvm-svn: 331770
I've created the necessary classes but there are still a lot of overrides that need cleaning up.
NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767
This is a fix for PR30290: by marking all byval stack slots as being aliased,
the instruction scheduler is more conservative about rescheduling memory
accesses to such stack slots as an LLVM Value* might alias it. This fixes
errors such as in the patched test case, where reads and writes to a data
structure are illegally mixed.
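As a rough illustration of the failure mode (a hypothetical reproducer, not
the test case from the patch), consider a byval slot that is reachable both
directly and through an LLVM Value*:

  struct S { int a[4]; };

  int use(struct S s) {       /* 's' occupies a byval stack slot */
    int *p = &s.a[0];         /* the slot is also reachable via a Value* */
    int old = *p;
    s.a[0] = 42;              /* direct write to the slot */
    return old + *p;          /* must observe the write */
  }

Before this fix, the scheduler could reorder a slot access past a Value*-based
access because it did not consider them aliased.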
This could be fixed better in the future with better analysis for the
instruction scheduler to know what Values alias what stack slots.
Differential Revision: https://reviews.llvm.org/D45022
llvm-svn: 331749
This patch adds a shadow stack fix when compiling
setjmp/longjmp with the shadow stack enabled. This
allows setjmp/longjmp to work correctly with CET.
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46181
llvm-svn: 331748
Making sure we don't truncate / extend pointers, don't try to change
vector topology or bitcast vectors to scalars or back, and most
importantly, don't extend to a smaller type or truncate to a larger one.
Reviewers: qcolombet, t.p.northover, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46490
llvm-svn: 331718
Every generic machine instruction must have generic virtual registers
only, that is, have a low-level type attached to each operand.
Previously MachineVerifier would catch a missing type on an operand
only if the previous operand for the same type index existed and had
a type attached to it, and it would report it as a type mismatch.
This is inconsistent behaviour and a misleading error message.
This commit makes sure MachineVerifier explicitly checks that the
types are there for every operand and if not provides a
straightforward error message.
Reviewers: qcolombet, t.p.northover, bogner, ab
Reviewed By: qcolombet
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46455
llvm-svn: 331694
This is an NFC pre-commit for the following "Checking that generic
instrs have LLTs on all vregs" commit.
This overloads MachineOperand::print to make it possible to print LLTs
with standalone machine operands.
This also overloads MachineVerifier::print(...MachineOperand...) with
an optional LLT using the newly introduced MachineOperand::print
variant; no actual calls added.
This also refactors MachineVerifier::visitMachineInstrBefore in the
parts dealing with all generic instructions (checking Selected
property, LLTs, and phys regs).
llvm-svn: 331693
Summary:
Split off from D46031.
The previous patch, D46493, completely disabled unfolding in case of immediates.
But we can do better:
{F6120274} {F6120277}
https://rise4fun.com/Alive/xJS
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D46494
llvm-svn: 331685
Summary:
Split off from D46031.
In the masked merge case, this degrades IPC by decreasing the instruction count.
{F6108777}
The next patch should be able to recover and improve this.
This also affects the transform @spatel added in D27489 / rL289738,
for which the test coverage for X86 was missing.
But after I added it and looked at the changes in MCA, I'm somewhat confused.
{F6093591} {F6093592} {F6093593}
I'd say this regression is an improvement, since `IPC` increased in that case?
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: andreadb, llvm-commits, spatel
Differential Revision: https://reviews.llvm.org/D46493
llvm-svn: 331684
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.
Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.
llvm-svn: 331672
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.
Differential Revision: https://reviews.llvm.org/D46229
llvm-svn: 331659
Remove the old waitcnt pass (si-insert-waits), which is no longer maintained
and is getting crufty.
Differential Revision: https://reviews.llvm.org/D46448
llvm-svn: 331641
Instead of enabling it for non-NDEBUG builds, use -verify-cfiinstrs to
run the verifier in CFIInstrInserter. It defaults to false.
Differential Revision: https://reviews.llvm.org/D46444
llvm-svn: 331635
Summary:
Previously, all DS ops forced WQM in a pixel shader. That was a hack to
allow for graphics frontends using ds_swizzle to implement explicit
derivatives, on SI/CI at least where DPP is not available. But it forced
WQM for _any_ DS op.
With this commit, DS ops no longer force WQM. Both graphics frontends
(Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm
intrinsic call when calculating explicit derivatives.
The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for
explicit derivatives".
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46051
Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81
llvm-svn: 331633
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.
WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.
This removes all InstrRW overrides for these instructions.
NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.
NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
Summary:
The legacy VRCPPS/VRSQRTPS instructions aren't available in 512-bit versions. The new increased precision versions are. So we can use those to implement v16f32 reciprocal estimates.
For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the NR step altogether, but I leave that for a future patch.
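As a sketch, the kind of source that can now use the increased-precision
estimate (assuming AVX-512 and fast-math; the instruction choice is
illustrative):

  typedef float v16f32 __attribute__((vector_size(64)));

  v16f32 recip(v16f32 x) {
    v16f32 one = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    return one / x;   /* may lower to vrcp14ps plus a Newton-Raphson step */
  }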
Reviewers: spatel
Reviewed By: spatel
Subscribers: RKSimon, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D46498
llvm-svn: 331606
As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the
-global-isel option is redundant when -run-pass is given. -global-isel sets up
the GlobalISel passes in the pass manager but -run-pass skips that entirely and
configures its own pipeline.
llvm-svn: 331603
Summary:
Previously, an extending load was represented as (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
improve optimization of extends and truncates, this legality requirement
would spread without considerable care w.r.t. when certain combines were
permitted.
* The SelectionDAG importer required some ugly and fragile pattern
rewriting to translate patterns into this style.
This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.
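As a sketch of the rule (a hypothetical helper in LLVM C++ style, not code
from the patch):

  #include "llvm/CodeGen/MachineMemOperand.h"
  #include "llvm/Support/LowLevelTypeImpl.h"

  // A plain G_LOAD any-extends whenever the memory access is narrower
  // than the destination register's type.
  static bool isAnyExtendingLoad(const llvm::MachineMemOperand &MMO,
                                 llvm::LLT ResultTy) {
    return MMO.getSize() * 8 < ResultTy.getSizeInBits();
  }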
Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bits. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch will add combiner helpers for
this.
The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.
Depends on D45540
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar
Reviewed By: rtereshin
Subscribers: javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45541
llvm-svn: 331601
Summary:
Split off from D46031.
It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s.
In the vector case, this breaks `andnpd` / `vandnps` patterns.
That being said, we may want to re-visit this `not` handling, maybe in D46073.
Reviewers: spatel, craig.topper, javed.absar
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46492
llvm-svn: 331595
Summary:
The current code cannot handle register class names like 'i32', which is
a valid register class name in WebAssembly. This patch removes special
handling for integer/scalar/pointer type parsing and treats them as
normal identifiers.
Reviewers: thegameg
Subscribers: jfb, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45948
llvm-svn: 331586
Summary: Providing the glue to map SDNode fast math sub flags to MachineInstr fast math sub flags.
Reviewers: spatel, arsenm, wristow
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D46447
llvm-svn: 331567
- Predicate D16 patterns on this new feature
- Added this new feature to gfx900/2/4
Differential Revision: https://reviews.llvm.org/D46366
llvm-svn: 331551
Summary:
When checking if an instruction stores to a given frame index, check
that the instruction can write to memory before looking at the memory
operands list to avoid e.g. DBG_VALUE instructions that reference a
frame index preventing a load from that index from being hoisted.
Reviewers: dblaikie, MatzeB, qcolombet, reames, javed.absar
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46284
llvm-svn: 331549
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.
Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar
Reviewed By: spatel
Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng
Differential Revision: https://reviews.llvm.org/D45710
llvm-svn: 331547
This patch adds a custom lowering for ISD::MULH{S,U}, used in the
divide-by-constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV).
New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL
can be correctly lowered to smull2 and umull2.
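A hedged example of source that reaches this path once the loop is vectorized
(illustrative; the exact instruction mix depends on the cost model):

  void div7(int *a, int n) {
    for (int i = 0; i < n; ++i)
      a[i] /= 7;   /* BuildSDIV rewrites this as a multiply-high, which on
                      AArch64 needs smull for the low half and smull2 for
                      the high half of the vector */
  }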
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D46009
llvm-svn: 331522
Following the advice in review D45022, this currently tests for the broken llc
output where an instruction is mis-scheduled. This test is committed in advance
to improve the eventual fixing patch in D45022, by making clearer the bad
behaviour that the patch fixes.
llvm-svn: 331514
Don't assume the alias of a defined reg is always already in the set.
As the test case in https://bugs.llvm.org/show_bug.cgi?id=36587 discovered,
it is wrong to assume that all the aliases of the defined register in the
*current function* are already present in the UsedPhysRegsMask.
This patch changes this so that any definition in the current function of a
phys-reg always results in all its aliases inserted into the set of defined
registers.
Review: Quentin Colombet
https://reviews.llvm.org/D45157
llvm-svn: 331509
We can't see all of the problems currently unless
we look at debug output when the global 'unsafe' is
on. It's a mess. This is another attempt to make
sure that D45710 is not making changes unintentionally.
llvm-svn: 331476
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this, thank goodness.
llvm-svn: 331472
I'm choosing PPC out of convenience because it does
all of the transforms of interest in these tests by
default. There are multiple FMF problems shown in the
current checks. D45710 is proposing to fix part of
that.
llvm-svn: 331471
These tests are for DAGCombiner::foldSelectCCToShiftAnd().
Previously, this was only tested for AArch64,
but given the upcoming X86 changes to hasAndNot(),
the test coverage needs to be added.
These tests originated from D27489 / rL289738.
llvm-svn: 331454
By default LLVM thinks very large vectors get aligned to their size when
passed across functions. Unfortunately no-one told the ARM backend so it
doesn't trigger stack realignment and so accesses can cause the usual
misalignment issues (e.g. a data abort).
This changes the ABI alignment to the stack alignment, which in practice
(and as a bonus) also coincides with the alignment "natural" vectors get.
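A minimal sketch of an affected case, assuming GNU vector extensions
(hypothetical reproducer):

  typedef int v8i32 __attribute__((vector_size(32)));

  int first(v8i32 v) {   /* LLVM assumed the argument's stack slot was */
    return v[0];         /* 32-byte aligned, but the ARM backend never */
  }                      /* realigned the stack to guarantee it */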
llvm-svn: 331451
Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)
llvm-svn: 331445
Summary:
Machine Instruction flags for fast math support and MIR print support
Reviewers: spatel, arsenm
Reviewed By: arsenm
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D45781
llvm-svn: 331417
Sinking the and closer to a compare against zero is beneficial on PPC as it
allows us to emit record-form instructions. In the future, we may expand this
to a larger set of operations that feed compares against zero since PPC has
lots of record-form instructions.
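A rough example of the pattern this helps (illustrative):

  int any_common_bits(long a, long b) {
    long m = a & b;   /* sinking the 'and' next to the compare lets it */
    return m != 0;    /* become a record-form 'and.' that sets CR0 */
  }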
Differential revision: https://reviews.llvm.org/D46060
llvm-svn: 331416
The CTR loops pass will insert the decrementing branch instruction in an exiting
block for the loop being transformed. However if that block is part of another
loop as well (whether a nested loop or with irreducible CFG), it is not valid
to use that exiting block. In fact, if the loop has an irreducible CFG, we don't
bother analyzing it and we just bail on the transformation. In practice, this
doesn't lead to a noticeable reduction in the number of loops transformed by
this pass.
Fixes https://bugs.llvm.org/show_bug.cgi?id=37229
Differential Revision: https://reviews.llvm.org/D46162
llvm-svn: 331410
Summary: performAddCombine should run after the DAG is legalized; otherwise, generic optimization
in the DAGCombiner can optimize an addcarry+trunc into an addcarry instruction with
illegal types.
Author: FarhanaAleen
Reviewed By: rampitec
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D46337
llvm-svn: 331368
This adds some more tests, and adds some notes to tests which are using
a suboptimal lowering.
The constants with suboptimal lowerings seem to be relatively rare in
practice, but it might be a fun project to work on improvements.
llvm-svn: 331304
The logic for this combine is almost identical to the logic for a
(sext (sextload x)) combine.
This commit factors out the logic so it can be shared by both combines,
and corrects the SDLoc assigned in the zext version of the combine.
Prior to this patch, for the given test case, we would apply the
location associated with the udiv instruction to instructions which
perform the load.
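The shape of the test case is roughly (hypothetical sketch):

  unsigned long f(unsigned *p, unsigned long d) {
    unsigned long x = *p;   /* a zext-of-load; its instructions should */
    return d / x;           /* carry the load's location, not the udiv's */
  }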
Part of: llvm.org/PR37262
llvm-svn: 331303
Prior to this patch, for the given test case, we would apply the
location associated with the sdiv instruction to instructions which
perform the load.
Part of: llvm.org/PR37262.
Differential Revision: https://reviews.llvm.org/D46222
llvm-svn: 331302
In DAGCombiner, we try to simplify this pattern:
([s|z]ext (load ...))
Conceptually, a new extload which is created while splitting the load
should have the same debug location as the load.
Making this change affects the IROrder of the new load, causing some
test case churn.
In practice, the new location is never different from the location of
the [s|z]ext, at least not during check-llvm or a stage2 build.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46156
llvm-svn: 331301
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.
Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasible and regenerates test cases where not.
In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.
rdar://33755881, Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D45995
llvm-svn: 331300
The previous version of this patch restricted the 'jal' instruction to MIPS and
microMIPSr3. microMIPS32r6 does not have this instruction and instead uses jal
as an alias for balc.
Original commit message:
> Reviewers: smaksimovic, atanasyan, abeserminji
>
> Differential Revision: https://reviews.llvm.org/D46114
>
llvm-svn: 331259
This patch fixes a bug introduced by revision 330778 (originally reviewed at:
https://reviews.llvm.org/D44782), where function isFrameLoadOpcode returned
the wrong number of bytes read for opcodes VMOVSSrm and VMOVSDrm.
This corrects that mistake, and extends the regression test to catch cases where
the dead stores should be removed.
Patch by Jeremy Morse.
Differential Revision: https://reviews.llvm.org/D46256
llvm-svn: 331252
Dead defs were being removed from the live set (in stepForward), but
registers clobbered by regmasks weren't (more specifically, they were
actually removed by removeRegsInMask, but then they were added back in).
llvm-svn: 331219
No need to waste space nor number MBBs differently if MF gets recreated.
Reviewers: qcolombet, stoklund, t.p.northover, bogner, javed.absar
Reviewed By: qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46078
llvm-svn: 331213
This provides an optimized implementation of SADDO/SSUBO/UADDO/USUBO
as well as ADDCARRY/SUBCARRY on top of the new CC implementation.
In particular, multi-word arithmetic now uses UADDO/ADDCARRY instead
of the old ADDC/ADDE logic, which means we no longer need to use
"glue" links for those instructions. This also allows making full
use of the memory-based instructions like ALSI, which couldn't be
recognized due to limitations in the DAG matcher previously.
Also, the llvm.sadd.with.overflow et al. intrinsics now expand to
directly using the ADD instructions and checking for a CC 3 result.
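For instance, multi-word addition like the following now selects through
UADDO/ADDCARRY (sketch; the exact instruction choice depends on the CPU):

  unsigned __int128 add128(unsigned __int128 a, unsigned __int128 b) {
    return a + b;   /* an add that sets CC, then an add-with-carry,
                       with no glue link between the two nodes */
  }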
llvm-svn: 331203
Currently, an instruction setting the condition code is linked to
the instruction using the condition code via a "glue" link in the
SelectionDAG. This has a number of drawbacks; in particular, it
means the same CC cannot be used by multiple users. It also makes
it more difficult to efficiently implement SADDO et al.
This patch changes the back-end to represent CC dependencies as
normal values during SelectionDAG matching, along the lines of
how this is handled in the X86 back-end already.
In addition to the core mechanics of updating all relevant patterns,
this requires a number of additional changes:
- We now need to be able to spill/restore a CC value into a GPR
if necessary. This means providing a copyPhysReg implementation
for moves involving CC, and defining getCrossCopyRegClass.
- Since we still prefer to avoid such spills, we provide an override
for IsProfitableToFold to avoid creating a merged LOAD / ICMP if
this would result in multiple users of the CC.
- combineCCMask no longer requires a single CC user, and no longer
needs to be careful about preventing invalid glue/chain cycles.
- emitSelect needs to be more careful in marking CC live-in to
the basic block it generates. Also, we can now optimize the
case of multiple subsequent selects with the same condition
just like X86 does.
llvm-svn: 331202
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.
llvm-svn: 331201
If we have LOCR instructions, select them directly from SelectionDAG
instead of first going through a pseudo instruction and then using
the custom inserter to emit the LOCR.
Provide Select pseudo-instructions for VR32/VR64 if we have vector
instructions, to avoid having to go through the first 16 FPRs
unnecessarily.
If we do not have LOCFHR, prefer using LOCR followed by a move
over a conditional branch.
llvm-svn: 331191
Previously these instructions were unselectable and instead were generated
through the instruction mapping tables.
Reviewers: atanasyan, smaksimovic, abeserminji
Differential Revision: https://reviews.llvm.org/D46055
llvm-svn: 331165
Summary:
Previously, an extending load was represented as (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
improve optimization of extends and truncates, this legality requirement
would spread without considerable care w.r.t. when certain combines were
permitted.
* The SelectionDAG importer required some ugly and fragile pattern
rewriting to translate patterns into this style.
This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.
This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.
Depends on D45466
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45540
llvm-svn: 331115
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.
Outlined calls shouldn't break liveness assumptions that someone might make.
This also un-XFAILs the noredzone test, and updates the calls test.
llvm-svn: 331095
Summary:
D42479 (rL329525) enabled SDIV combine for pow2 non-splat vector
divisors. But when there is a 1 in a vector, the instruction sequence to
be generated involves shifting a value by the number of its bit widths,
which is undefined
(c64f4dbfe3/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L6000-L6006)).
In particular, on architectures that do not support vector instructions,
each element of a vector will be computed separately using scalar
operations, and then the resulting value will be undef for the '1' lanes
of the vector.
(An all-1s vector is fine; only vectors mixing 1 with other values are
affected.)
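A sketch of an affected input (hypothetical; the '1' lane is what triggers
the undefined shift):

  typedef int v4i32 __attribute__((vector_size(16)));

  v4i32 divmix(v4i32 v) {
    v4i32 d = {1, 4, 8, 16};   /* pow2 divisors mixed with a 1 */
    return v / d;              /* the lane divided by 1 implied a shift
                                  by the full 32-bit width */
  }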
Reviewers: RKSimon, jgravelle-google
Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D46161
llvm-svn: 331092
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also existed.
To fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a single node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.
Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.
I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.
Reviewers: chandlerc, RKSimon, spatel
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46202
llvm-svn: 331091
Extend the live-in check for all aliased registers so that we can
allow sinking Copy instructions when only the implicit def is in the
successor's live-ins.
llvm-svn: 331072
Summary:
Currently only the memory size is supported but others can be added as
needed.
narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45466
llvm-svn: 331071
These branches were previously unanalyzable and unselectable. Add them and
recognize how to generate their inverses.
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D46113
llvm-svn: 331050
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.
This didn't work in case no frame was needed and resulted in code size
regressions.
llvm-svn: 331044
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.
Differential revision: https://reviews.llvm.org/D46107
llvm-svn: 331036
This patch makes the compiler no longer fuse fmul and fadd/fsub into
fmadd/fmsub by default. Instead, the -fp-contract=fast option can
be used when such behavior is desired.
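For example (illustrative):

  double muladd(double a, double b, double c) {
    return a * b + c;   /* kept as fmul + fadd by default; fused into
                           fmadd only under -fp-contract=fast */
  }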
Differential Revision: https://reviews.llvm.org/D46057
llvm-svn: 331033
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.
Differential revision: https://reviews.llvm.org/D46106
llvm-svn: 331032
Back when the R52 schedule was added in rL286949, there was no way
to enable machine schedules in ARM for specific cores. Since then a
target feature has been added. This enables the feature for R52,
removing the need to manually specify compiler flags.
llvm-svn: 331027
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.
I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.
Differential Revision: https://reviews.llvm.org/D46091
llvm-svn: 331007
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.
Based on a problem and patch reported by @paulwalker-arm.
This is an alternative to the solution proposed in D45770.
Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar
Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46063
llvm-svn: 330976
As noted, the attribute name is subject to change once we have
the clang side implemented, but it's clear that we need some
kind of attribute-based predication here based on the discussion
for:
rL330437
llvm-svn: 330951
This is another preliminary step for disabling this transform as
discussed in the post-commit thread for:
rL330437
I'm using one of the names suggested there for the attribute, but
we can fix that up as needed once the clang side of this is sorted
out.
llvm-svn: 330950
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.
Differential Revision: https://reviews.llvm.org/D46116
llvm-svn: 330948
instructions.
These have special permission according to the x86 manual to read
unaligned memory, and this folding is done by ICC and GCC as well.
This corrects one of the issues identified in PR37246.
llvm-svn: 330896
comparison instructions (pcmp[ei]stri*).
These will help show improvements from fixes to PR37246.
I've not really covered the mask forms of this intrinsic as I don't have
as good of an intuition about the likely usage patterns there. Happy for
someone to extend this with tests covering the mask form.
llvm-svn: 330895
- Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
- Add debug counters to control force emit of s_waitcnt instrs; debug counters:
si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs
si-insert-waitcnts-forcevm: force emit s_waitcnt vmcnt(0) instrs
si-insert-waitcnts-forcelgkm: force emit s_waitcnt lgkmcnt(0) instrs
- Add some debug statements
Note that a variant of this patch was previously committed/reverted.
Differential Revision: https://reviews.llvm.org/D45888
llvm-svn: 330862
Debug var, expr and loc were only supported for non-fixed stack objects.
This patch adds the following fields to the "fixedStack:" entries, and
renames the ones from "stack:" to:
* debug-info-variable
* debug-info-expression
* debug-info-location
Differential Revision: https://reviews.llvm.org/D46032
llvm-svn: 330859
Previously we only formed MUL_IMM when we split a constant. This blocked load folding in those cases. We should also form MUL_IMM for 3/5/9 to favor LEA over load folding.
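For example (sketch; exact register allocation may differ):

  int mul9(int x) {
    return x * 9;   /* now emitted as leal (%rdi,%rdi,8), %eax */
  }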
Differential Revision: https://reviews.llvm.org/D46040
llvm-svn: 330850
To do this:
1. Change the GlobalAddress SDNode to TargetGlobalAddress to avoid the
legalizer splitting the symbol.
2. Change the ExternalSymbol SDNode to TargetExternalSymbol to avoid the
legalizer splitting the symbol.
3. Let PseudoCALL match direct call with target operand TargetGlobalAddress
and TargetExternalSymbol.
Differential Revision: https://reviews.llvm.org/D44885
llvm-svn: 330827
ISel is currently picking 'JAL' over 'JAL_MM' for calling a function when
targeting microMIPS. A later patch will correct this behaviour.
This patch extends the mechanism for transforming instructions into their
short delay slot forms to recognise 'JAL_MM' for transforming into 'JALS_MM'.
llvm-svn: 330825
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.
This also adds a test that ensures that those sorts of ADRPs won't be outlined.
llvm-svn: 330783
We were previously preferring ZEXTLOAD over EXTLOAD if it was legal. This triggers during X86's promotion of i16->i32. Not sure about other targets.
Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue.
This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here.
Differential Revision: https://reviews.llvm.org/D45585#inline-402825
llvm-svn: 330781
Previously, _any_ store or load instruction was considered to be
operating on a spill if it had a frameindex as an operand, and thus
was fair game for optimisations such as "StackSlotColoring". This
usually works, except on architectures where spills can be partially
restored, for example on X86 where a spilt vector can have a single
component loaded (zeroing the rest of the target register). This can be
mis-interpreted and the zero extension unsoundly eliminated, see
pr30821.
To avoid this, this commit optionally provides the caller of
isLoadFromStackSlot and isStoreToStackSlot with the number of bytes
spilt/loaded by the given instruction. Optimisations can then determine
that a full spill followed by a partial load (or vice versa), for
example, cannot necessarily be commuted.
Patch by Jeremy Morse!
Differential Revision: https://reviews.llvm.org/D44782
llvm-svn: 330778
Summary: This is no longer used by mesa since its 18.0.0 release.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D45988
llvm-svn: 330775
The output of update_llc_test_checks.py on this test file has changed,
so the test file should be updated to minimize source changes in future
patches.
The test updates for this file appear to be limited to relaxations of
the form:
-; SSE2-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
+; SSE2-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
This was suggested in https://reviews.llvm.org/D45995.
llvm-svn: 330758
If a packed inline constant is sign extended it must be truncated
after the shift. I.e. a constant (0xH0000, 0xHBC00) will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bits. After the value is shifted right by 16 to use it in the low part
with op_sel_hi, it becomes 0xFFFFFFFFBC00 and no longer qualifies as an inline
constant.
Fixed the error and added verification code. Without the fix, and with the
verification in place, the bug causes pk_max_f16_literal.ll to fail.
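A sketch of the arithmetic involved (plain C++, for illustration only):

  #include <cstdint>
  #include <cstdio>

  int main() {
    uint64_t Imm = 0xFFFFFFFFBC000000ULL; /* (0xH0000, 0xHBC00) sign-extended */
    uint64_t Lo = (Imm >> 16) & 0xFFFFULL;
    /* without the mask: 0xFFFFFFFFBC00, no longer an inline constant;
       with it: 0xBC00, which qualifies again */
    std::printf("0x%llX\n", (unsigned long long)Lo);
    return 0;
  }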
Differential Revision: https://reviews.llvm.org/D45987
llvm-svn: 330752
Split off pinsr/pextr and extractps instructions.
(Mostly) fixes PR36887.
Note: It might be worth adding a WriteFInsertLd class as well in the future.
Differential Revision: https://reviews.llvm.org/D45929
llvm-svn: 330714
Summary:
If attribute "use-soft-float"="true" is set then X86ISelLowering.cpp sets
'Promote' action for ISD::SINT_TO_FP operation on type i32.
But 'Promote' action is not proper in this case since lib function
__floatsidf is available for casting from signed int to float type.
Thus Expand action is more suitable here.
The Expand action should be set for ISD::UINT_TO_FP for soft float as well.
If function attribute "use-soft-float"="true" is set then infinite looping
can happen in DAG combining, function visitSINT_TO_FP() replaces SINT_TO_FP
node with UINT_TO_FP node and function combineUIntToFP() replace vice versa in cycle.
The fix prevents it.
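A sketch of the kind of function affected (assuming the attribute is set):

  /* with "use-soft-float"="true", this should expand to a call to
     __floatsidf rather than being promoted */
  double conv(int x) {
    return (double)x;
  }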
Patch by vrybalov
Differential Revision: https://reviews.llvm.org/D45572
llvm-svn: 330711
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
* CFI instructions do not affect code generation (they are not counted as
instructions when tail duplicating or tail merging)
* Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Added CFIInstrInserter pass:
* analyzes each basic block to determine which cfa offset and register are
valid at its entry and exit
* verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
* inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D42848
llvm-svn: 330706
Guard the MIPS64 variant correctly for i64, mark the MIPS32 version as not
in microMIPS and provide the microMIPS version.
Additionally, remove a related stale XFAIL'd test as bswap has its own test
case providing coverage.
Reviewers: smaksimovic, abeserminji, atanasyan
Differential Revision: https://reviews.llvm.org/D45816
llvm-svn: 330705
Summary:
The pass is supposed to scalarize such intrinsics if the target does not support
them natively, so if the scalarization does not happen, instruction selection
crashes due to the inability to lower these intrinsics.
Reviewers: andrew.w.kaylor, craig.topper
Reviewed By: andrew.w.kaylor
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45947
llvm-svn: 330700
As we're becoming stricter with respect to not allowing vregs to have both
LLTs and regclasses assigned mid-GlobalISel pipeline, the number of
extra copies grows, some of which separate G_UNMERGE's from their
corresponding G_MERGE's, becoming a performance concern.
It's worth mentioning that we're already looking through copies while
combining legalization artifacts for every kind of artifact but
G_UNMERGE.
Reviewers: ab, t.p.northover, volkan, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45644
llvm-svn: 330660
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`).
So we need to make sure that they are still generated.
If the mask is constant, we do nothing. InstCombine should have unfolded it.
Else, I use the `hasAndNot()` TLI hook.
For now, only handle scalars.
https://rise4fun.com/Alive/bO6
----
I *really* don't like the code I wrote in `DAGCombiner::unfoldMaskedMerge()`.
It is super fragile. Is there something like IR Pattern Matchers for this?
Reviewers: spatel, craig.topper, RKSimon, javed.absar
Reviewed By: spatel
Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D45733
llvm-svn: 330646
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`).
I'm guessing `llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp` should be able to unfold this.
Reviewers: spatel, craig.topper, RKSimon, javed.absar
Reviewed By: spatel
Subscribers: nemanjai, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45563
llvm-svn: 330645
This reland includes a check to prevent the DAG combiner from folding an
offset that is smaller than the existing one. This can cause oscillations
between two possible DAGs, which was the cause of the hang and later assertion
failure observed on the lnt-ctmark-aarch64-O3-flto bot.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/
Original commit message:
> This is a code size win in code that takes offseted addresses
> frequently, such as C++ constructors that typically need to compute
> an offseted address of a vtable. This reduces the size of Chromium
> for Android's .text section by 108KB.
Differential Revision: https://reviews.llvm.org/D45199
llvm-svn: 330630
This helps debug issues where selection-dag assigns the wrong location
to an instruction.
Differential Revision: https://reviews.llvm.org/D45913
llvm-svn: 330618
It's possible to validly spill the frame offset register
in a call sequence to a VGPR. There are definitely issues
with SGPR spilling to memory, so move the assert later.
llvm-svn: 330612
Summary:
See the new test case; this is really unlikely to happen with real code,
but I ran into this while attempting to bugpoint-reduce a different issue.
Change-Id: I9ade1dc1aa8fd9c4d9fc83661d7b80e310b5c4a6
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45885
llvm-svn: 330585
Vectorized loops with abs() return incorrect results on POWER9. This patch fixes that.
For example, the following code returns a negative result if the input values are negative, even though it sums up the absolute values of the inputs.
int vpx_satd_c(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}
This problem causes test failures for libvpx.
For vector absolute and vector absolute difference on POWER9, LLVM generates the VABSDUW (Vector Absolute Difference Unsigned Word) instruction or variants.
Since these instructions are for unsigned integers, we need adjustment for signed integers.
For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000). Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1. For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000).
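A quick sanity check of the bias trick, as a plain C++ sketch:

  #include <assert.h>
  #include <stdint.h>

  /* unsigned absolute difference, as VABSDUW computes per 32-bit word */
  static uint32_t absdu(uint32_t x, uint32_t y) { return x > y ? x - y : y - x; }

  int main(void) {
    int32_t a = -1, b = 0;
    /* plain VABSDUW(a, b) would give 0xFFFFFFFF for abs(sub(-1, 0)) */
    assert(absdu((uint32_t)a, (uint32_t)b) == 0xFFFFFFFFu);
    /* biasing both inputs by 0x80000000 yields the signed result, 1 */
    assert(absdu((uint32_t)a + 0x80000000u, (uint32_t)b + 0x80000000u) == 1u);
    return 0;
  }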
Differential Revision: https://reviews.llvm.org/D45522
llvm-svn: 330497
In certain cases, the compiler might try to merge __stack_chk_guard with
another global variable. (Or someone could theoretically define
__stack_chk_guard as an alias.) In that case, make sure we don't crash.
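A hypothetical reproducer of the alias case (names invented for illustration;
requires -fstack-protector):

  long real_guard_storage;
  extern long __stack_chk_guard
      __attribute__((alias("real_guard_storage")));

  void needs_guard(void) {
    char buf[64];
    __builtin_memset(buf, 0, sizeof buf);   /* large enough to get a guard */
  }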
Differential Revision: https://reviews.llvm.org/D45746
llvm-svn: 330495
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.
This unearthed a couple of things that are also handled in this patch:
(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.
Differential Revision: https://reviews.llvm.org/D45629
llvm-svn: 330480
There was some unfortunate interaction between VSPLAT and BITCAST
related to the selection of constant vectors (coming from selecting
shuffles). Introduce VSPLATW that always splats a 32-bit word, and
can have arbitrary result type (to avoid BITCASTs of VSPLAT).
Clean up the previous selection of BITCAST/VSPLAT.
llvm-svn: 330471
Three new instructions:
umonitor - Sets up a linear address range to be
monitored by hardware and activates the monitor.
The address range should be a writeback memory
caching type.
umwait - A hint that allows the processor to
stop instruction execution and enter an
implementation-dependent optimized state
until occurrence of a class of events.
tpause - Directs the processor to enter an
implementation-dependent optimized state
until the TSC reaches the value in EDX:EAX.
Also modifying the description of the mfence
instruction, as the rep prefix (0xF3) was allowed
before, which would conflict with umonitor during
disassembly.
Before:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
mfence
After:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
umonitor %rax
Reviewers: craig.topper, zvi
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D45253
llvm-svn: 330462
First off, this is more correct than having the B. Second off, this was making
a bot upset. This fixes that.
Update the test to include -verify-machineinstrs as well to prevent stuff like
this slipping by non debug/assert builds in the future.
llvm-svn: 330459
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:
Optimization of floating-point casts is improved. This may cause surprising
results for code that is relying on undefined behavior. Code sanitizers can
be used to detect affected patterns such as this:
#include <stdio.h>
int main() {
  float x = 4294967296.0f;
  x = (float)((int)x);
  printf("junk in the ftrunc: %f\n", x);
  return 0;
}
$ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of
representable values of type 'int'
junk in the ftrunc: 0.000000
Original commit message:
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.
Differential Revision: https://reviews.llvm.org/D44909
llvm-svn: 330437
Summary:
Reading Atmel's AT697E errata document this does not seem like a valid
workaround. While the text only mentions SDIV, it says that the ICC flags
can be wrong, and those are only generated by SDIVcc. Verification on
hardware shows that simply replacing SDIV with SDIVcc does not avoid
the bug with negative operands.
This reverts r283727.
Reviewers: lero_chris, jyknight
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D45813
llvm-svn: 330397
Summary:
This fixes a case where the argument to a sendmsg intrinsic
ends up in a VGPR, for whatever reason.
The underlying performance issue is that a multiplication that
can be an s_mul_i32 is instead needlessly generated as
v_mul_u32_u24, but this is not addressed by this patch.
Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45826
llvm-svn: 330393
Summary:
If a 64-bit register is used as an operand in inline assembly together
with a memory reference, the memory addressing will be wrong. The
addressing will be a single reg, instead of reg+reg or reg+imm. This
will generate a bad offset value or an exception in printMemOperand().
For example:
```
long long int val = 5;
long long int mem;
__asm__ volatile ("std %1, %0":"=m"(mem):"r"(val));
```
becomes:
```
std %i0, [%i2+589833]
```
The problem is that SelectInlineAsmMemoryOperand() is never called for
the memory references if one of the operands is a 64-bit register.
By calling SelectInlineAsmMemoryOperands() in tryInlineAsm() the Sparc
version of SelectInlineAsmMemoryOperand() gets called for each memory
reference.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D45761
llvm-svn: 330392
Silvermont and Goldmont have the same issue on popcnt as Sandy Bridge, Haswell, Broadwell, and Skylake. I believe it is fixed in Goldmont Plus.
llvm-svn: 330358
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.
XADD is probably 2 moves and an add also using a temporary register.
Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.
llvm-svn: 330349
We should also check that the "bottom" basic block of a loop is a successor of the "header" basic block; otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem in Wolfenstein 2, where one vector-memory wait was missing.
Differential Revision: https://reviews.llvm.org/D43831
llvm-svn: 330337
These instructions lacked the correct predicates, were not marked
as loads and stores and lacked the proper instruction mapping information.
In the case of microMIPS sw(l|r)e (EVA) these instructions were using the load
EVA description.
Reviewers: abeserminji, smaksimovic, atanasyan
Differential Revision: https://reviews.llvm.org/D45626
llvm-svn: 330326
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D44785
llvm-svn: 330322
This removes a bunch of unnecessary InstRW overrides. It also fills in missing information in the Sandy Bridge model, and makes other fixes to other models.
llvm-svn: 330308
This test case demonstrates that common subexpression elimination takes place
between code sequences for materialising constants. In particular, it
demonstrates that redundant lui instructions aren't generated. This would capture a
regression if applying a patch such as D41949.
llvm-svn: 330291
The objdump tests interfere with update_llc_test_checks.py and can't be
automatically updated. Put the sanity check for compression on the
codegen codepath into a separate file, and expand it to also include tests of
integer materialisation. This would catch changes such as those triggered by
D41949.
llvm-svn: 330288
Reverts rL330224, while issues with the C extension and missed common
subexpression elimination opportunities are addressed. Neither of these issues
are visible in current RISC-V backend unit tests, which clearly need
expanding.
llvm-svn: 330281
Legalize and emit code for converting unsigned HWord/Char to QP:
xscvsdqp
xscvudqp
Only covering patterns for unsigned forms because we don't have part-word
sign-extending integer loads into VSX registers.
Differential Revision: https://reviews.llvm.org/D45494
llvm-svn: 330278