Commit Graph

1297 Commits

Author SHA1 Message Date
Simon Pilgrim 2482c51e99 [SystemZ] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349906
2018-12-21 14:50:54 +00:00
Ulrich Weigand 380bece7af [SystemZ] "Generic" vector assembler instructions should clobber CC
There are several vector instructions which may or may not set the
condition code register, depending on the value of an argument.

For codegen, we use two versions of the instruction, one that sets
CC and one that doesn't, which hard-code appropriate values of that
argument.  But we also have a "generic" version of the instruction
that is used for the assembler/disassembler.  These generic versions
should always be considered to clobber CC just to be safe.

llvm-svn: 349761
2018-12-20 14:24:17 +00:00
Ulrich Weigand 44d37ae38c [SystemZ] Make better use of VLLEZ
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:

- For the floating-point versions, we have ISel patterns that match
  on a bitconvert as the top node.  In more complex cases, that
  bitconvert may already have been merged into something else.
  Fix the patterns to match the inner nodes instead.

- For the 64-bit integer versions, depending on the surrounding code,
  we may get either a DAG tree based on JOIN_DWORDS or one based on
  INSERT_VECTOR_ELT.  Use a PatFrags to simply match both variants.

llvm-svn: 349749
2018-12-20 13:05:03 +00:00
Ulrich Weigand 8bb46b0f01 [SystemZ] Make better use of VGEF/VGEG
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use.  This
(erroneously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily.  Fixed by this patch.

llvm-svn: 349748
2018-12-20 13:01:20 +00:00
Ulrich Weigand f43b510015 [SystemZ] Make better use of VLDEB
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we can use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLDEB (vector floating-point extends) has been
missing so far.  This patch adds corresponding DAG combine
support for FP_EXTEND.

llvm-svn: 349746
2018-12-20 12:59:05 +00:00
Jonas Paulsson e79b1b986d [SystemZ] Pass copy-hinted regs first from getRegAllocationHints().
When computing register allocation hints for a GRX32Bit register, make sure
that any of the hinted registers that are also copy hints are returned first
in the list.

Review: Ulrich Weigand.
llvm-svn: 349037
2018-12-13 14:37:05 +00:00
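The reordering described in e79b1b986d can be pictured with a minimal, self-contained sketch. It is not the code from the patch: `PhysReg`, `copyHintsFirst`, and the set-based representation of copy hints are hypothetical names chosen for illustration. The assumption is only that hinted registers which are also copy hints should move to the front while the relative order within each group is kept.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a physical register number.
using PhysReg = unsigned;

// Reorder `hints` so that every register that is also a copy hint comes
// first. std::stable_partition keeps the relative order inside each group.
std::vector<PhysReg> copyHintsFirst(std::vector<PhysReg> hints,
                                    const std::set<PhysReg> &copyHints) {
  std::stable_partition(hints.begin(), hints.end(), [&](PhysReg r) {
    return copyHints.count(r) != 0;
  });
  return hints;
}
```

A stable partition is used here so that any priority already encoded in the order of the hint list survives within the copy-hint and non-copy-hint groups.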
Jonas Paulsson 896775c2d3 [SystemZ] Minor cleanup of SchedModels
Some fixes of a few InstRWs for z13 and z14.

Review: Ulrich Weigand
llvm-svn: 348917
2018-12-12 08:26:24 +00:00
David Green ca29c271d2 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Jonas Paulsson 8ae0f88b13 [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.
A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

llvm-svn: 348141
2018-12-03 14:30:18 +00:00
Jonas Paulsson b1d014883c [SystemZ::TTI] i8/i16 operands extension costs revisited
Three minor changes to these extra costs:

* For ICmp instructions, instead of adding 2 all the time for extending each
  operand, this is only done if that operand is neither a load nor an
  immediate.

* The operand extension costs for divides are removed, because we now use a
  high cost already for the divide (20).

* The extra costs for lshr/ashr are removed, as they did not seem useful.

Review: Ulrich Weigand
https://reviews.llvm.org/D55053

llvm-svn: 347961
2018-11-30 07:09:34 +00:00
Jonas Paulsson 06acb3a236 [SystemZ::TTI] Improve cost for compare of i64 with extended i32 load
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.

This patch makes such a load considered foldable and so gets a 0 cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D54944

llvm-svn: 347735
2018-11-28 08:58:27 +00:00
Jonas Paulsson d6b7aca911 [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.

As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.

Review: Ulrich Weigand
https://reviews.llvm.org/D54940

llvm-svn: 347734
2018-11-28 08:31:50 +00:00
Jonas Paulsson 011a503f25 [SystemZ::TTI] Improved cost values for comparison against memory.
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.

This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.

Review: Ulrich Weigand
https://reviews.llvm.org/D54897

llvm-svn: 347733
2018-11-28 08:08:05 +00:00
Jonas Paulsson 5da8e432b9 [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.

Review: Ulrich Weigand
https://reviews.llvm.org/D54870

llvm-svn: 347732
2018-11-28 07:52:34 +00:00
Than McIntosh 30c804bbb1 [CodeGen] Support custom format of stack maps
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.

For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.

This patch authored by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, apilipenko, reames

Reviewed By: reames

Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53892

llvm-svn: 347584
2018-11-26 18:43:48 +00:00
Jonas Paulsson 96782c2c0b [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

llvm-svn: 347445
2018-11-22 07:17:29 +00:00
Jonas Paulsson f9b2b5e67e [SystemZ] Increase the number of VLREPs
If a loaded value is replicated it is best to combine these two operations
into a VLREP (load and replicate), but isel will not produce this if the load
has other users as well.

This patch handles this by putting the other users of the load to use the
REPLICATE 0-element instead of the load. This way the load has only the
REPLICATE node as user, and we get a VLREP.

Review: Ulrich Weigand
https://reviews.llvm.org/D54264

llvm-svn: 346746
2018-11-13 08:37:09 +00:00
Jonas Paulsson 5cea85dd59 [SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.

This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).

Review: Ulrich Weigand
https://reviews.llvm.org/D54423

llvm-svn: 346663
2018-11-12 15:32:27 +00:00
Jonas Paulsson c0ee028dc3 [SystemZ] Replicate the load with most uses in buildVector()
Iterate over all elements and count the number of uses among them for each
used load. Then make sure to REPLICATE the load which has the most uses in
order to minimize the number of needed element insertions.

Review: Ulrich Weigand
https://reviews.llvm.org/D54322

llvm-svn: 346637
2018-11-12 08:12:20 +00:00
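The selection step described in c0ee028dc3 (count the uses of each load among the elements, then replicate the most-used one) can be sketched as follows. This is an illustration only, not the buildVector() code: elements are modeled as plain integer ids, and `LoadId`, `pickLoadToReplicate`, and `insertionsNeeded` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in: each vector element is identified by the id of the
// load that produces it (a negative id means "not a load").
using LoadId = int;

// Return the load id used by the most elements, or -1 if no element is a
// load. Ties go to the smallest id because std::map iterates in key order.
LoadId pickLoadToReplicate(const std::vector<LoadId> &elements) {
  std::map<LoadId, std::size_t> useCount;
  for (LoadId id : elements)
    if (id >= 0)
      ++useCount[id];

  LoadId best = -1;
  std::size_t bestCount = 0;
  for (const auto &entry : useCount)
    if (entry.second > bestCount) {
      best = entry.first;
      bestCount = entry.second;
    }
  return best;
}

// Number of element insertions still needed after replicating `replicated`
// into every lane: one per element that holds a different value.
std::size_t insertionsNeeded(const std::vector<LoadId> &elements,
                             LoadId replicated) {
  std::size_t n = 0;
  for (LoadId id : elements)
    if (id != replicated)
      ++n;
  return n;
}
```

For the elements {7, 3, 7, 7}, replicating load 7 leaves one insertion, while replicating load 3 would leave three, which is the saving the commit message describes.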
Jonas Paulsson 458b7c0b39 [SystemZ] Avoid inserting same value after replication
A minor improvement of buildVector() that skips creating an
INSERT_VECTOR_ELT for a Value which has already been used for the
REPLICATE.

Review: Ulrich Weigand
https://reviews.llvm.org/D54315

llvm-svn: 346504
2018-11-09 15:44:28 +00:00
Jonas Paulsson 1993894c03 [SystemZ] Bugfix in shouldCoalesce()
It was discovered in randomized testing that the SystemZ implementation of
shouldCoalesce() could be caused to crash when subreg liveness was
enabled. This was because an undef use of the virtual register was copied
outside current MBB at the point of shouldCoalesce() being called. For more
details, see https://bugs.llvm.org/show_bug.cgi?id=39276.

This patch changes the check for MBB locality from livein/liveout checks to
do checks for all instructions of both intervals being inside MBB. This
avoids the cases with dead defs / undef uses outside MBB, which are not
affecting liveness in/out of MBB.

The original test case included as a reduced .mir test case.

Review: Ulrich Weigand
https://reviews.llvm.org/D54197

llvm-svn: 346406
2018-11-08 15:29:48 +00:00
Craig Topper 0b5f8169b0 [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.

llvm-svn: 346180
2018-11-05 23:26:13 +00:00
Jonas Paulsson cced2a2775 [SystemZ::TTI] Improve cost handling of uint/sint to fp conversions.
Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load.

Since the load already does the extension, there is no extra cost (previously
returned 2).

Review: Ulrich Weigand
https://reviews.llvm.org/D54028

llvm-svn: 346009
2018-11-02 17:53:31 +00:00
Jonas Paulsson 79f2441eee [SystemZ] Rework getInterleavedMemoryOpCost()
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.

This makes it more readable and just slightly more accurate generally.

Review: Ulrich Weigand
https://reviews.llvm.org/D53071

llvm-svn: 345998
2018-11-02 17:15:36 +00:00
Reid Kleckner 4dc0b1ac60 Fix clang -Wimplicit-fallthrough warnings across llvm, NFC
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
   of only 'break'.

We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
   doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
   the outer case.

I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.

Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu

Differential Revision: https://reviews.llvm.org/D53950

llvm-svn: 345882
2018-11-01 19:54:45 +00:00
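The two kinds of change listed in 4dc0b1ac60 can be illustrated with a small standalone switch. The macro `EXAMPLE_FALLTHROUGH` is a hypothetical local stand-in for LLVM_FALLTHROUGH and is assumed here to expand to the C++17 `[[fallthrough]]` attribute; `classify` is an invented function.

```cpp
#include <cassert>

// Hypothetical local macro standing in for LLVM_FALLTHROUGH; here it simply
// expands to the C++17 [[fallthrough]] attribute.
#define EXAMPLE_FALLTHROUGH [[fallthrough]]

int classify(int n) {
  int score = 0;
  switch (n) {
  case 0:
    score += 10;
    // Change 1: an explicit annotation instead of a "fall through" comment.
    EXAMPLE_FALLTHROUGH;
  case 1:
    score += 1;
    // Change 2: an explicit break instead of falling into a case block
    // that consists only of 'break'.
    break;
  default:
    break;
  }
  return score;
}
```

With the annotation and the explicit break in place, neither transition between case labels relies on an implicit fall through.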
Jonas Paulsson 6749c24f40 [SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion
Scalar i1 to fp conversions are done with a branch sequence, so it should
have a higher cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D53924

llvm-svn: 345818
2018-11-01 09:05:32 +00:00
Jonas Paulsson f15a53bc81 [SystemZ::TTI] Accurate costs for i1->double vector conversions
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.

Review: Ulrich Weigand
https://reviews.llvm.org/D53923

llvm-svn: 345817
2018-11-01 09:01:51 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Ulrich Weigand c5854b0adb [SystemZ] Simplify LRV/STRV ISD nodes
The LRV and STRV nodes carry an extra operand to indicate the
type of the memory access.  This is redundant, since the nodes
are actually of class MemIntrinsicNode and therefore hold that
same information already as MemoryVT.

NFC intended.

llvm-svn: 345618
2018-10-30 18:20:59 +00:00
Jonas Paulsson af8e036c29 [SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv.
Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a
load. This patch adds a check for this.

Review: Ulrich Weigand
https://reviews.llvm.org/D53791

llvm-svn: 345596
2018-10-30 13:41:03 +00:00
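The rule from af8e036c29 can be stated as a small predicate. This is a simplified sketch, not the isFoldableLoad() implementation: the `Op` enum and both function names are hypothetical, and it assumes that for commutative operators the operands can be swapped so that a load on either side may fold.

```cpp
#include <cassert>

// Hypothetical opcode list for illustration only.
enum class Op { Add, Mul, Sub, SDiv, UDiv };

bool isCommutative(Op op) { return op == Op::Add || op == Op::Mul; }

// operandIndex is 0 for the LHS and 1 for the RHS of a binary operator.
// For non-commutative operators only a load feeding the RHS can be folded.
bool loadCanFoldIntoOperand(Op op, unsigned operandIndex) {
  assert(operandIndex < 2 && "binary operators have two operands");
  if (isCommutative(op))
    return true;
  return operandIndex == 1;
}
```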
Fangrui Song 065c3610ad [SystemZ] Fix -Wcovered-switch-default as coding standard regulates
llvm-svn: 345369
2018-10-26 06:59:08 +00:00
Fangrui Song 61ea8dae2e Add dependency from SystemZAsmParser to SystemZAsmPrinter after rL345349
This fixes -DBUILD_SHARED_LIBS=on build. The dependency is similar to that of X86's.

llvm-svn: 345358
2018-10-26 03:04:54 +00:00
Jonas Paulsson dda46307c2 [SystemZ] Implement SystemZOperand::print()
SystemZAsmParser can now handle -debug by printing the operands neatly to the
output stream. Before this patch this led to an llvm_unreachable().

It seems that now '-mllvm -debug' does not cause any crashes anywhere (at
least not on SPEC).

Review: Ulrich Weigand
https://reviews.llvm.org/D53328

llvm-svn: 345349
2018-10-26 00:36:00 +00:00
Jonas Paulsson e2c5cbc164 [SystemZ] Pass the DAG pointer from SystemZAddressingMode::dump().
In order to print the IR slot number for the memory operand, the DAG pointer
must be passed to SDNode::dump().

The isel-debug.ll test updated to also check for the IR Value reference being
printed correctly.

Review: Ulrich Weigand
https://reviews.llvm.org/D53333

llvm-svn: 345347
2018-10-26 00:02:33 +00:00
Jonas Paulsson 2b280ea604 [SystemZ] NFC reformatting in SystemZTargetTransformInfo.cpp
Some lines more than 80 characters long reformatted.

llvm-svn: 345331
2018-10-25 22:53:27 +00:00
Jonas Paulsson b7caa809e1 [SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted.
The SystemZ backend can do arithmetic of memory by loading and then extending
one of the operands. Similarly, a load + truncate can be folded into an
operand.

This patch improves the SystemZ TTI cost function to recognize this.

Review: Ulrich Weigand
https://reviews.llvm.org/D52692

llvm-svn: 345327
2018-10-25 22:28:25 +00:00
Jonas Paulsson 4645711a8d [SystemZ] Improve handling and cost estimates of vector integer div/rem
Enable the DAG optimization that converts vector div/rem with constants into
multiply+shifts sequences by expanding them early. This is needed since
ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be
available to BuildSDIV after legalization.

Better cost values for these instructions based on how they will be
implemented (a constant divisor is cheaper).

Review: Ulrich Weigand
https://reviews.llvm.org/D53196

llvm-svn: 345321
2018-10-25 21:47:22 +00:00
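The "multiply+shifts" rewrite mentioned in 4645711a8d is easiest to see on a scalar. The sketch below is a scalar, unsigned illustration of the general idea and not the DAG combine itself (the commit concerns vector operations and the signed high-multiply node). `divideBy3` is a hypothetical name; the constant 0xAAAAAAAB is ceil(2^33 / 3).

```cpp
#include <cassert>
#include <cstdint>

// x / 3 for unsigned 32-bit x, computed as a widening multiply followed by
// a right shift. 0xAAAAAAAB is ceil(2^33 / 3).
std::uint32_t divideBy3(std::uint32_t x) {
  std::uint64_t product = static_cast<std::uint64_t>(x) * 0xAAAAAAABull;
  return static_cast<std::uint32_t>(product >> 33);
}
```

Because the divisor is a known constant, the expensive divide is replaced by one multiply and one shift, which is why a constant divisor is modeled as cheaper.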
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Jonas Paulsson bf66f38705 [SystemZ] Temporarily disable high VFs with integer div/rem.
Until mischeduler is clever enough to avoid spilling in a vectorized loop
with many (scalar) DLRs it is better to avoid high vectorization factors (8
and above).

llvm-svn: 344129
2018-10-10 09:30:29 +00:00
Jonas Paulsson 2c8b33770c [SystemZ] Take better care when computing needed vector registers in TTI.
A new function getNumVectorRegs() is better to use for the number of needed
vector registers instead of getNumberOfParts(). This is to make sure that the
number of vector registers (and typically operations) required for a vector
type is accurate.

getNumberOfParts(), which was previously used, works by splitting the vector
type until it is legal, and gives incorrect results for types with a non
power of two number of elements (rare).

A new static function getScalarSizeInBits() also checks for a pointer
type and returns 64U for it (since otherwise it gets a value of 0). It is used
in a few places where Ty may be a pointer.

Review: Ulrich Weigand
llvm-svn: 344115
2018-10-10 07:36:27 +00:00
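The register count discussed in 2c8b33770c can be sketched as a rounded-up division of the total vector width by the register width. This is a simplified model rather than the actual TTI code: `VecType`, `scalarSizeInBits`, and `numVectorRegs` are hypothetical names, a scalar size of 0 is used to model a pointer element, and the 128-bit register width is an assumption based on the SystemZ vector facility.

```cpp
#include <cassert>

// Hypothetical, simplified description of a vector type.
struct VecType {
  unsigned numElements;
  unsigned scalarBits; // 0 models a pointer element type
};

constexpr unsigned kVectorRegBits = 128; // assumed vector register width
constexpr unsigned kPointerBits = 64;    // the 64U mentioned in the message

// Pointers report a scalar size of 0 in this model, so substitute 64 bits.
unsigned scalarSizeInBits(const VecType &ty) {
  return ty.scalarBits == 0 ? kPointerBits : ty.scalarBits;
}

// Round the total width up to a whole number of vector registers.
unsigned numVectorRegs(const VecType &ty) {
  unsigned wideBits = scalarSizeInBits(ty) * ty.numElements;
  return (wideBits + kVectorRegBits - 1) / kVectorRegBits;
}
```

A vector of three 64-bit elements (192 bits) yields two registers in this model, which covers the non-power-of-two element counts the message calls out.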
Jonas Paulsson faad1b3056 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Jonas Paulsson 77df2f2f38 [SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPM
After recent improvements that make better use of LOC instead of IPM, the
TTI cost functions also need to be updated to reflect this.

This involves sext, zext and xor of i1.

The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.

Review: Ulrich Weigand
https://reviews.llvm.org/D51339

llvm-svn: 342207
2018-09-14 06:46:55 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so that we can
change how they are allocated.

Then we introduce an extra info object that uses the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Krzysztof Parzyszek 2a119b9a98 [SystemZ] Replace subreg_r with subreg_h
Change
  subreg_r32  -> subreg_h32
  subreg_r64  -> subreg_h64
  subreg_hr32 -> subreg_hh32

The subregisters subreg_r32 and subreg_r64 were added to emphasize the
fact that modifying these subregisters may clobber the entire register.
This is not necessarily the case for subreg_h32, et al.

However, the ability to compose subreg_h64 with subreg_r32, and with
subreg_h32 and subreg_l32 at the same time makes the compositions be
treated as non-overlapping (leading to problems when tracking subreg
liveness). See D50468 for more details.

Differential Revision: https://reviews.llvm.org/D50725

llvm-svn: 339778
2018-08-15 15:21:23 +00:00
Jonas Paulsson d5a9c2d551 [SystemZ] New CL option to enable subreg liveness
This option is needed to enable subreg liveness tracking during register
allocation.

Review: Ulrich Weigand
https://reviews.llvm.org/D50779

llvm-svn: 339776
2018-08-15 15:04:49 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Jonas Paulsson 5ffb27b166 [SystemZ] Increase the amount of inlining.
Implement getInliningThresholdMultiplier() and have it return 3.

Review: Ulrich Weigand
llvm-svn: 339563
2018-08-13 13:31:30 +00:00
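The effect of 5ffb27b166 can be modeled in a few lines, assuming (as the hook's name suggests) that the multiplier scales a base inlining threshold. `TargetInfo` and `effectiveInlineThreshold` are hypothetical names, and the base value is left as a parameter because the message does not state it.

```cpp
#include <cassert>

// Hypothetical model of a target that reports an inlining threshold
// multiplier; targets without an opinion are assumed to report 1.
struct TargetInfo {
  unsigned inliningThresholdMultiplier = 1;
};

// Assumed relationship: the base threshold is scaled by the target's
// multiplier before it is compared against a call site's cost.
unsigned effectiveInlineThreshold(unsigned baseThreshold,
                                  const TargetInfo &target) {
  return baseThreshold * target.inliningThresholdMultiplier;
}
```

Under this model, a target returning 3 accepts call sites up to three times as costly as the default would allow.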