Commit Graph

1691 Commits

Author SHA1 Message Date
Simon Pilgrim 1231904c48 [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets
Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era

llvm-svn: 351684
2019-01-20 13:21:43 +00:00
Simon Pilgrim 60e5a3accb [CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes
llvm-svn: 351682
2019-01-20 12:10:42 +00:00
Simon Pilgrim 5d7182ecb6 [CostModel][X86] Add masked load/store/gather/scatter tests for SSE2/SSE42/AVX1 targets
llvm-svn: 351681
2019-01-20 11:23:01 +00:00
Simon Pilgrim a8b009fd14 [CostModel][X86] Add non-constant vselect cost tests
Also add AVX512 costs at the same time

llvm-svn: 351680
2019-01-20 11:19:35 +00:00
Neil Henning 3ed09f8e0c [AMDGPU] Add some missing always-uniform values.
This commit adds some missing intrinsics into the isAlwaysUniform list
for the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D56845

llvm-svn: 351562
2019-01-18 16:39:27 +00:00
Nikita Popov d3b86b79fa Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351219
2019-01-15 18:43:41 +00:00
James Y Knight 693d39dd12 Remove irrelevant references to legacy git repositories from
compiler identification lines in test-cases.

(Doing so only because it's then easier to search for references which
are actually important and need fixing.)

llvm-svn: 351200
2019-01-15 16:18:52 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
Nikita Popov 9f6e9cf71b [ConstantFolding] Fold undef for integer intrinsics
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.

This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.

The undef behavior follows what InstSimplify does for the general cas
e of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.

Differential Revision: https://reviews.llvm.org/D55950

llvm-svn: 350971
2019-01-11 21:18:00 +00:00
Philip Pfaffe efb5ad1c58 [DA][NewPM] Add a printerpass and port the testsuite
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.

Differential Revision: https://reviews.llvm.org/D56386

llvm-svn: 350624
2019-01-08 14:06:58 +00:00
Michael Ferguson e39b614d1d [ValueTracking] Adjust comment in test
Adjusts a comment in this test to verify commit access.

llvm-svn: 350569
2019-01-07 21:02:22 +00:00
Simon Pilgrim c2054144ee [CostModel][X86] Fix SSE1 FADD/FSUB costs
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4

Add the other costs so that we're not relying on the default "is legal/custom" cost logic.

llvm-svn: 350403
2019-01-04 16:55:57 +00:00
Ranjeet Singh 107dd2565c Revert patches 348835 and 348571 because they're
causing code size performance regressions.

llvm-svn: 350402
2019-01-04 16:39:10 +00:00
Simon Pilgrim 71d61567c0 [CostModel][X86] Add SSE1 fp cost tests
llvm-svn: 350401
2019-01-04 16:37:01 +00:00
Florian Hahn 7902405c42 [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset
GetPointerBaseWithConstantOffset include this code, where ByteOffset
and GEPOffset are both of type llvm::APInt :

  ByteOffset += GEPOffset.getSExtValue();

The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. The problem is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.

Changing it to

  ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

resolves the issue and explicitly performs the sign-extending
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.

See also
 https://reviews.llvm.org/D24729
 https://reviews.llvm.org/D24772

This commit must be merged after D38662 in order for the test to pass.

Patch by Michael Ferguson <mpfergu@gmail.com>.

Reviewers: reames, sanjoy, hfinkel

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D38501

llvm-svn: 350395
2019-01-04 14:53:22 +00:00
Simon Pilgrim f0c533b7db [CostModel][X86] Add truncate cost tests to cover all legal destination types
We were only testing costs for legal source vector element counts

llvm-svn: 350323
2019-01-03 14:49:39 +00:00
Simon Pilgrim d824f99a6c [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)
Costs for real SSE2 instructions

llvm-svn: 350295
2019-01-03 11:38:42 +00:00
Simon Pilgrim 55ea89305d [X86] Add ADD/SUB SSAT/USAT cost tests (PR40123)
llvm-svn: 350293
2019-01-03 11:29:24 +00:00
Hal Finkel 4f2381440d [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).

Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).

After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).

As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.

Patch by me with contributions from Michael Ferguson (mferguson@cray.com).

Differential Revision: https://reviews.llvm.org/D38662

llvm-svn: 350220
2019-01-02 16:28:09 +00:00
Nikita Popov f17421e595 [ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFC
Move constant folding tests into ConstantFolding/bitcount.ll and drop
various tests in other places. Add coverage for undefs.

llvm-svn: 349806
2018-12-20 19:46:52 +00:00
Nikita Popov 41a7b109c3 [ConstantFolding] Add tests for funnel shifts with undef operands; NFC
llvm-svn: 349803
2018-12-20 19:46:39 +00:00
Nikita Popov 2d6fa5d6b4 [ConstantFolding] Add tests for sat add/sub with undefs; NFC
llvm-svn: 349802
2018-12-20 19:46:35 +00:00
Nikita Popov f7ef6d6461 [ConstantFolding] Split up saturating add/sub tests; NFC
Split each test into a separate function.

llvm-svn: 349801
2018-12-20 19:46:30 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Craig Topper c6bfb05762 [CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction
This is split from D55452 with the correct patch this time.

Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.

Differential Revision: https://reviews.llvm.org/D55615

llvm-svn: 349072
2018-12-13 19:08:10 +00:00
Ranjeet Singh ce24c878d9 Cleanup test case by removing unused attribute dso_local
Attribute 'dso_local' generated in bitcode from compiling
original C file but isn't needed.

Differential Revision: https://reviews.llvm.org/D55521

llvm-svn: 348835
2018-12-11 09:32:49 +00:00
Craig Topper 84d62ce6c1 [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction.
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register?

Reviewers: RKSimon, spatel, ABataev

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55480

llvm-svn: 348739
2018-12-10 06:58:58 +00:00
Craig Topper d1498ed8df [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.

So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.

There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles.

Differential Revision: https://reviews.llvm.org/D55397

llvm-svn: 348621
2018-12-07 18:20:56 +00:00
Nikita Popov 110cf05203 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbirary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348602
2018-12-07 15:38:13 +00:00
Ranjeet Singh 7a7132603b [IR] Don't assume all functions are 4 byte aligned
In some cases different alignments for function might be used to save
space e.g. thumb mode with -Oz will try to use 2 byte function
alignment. Similar patch that fixed this in other areas exists here
https://reviews.llvm.org/D46110

This was approved previously https://reviews.llvm.org/D55115 (r348215)
but when committed it caused failures on the sanitizer buildbots when
building llvm with clang (containing this patch). This is now fixed
because I've added a check to see if getting the parent module returns
null if it does then set the alignment to 0.

Differential Revision: https://reviews.llvm.org/D55115

llvm-svn: 348571
2018-12-07 08:34:59 +00:00
Nikita Popov 14ca9a8355 Revert "[DemandedBits][BDCE] Support vectors of integers"
This reverts commit r348549. Causing assertion failures during
clang build.

llvm-svn: 348558
2018-12-07 00:42:03 +00:00
Nikita Popov cf65b9207b [DemandedBits][BDCE] Support vectors of integers
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348549
2018-12-06 23:50:32 +00:00
Craig Topper 381b4fb0ab [X86] Remove -costmodel-reduxcost=true from the experimental vector reduction intrinsic tests as it appears to be unnecessary. NFC
I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here.

llvm-svn: 348340
2018-12-05 07:56:50 +00:00
Craig Topper b4719e5842 [X86] Add more cost model tests for vector reductions with narrow vector types. NFC
llvm-svn: 348339
2018-12-05 07:26:57 +00:00
Ranjeet Singh b393a516fb Reverting r348215
Causing failures on ubsan buildbot boxes.

llvm-svn: 348230
2018-12-04 02:03:53 +00:00
Ranjeet Singh f5d1b6413f [IR] Don't assume all functions are 4 byte aligned
In some cases different alignments for function might be used to save
space e.g. thumb mode with -Oz will try to use 2 byte function
alignment. Similar patch that fixed this in other areas exists here
https://reviews.llvm.org/D46110

Differential Revision: https://reviews.llvm.org/D55115

llvm-svn: 348215
2018-12-04 00:01:23 +00:00
Jonas Paulsson 8ae0f88b13 [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.
A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

llvm-svn: 348141
2018-12-03 14:30:18 +00:00
Michal Gorny 014a6f930a [test] Fix ScalarEvolution test to allow __func__ with prototype
Fix ScalarEvolution/solve-quadratic.ll test to account for __func__
output listing the complete function prototype rather than just its
name, as it does on NetBSD.

Example Linux output:

  GetQuadraticEquation: addrec coeff bw: 4
  GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2

Example NetBSD output:

  llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): addrec coeff bw: 4
  llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2

Differential Revision: https://reviews.llvm.org/D55162

llvm-svn: 348096
2018-12-02 16:49:28 +00:00
Simon Pilgrim 102854f4d4 [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 348076
2018-12-01 14:18:31 +00:00
Craig Topper 8e10e9423d [X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests.
llvm-svn: 348056
2018-12-01 00:21:49 +00:00
Nicolai Haehnle 56d0ed2a50 [DA] GPUDivergenceAnalysis for unstructured GPU kernels
Summary:
This is patch #3 of the new DivergenceAnalysis

  <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>

The GPUDivergenceAnalysis is intended to eventually supersede the existing
LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
incorrect results on unstructured Control-Flow Graphs:

  <https://bugs.llvm.org/show_bug.cgi?id=37185>

This patch adds the option -use-gpu-divergence-analysis to the
LegacyDivergenceAnalysis to turn it into a transparent wrapper for the
GPUDivergenceAnalysis.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D53493

llvm-svn: 348048
2018-11-30 22:55:20 +00:00
Jonas Paulsson b1d014883c [SystemZ::TTI] i8/i16 operands extension costs revisited
Three minor changes to these extra costs:

* For ICmp instructions, instead of adding 2 all the time for extending each
  operand, this is only done if that operand is neither a load or an
  immediate.

* The operands extension costs for divides removed, because we now use a high
  cost already for the divide (20).

* The costs for lhsr/ashr extra costs removed as this did not seem useful.

Review: Ulrich Weigand
https://reviews.llvm.org/D55053

llvm-svn: 347961
2018-11-30 07:09:34 +00:00
Craig Topper 81f1b4a361 [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal.
Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.

This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature.

Differential Revision: https://reviews.llvm.org/D54984

llvm-svn: 347786
2018-11-28 18:11:42 +00:00
Craig Topper d3bb036bc9 [X86] Add some cost model entries for sext/zext for avx512bw
This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst.

I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.

Differential Revision: https://reviews.llvm.org/D54979

llvm-svn: 347785
2018-11-28 18:11:39 +00:00
Craig Topper f3b6f583e2 [X86] Add a combine for back to back VSRAI instructions
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI

Differential Revision: https://reviews.llvm.org/D54959

llvm-svn: 347784
2018-11-28 18:03:38 +00:00
Jonas Paulsson 06acb3a236 [SystemZ::TTI] Improve cost for compare of i64 with extended i32 load
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.

This patch makes such a load considered foldable and so gets a 0 cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D54944

llvm-svn: 347735
2018-11-28 08:58:27 +00:00
Jonas Paulsson d6b7aca911 [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.

As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.

Review: Ulrich Weigand
https://reviews.llvm.org/D54940

llvm-svn: 347734
2018-11-28 08:31:50 +00:00
Jonas Paulsson 011a503f25 [SystemZ::TTI] Improved cost values for comparison against memory.
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.

This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.

Review: Ulrich Weigand
https://reviews.llvm.org/D54897

llvm-svn: 347733
2018-11-28 08:08:05 +00:00
Jonas Paulsson 5da8e432b9 [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.

Review: Ulrich Weigand
https://reviews.llvm.org/D54870

llvm-svn: 347732
2018-11-28 07:52:34 +00:00
Craig Topper 078e58da15 [X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext.
The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost.

llvm-svn: 347722
2018-11-28 00:33:34 +00:00
Craig Topper 5e7dcc65bd [X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1.
Our sext/zext cost modeling was somewhat incomplete. And had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost.

Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't and that's the fall back for anything not in the tables.

llvm-svn: 347719
2018-11-27 22:46:05 +00:00
Craig Topper e535babe4c [X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization
llvm-svn: 347697
2018-11-27 19:44:40 +00:00
Craig Topper 2f9e5c40a6 [X86] Add cost model test for masked load an store with -x86-experimental-vector-widening-legalization
llvm-svn: 347696
2018-11-27 19:44:36 +00:00
Craig Topper 69cf86e327 [X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization
llvm-svn: 347695
2018-11-27 19:44:34 +00:00
Craig Topper 1a2f9f765e [X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization.
llvm-svn: 347694
2018-11-27 19:44:30 +00:00
Vitaly Buka 42b050673e [stack-safety] Inter-Procedural Analysis implementation
Summary:
IPA is implemented as module pass which produce map from Function or Alias to
StackSafetyInfo for a single function.

From prototype by Evgenii Stepanov and Vlad Tsyrklevich.

Reviewers: eugenis, vlad.tsyrklevich, pcc, glider

Subscribers: hiraditya, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D54543

llvm-svn: 347611
2018-11-26 23:05:58 +00:00
Vitaly Buka b8e6fa6638 [stack-safety] Empty local passes for Stack Safety Global Analysis
Reviewers: eugenis, vlad.tsyrklevich

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54541

llvm-svn: 347610
2018-11-26 23:05:48 +00:00
Vitaly Buka fa98c074b7 [stack-safety] Local analysis implementation
Summary:
Analysis produces StackSafetyInfo which contains information with how allocas
and parameters were used in functions.

From prototype by Evgenii Stepanov and  Vlad Tsyrklevich.

Reviewers: eugenis, vlad.tsyrklevich, pcc, glider

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54504

llvm-svn: 347603
2018-11-26 21:57:59 +00:00
Vitaly Buka 4493fe1c1b [stack-safety] Empty local passes for Stack Safety Local Analysis
Reviewers: eugenis, vlad.tsyrklevich

Subscribers: mgorny, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54502

llvm-svn: 347602
2018-11-26 21:57:47 +00:00
Nikita Popov f94c8f0d1b [DemandedBits] Add support for funnel shifts
Add support for funnel shifts to the DemandedBits analysis. The
demanded bits of the first two operands can be determined if the
shift amount is constant. The demanded bits of the third operand
(shift amount) can be determined if the bitwidth is a power of two.

This is basically the same functionality as implemented in D54869
and D54478, but for DemandedBits rather than InstCombine.

Differential Revision: https://reviews.llvm.org/D54876

llvm-svn: 347561
2018-11-26 15:36:57 +00:00
Fedor Sergeev 8cd9d1b5ce Revert "[TTI] Reduction costs only need to include a single extract element cost"
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.

llvm-svn: 347541
2018-11-26 10:17:27 +00:00
Jonas Paulsson 96782c2c0b [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

llvm-svn: 347445
2018-11-22 07:17:29 +00:00
John Regehr 3a1c9d55cc [LVI] run transfer function for binary operator even when the RHS isn't a constant
LVI was symbolically executing binary operators only when the RHS was
constant, missing the case where we have a ConstantRange for the RHS,
but not an actual constant. Tested using check-all and by
bootstrapping. Compile time is not impacted measurably.

Differential Revision: https://reviews.llvm.org/D19859

llvm-svn: 347379
2018-11-21 05:24:12 +00:00
Sanjay Patel efc3d1dfaa [ConstantFolding] Add support for saturating add/sub
Support saturating add/sub in constant folding, based on the APInt methods introduced in D54332.

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54531

llvm-svn: 347328
2018-11-20 17:05:55 +00:00
Anna Thomas 5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
Simon Pilgrim 924f193419 [TTI] Reduction costs only need to include a single extract element cost
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 346970
2018-11-15 17:42:53 +00:00
Simon Pilgrim 2b166c5044 [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue
llvm-svn: 346868
2018-11-14 15:04:08 +00:00
Simon Pilgrim cdb170794b [CostModel] Add generic expansion funnel shift cost support
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.

This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.

llvm-svn: 346854
2018-11-14 12:24:50 +00:00
Simon Pilgrim e827fe09b3 [CostModel][X86] Fix constant vector XOP rights shifts
We'll constant fold these cases so they are as cheap as vector left shift cases.

Noticed while improving funnel shift costs.

llvm-svn: 346760
2018-11-13 16:40:10 +00:00
Simon Pilgrim 2fe1076a08 [CostModel][X86] Add more cost tests for funnel shifts
Added full uniform/constant coverage for funnel shifts + rotates

llvm-svn: 346754
2018-11-13 12:11:15 +00:00
Simon Pilgrim 93c64e5c76 [CostModel][X86] Add funnel shift rotation special case costs
When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same.

llvm-svn: 346688
2018-11-12 18:27:54 +00:00
Simon Pilgrim 49e93d2f0e [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs
The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level

llvm-svn: 346683
2018-11-12 17:56:59 +00:00
Simon Pilgrim 095ea939c3 [CostModel][X86] Add some initial cost tests for funnel shifts
Still need to add full uniform/constant coverage but this is enough to check basic fshl/fshr cost handling

llvm-svn: 346670
2018-11-12 16:39:41 +00:00
Simon Pilgrim f4cd292ba2 [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector
llvm-svn: 346664
2018-11-12 15:48:06 +00:00
Jonas Paulsson 5cea85dd59 [SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.

This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).

Review: Ulrich Weigand
https://reviews.llvm.org/D54423

llvm-svn: 346663
2018-11-12 15:32:27 +00:00
Simon Pilgrim 47d38198eb [CostModel] Add more realistic SK_InsertSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

llvm-svn: 346662
2018-11-12 15:20:24 +00:00
Simon Pilgrim 631f2bf51e [CostModel] Add more realistic SK_ExtractSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.

llvm-svn: 346656
2018-11-12 14:25:23 +00:00
Simon Pilgrim fc8f1d7da7 [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
2018-11-09 19:04:27 +00:00
Simon Pilgrim d0c71609c5 [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)
Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks.

llvm-svn: 346510
2018-11-09 16:28:19 +00:00
Max Kazantsev 266c087b9d Return "[IndVars] Smart hard uses detection"
The patch has been reverted because it ended up prohibiting propagation
of a constant to exit value. For such values, we should skip all checks
related to hard uses because propagating a constant is always profitable.

Differential Revision: https://reviews.llvm.org/D53691

llvm-svn: 346397
2018-11-08 11:54:35 +00:00
Max Kazantsev e059f4452b Revert "[IndVars] Smart hard uses detection"
This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402.

It seems that the check that we still should do the transform if we
know the result is constant is missing in this code. So the logic that
has been deleted by this change is still sometimes accidentally useful.
I revert the change to see what can be done about it. The motivating
case is the following:

@Y = global [400 x i16] zeroinitializer, align 1

define i16 @foo() {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %i = phi i16 [ 0, %entry ], [ %inc, %for.body ]

  %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i
  store i16 0, i16* %arrayidx, align 1
  %inc = add nuw nsw i16 %i, 1
  %cmp = icmp ult i16 %inc, 400
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  %inc.lcssa = phi i16 [ %inc, %for.body ]
  ret i16 %inc.lcssa
}

We should be able to figure out that the result is constant, but the patch
breaks it.

Differential Revision: https://reviews.llvm.org/D51584

llvm-svn: 346198
2018-11-06 02:02:05 +00:00
Jonas Paulsson cced2a2775 [SystemZ::TTI] Improve cost handling of uint/sint to fp conversions.
Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load.

Since the load already does the extension, there is no extra cost (previously
returned 2).

Review: Ulrich Weigand
https://reviews.llvm.org/D54028

llvm-svn: 346009
2018-11-02 17:53:31 +00:00
Easwaran Raman c5e1506ec8 [ProfileSummary] Add options to override hot and cold count thresholds.
Summary:
The hot and cold count thresholds are derived from the summary, but for
debugging purposes it is convenient to provide the actual thresholds.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54040

llvm-svn: 346005
2018-11-02 17:39:31 +00:00
Jonas Paulsson 6749c24f40 [SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion
Scalar i1 to fp conversions are done with a branch sequence, so it should
have a higher cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D53924

llvm-svn: 345818
2018-11-01 09:05:32 +00:00
Jonas Paulsson f15a53bc81 [SystemZ::TTI] Accurate costs for i1->double vector conversions
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.

Review: Ulrich Weigand
https://reviews.llvm.org/D53923

llvm-svn: 345817
2018-11-01 09:01:51 +00:00
Max Kazantsev 3d347bf545 [IndVars] Smart hard uses detection
When rewriting loop exit values, IndVars considers this transform not profitable if
the loop instruction has a loop user which it believes cannot be optimized away.
In current implementation only calls that immediately use the instruction are considered
as such.

This patch extends the definition of "hard" users to any side-effecting instructions
(which usually cannot be optimized away from the loop) and also allows handling
of not just immediate users, but use chains.

Differentlai Revision: https://reviews.llvm.org/D51584
Reviewed By: etherzhhb

llvm-svn: 345814
2018-11-01 06:47:01 +00:00
Max Kazantsev e0a2613aea [SCEV] Avoid redundant computations when doing AddRec merge
When we calculate a product of 2 AddRecs, we end up making quite massive
computations to deduce the operands of resulting AddRec. This process can
be optimized by computing all args of intermediate sum and then calling
`getAddExpr` once rather than calling `getAddExpr` with intermediate
result every time a new argument is computed.

Differential Revision: https://reviews.llvm.org/D53189
Reviewed By: rtereshin

llvm-svn: 345813
2018-11-01 06:18:27 +00:00
Simon Pilgrim 44a9a71d2a [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.

Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!

I've done my best to fix a number of vectorizer uses:

SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.

LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.

Differential Revision: https://reviews.llvm.org/D53573

llvm-svn: 345617
2018-10-30 18:10:02 +00:00
Jonas Paulsson af8e036c29 [SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv.
Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a
load. This patch adds a check for this.

Review: Ulrich Weigand
https://reviews.llvm.org/D53791

llvm-svn: 345596
2018-10-30 13:41:03 +00:00
Craig Topper faf423ca74 [X86] Add -LABEL to some FileCheck checks. NFC
llvm-svn: 345407
2018-10-26 17:21:19 +00:00
Jonas Paulsson b7caa809e1 [SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted.
The SystemZ backend can do arithmetic of memory by loading and then extending
one of the operands. Similarly, a load + truncate can be folded into an
operand.

This patch improves the SystemZ TTI cost function to recognize this.

Review: Ulrich Weigand
https://reviews.llvm.org/D52692

llvm-svn: 345327
2018-10-25 22:28:25 +00:00
Jonas Paulsson 4645711a8d [SystemZ] Improve handling and cost estimates of vector integer div/rem
Enable the DAG optimization that converts vector div/rem with constants into
multiply+shifts sequences by expanding them early. This is needed since
ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be
available to BuildSDIV after legalization.

Better cost values for these instructions based on how they will be
implemented (a constant divisor is cheaper).

Review: Ulrich Weigand
https://reviews.llvm.org/D53196

llvm-svn: 345321
2018-10-25 21:47:22 +00:00
Simon Pilgrim 53e8e145e9 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Simon Pilgrim 0573b8d8b6 [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Simon Pilgrim ac84005841 [CostModel][X86] Add vXi8 vector division by constants costs.
ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.

llvm-svn: 345175
2018-10-24 18:44:12 +00:00
Simon Pilgrim 2cce074e8c [CostModel][X86] Enable non-uniform vector division by constants costs.
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.

llvm-svn: 345164
2018-10-24 17:30:29 +00:00
Simon Pilgrim f04a04c2b6 [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering.
llvm-svn: 345048
2018-10-23 16:45:26 +00:00
Simon Pilgrim f1d8b7c49e [CostModel][X86] Add transpose shuffle cost tests
llvm-svn: 345045
2018-10-23 16:27:14 +00:00
Simon Pilgrim 9a2ee1ea2d Add BROADCAST shuffle cost tests.
Part of a lot of cleanup necessary before PR39368.

llvm-svn: 345025
2018-10-23 13:14:54 +00:00