Commit Graph

14894 Commits

Author SHA1 Message Date
Matt Arsenault 16295d521e InstCombine: Broaden copy-constant-to-alloca optimization
Consider any constant memory type, not just global constants. AMDGPU
kernel parameters are effectively global constants, but appear as
either reads from an intrinsic derived pointer or function argument.
2020-05-09 16:00:27 -04:00
Simon Pilgrim f8b09f7b52 [CodeGenPrepare][X86] Add x16i16, v32i8 and XOP vector shift by scalar amount tests
Helps improve test coverage of the XOP modes in X86TargetLowering::isVectorShiftByScalarCheap (and where we always return false for vXi8 vector shifts).
2020-05-09 20:47:42 +01:00
Sanjay Patel 0d2a0b44c8 [VectorCombine] scalarize binop of inserted elements into vector constants
As with the extractelement patterns that are currently in vector-combine,
there are going to be several possible variations on this theme. This
should be the clearest, simplest example.

Scalarization is the right direction for target-independent canonicalization,
and InstCombine has some of those folds already, but it doesn't do this.
I proposed a similar transform in D50992. Here in vector-combine, we can
check the cost model to be sure it's profitable, so there should be less risk.

Differential Revision: https://reviews.llvm.org/D79452
2020-05-08 16:31:12 -04:00
zoecarver f65f566aeb Re-commit: Mark values as trivially dead when their only use is a start or end lifetime intrinsic.
Summary:
If the only use of a value is a start or end lifetime intrinsic then mark the intrinsic as trivially dead. This should allow for that value to then be removed as well.

Currently, this only works for allocas, globals, and arguments.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79355
2020-05-08 12:24:10 -07:00
Sanjay Patel 1aa8cef97a [InstCombine] add/adjust tests for fpext of casted value; NFC 2020-05-08 15:22:36 -04:00
Wei Mi aa2ddfc73d [SampleFDO] For functions without profiles, provide an option to put
them in a special text section.

For sampleFDO, because the optimized build uses profile generated from
previous release, previously we couldn't tell a function without profile
was truely cold or just newly created so we had to treat them conservatively
and put them in .text section instead of .text.unlikely. The result was when
we persuing the best performance by locking .text.hot and .text in memory,
we wasted a lot of memory to keep cold functions inside.

In https://reviews.llvm.org/D66374, we introduced profile symbol list to
discriminate functions being cold versus functions being newly added.
This mechanism works quite well for regular use cases in AutoFDO. However,
in some case, we can only have a partial profile when optimizing a target.
The partial profile may be an aggregated profile collected from many targets.
The profile symbol list method used for regular sampleFDO profile is not
applicable to partial profile use case because it may be too large and
introduce many false positives.

To solve the problem for partial profile use case, we provide an option called
--profile-unknown-in-special-section. For functions without profile, we will
still treat them conservatively in compiler optimizations -- for example,
treat them as warm instead of cold in inliner. When we use profile info to
add section prefix for functions, we will discriminate functions known to be
not cold versus functions without profile (being unknown), and we will put
functions being unknown in a special text section called .text.unknown.
Runtime system will have the flexibility to decide where to put the special
section in order to achieve a balance between performance and memory saving.

Differential Revision: https://reviews.llvm.org/D62540
2020-05-08 11:18:09 -07:00
Sanjay Patel df5c9fdaac [InstCombine] add tests for known bits before FP casts; NFC 2020-05-08 13:44:32 -04:00
Matt Arsenault 78a43f10c7 AMDGPU: Don't assert on unknown address spaces
Assume unknown address spaces behave like some flavor of global
memory.
2020-05-08 12:57:27 -04:00
Benjamin Kramer f936457f80 Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding.""
This reverts commit ae45b4dbe7. It
causes miscompilations, test case on the mailing list.
2020-05-08 14:49:10 +02:00
Nikita Popov 5a2265647e Reapply [InstSimplify] Remove known bits constant folding
No changes relative to last time, but after a mitigation for
an AMDGPU regression landed.

---

If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.

On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):

    instsimplify.NumKnownBits: 216
    instsimplify.NumKnownBitsComputed: 13828375
    valuetracking.NumKnownBitsComputed: 45860806

Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)

There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.

Differential Revision: https://reviews.llvm.org/D79294
2020-05-08 10:24:53 +02:00
Diego Caballero f5224d437e [LoopFusion] Remove unreachable blocks from DT and LI after fusion
This patch removes FC0.ExitBlock and FC1GuardBlock from DT and LI
after fusion of guarded loops. They become unreachable and LI
verification failed when they happened to be inside another loop.

Reviewed By: kbarton

Differential Revision: https://reviews.llvm.org/D78679
2020-05-07 16:44:40 -07:00
Johannes Doerfert edf0391491 [Attributor][FIX] Record dependences for assumed dead abstract attributes
In a recent patch we introduced a problem with abstract attributes that
were assumed dead at some point. Since `Attributor::updateAA` was
introduced in 95e0d28b71, we did not
remember the dependence on the liveness AA when an abstract attribute
was assumed dead and therefore not updated.

Explicit reproducer added in liveness.ll.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 509242 (345483/s)
temporary memory allocations: 98666 (66937/s)
peak heap memory consumption: 18.60MB
peak RSS (including heaptrack overhead): 103.29MB
total memory leaked: 269.10KB
```

After:
```
calls to allocation functions: 529332 (355494/s)
temporary memory allocations: 102107 (68574/s)
peak heap memory consumption: 19.40MB
peak RSS (including heaptrack overhead): 102.79MB
total memory leaked: 269.10KB
```

Difference:
```
calls to allocation functions: 20090 (1339333/s)
temporary memory allocations: 3441 (229400/s)
peak heap memory consumption: 801.45KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
2020-05-07 17:00:50 -05:00
Alina Sbirlea 6227f021ad [SimpleLoopUnswitch] Update DefaultExit condition to check unreachable is not empty.
Summary:
Update the check for the default exit block to not only check that the
terminator is not unreachable, but also check that unreachable block has
*only* the unreachable instruction.

Reviewers: chandlerc

Subscribers: hiraditya, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78277
2020-05-07 13:48:30 -07:00
Sanjay Patel 5b48f7d2fc [VectorCombine] adjust test to make intent clearer; NFC
Create a non-zero result to show that the other lane is computed correctly.
2020-05-07 16:21:17 -04:00
Huihui Zhang e8ea1eb4c1 [NFC] Adjust test check lines for D78267.
This wasn't identified through buildbot before.
2020-05-07 13:20:15 -07:00
Huihui Zhang 1ec0cc0f02 [InstCombine][SVE] Fix visitExtractElementInst for scalable type.
Summary:
This patch fix the following issues with visitExtractElementInst:

      1. Restrict VectorUtils::findScalarElement to fixed-length vector.
         For scalable type, the number of elements in shuffle mask is
         unknown at compile-time.
      2. Fix out-of-range calculation for fixed-length vector.
      3. Skip scalable type when analysis rely on fixed number of elements.
      4. Add unit tests to check functionality of extractelement for scalable type.

Reviewers: sdesmalen, efriedma, spatel, nikic

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78267
2020-05-07 13:03:52 -07:00
Huihui Zhang 08c9c13749 [InstCombine][SVE] Fix visitInsertElementInst for scalable type.
Summary:
This patch fixes the following issues in visitInsertElementInst:

      1. Bail out for scalable type when analysis requires fixed size number of vector elements.
      2. Use cast<FixedVectorType> to get vector number of elements. This ensure assertion
          on scalable vector type.
      3. For scalable type, avoid folding a chain of insertelement into splat:
            insertelt(insertelt(insertelt(insertelt X, %k, 0), %k, 1), %k, 2) ...
              ->
            shufflevector(insertelt(X, %k, 0), undef, zero)
          The length of scalable vector is unknown at compile-time, therefore we don't know if
          given insertelement sequence is valid for splat.

Reviewers: sdesmalen, efriedma, spatel, nikic

Reviewed By: sdesmalen, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78895
2020-05-07 12:44:52 -07:00
Sanjay Patel 5d0f2fdfa5 [VectorCombine] add tests with undefs; NFC
Goes with D79452.
2020-05-07 15:28:26 -04:00
Sanjay Patel 02051c7f3a [SLP] add another bailout for load-combine patterns (2nd try)
The original patch (rG86dfbc676ebe) exposed an existing bug:
we could wrongly cast a constant expression to BinaryOperator
because the pattern matching allows that. This adds a check
for that case, and there's a reduced test case to verify no
crashing.

Original commit message:

This builds on the or-reduction bailout that was added with D67841.
We still do not have IR-level load combining, although that could
be a target-specific enhancement for -vector-combiner.

The heuristic is narrowly defined to catch the motivating case from
PR39538:
https://bugs.llvm.org/show_bug.cgi?id=39538
...while preserving existing functionality.

That is, there's an unmodified test of pure load/zext/store that is
not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll.
That's the reason for the logic difference to require the 'or'
instructions. The chances that vectorization would actually help a
memory-bound sequence like that seem small, but it looks nicer with:

  vpmovzxwd     (%rsi), %xmm0
  vmovdqu       %xmm0, (%rdi)

rather than:

  movzwl        (%rsi), %eax
  movl  %eax, (%rdi)
  ...

In the motivating test, we avoid creating a vector mess that is
unrecoverable in the backend, and SDAG forms the expected bswap
instructions after load combining:

  movzbl (%rdi), %eax
  vmovd %eax, %xmm0
  movzbl 1(%rdi), %eax
  vmovd %eax, %xmm1
  movzbl 2(%rdi), %eax
  vpinsrb $4, 4(%rdi), %xmm0, %xmm0
  vpinsrb $8, 8(%rdi), %xmm0, %xmm0
  vpinsrb $12, 12(%rdi), %xmm0, %xmm0
  vmovd %eax, %xmm2
  movzbl 3(%rdi), %eax
  vpinsrb $1, 5(%rdi), %xmm1, %xmm1
  vpinsrb $2, 9(%rdi), %xmm1, %xmm1
  vpinsrb $3, 13(%rdi), %xmm1, %xmm1
  vpslld $24, %xmm0, %xmm0
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpslld $16, %xmm1, %xmm1
  vpor %xmm0, %xmm1, %xmm0
  vpinsrb $1, 6(%rdi), %xmm2, %xmm1
  vmovd %eax, %xmm2
  vpinsrb $2, 10(%rdi), %xmm1, %xmm1
  vpinsrb $3, 14(%rdi), %xmm1, %xmm1
  vpinsrb $1, 7(%rdi), %xmm2, %xmm2
  vpinsrb $2, 11(%rdi), %xmm2, %xmm2
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpinsrb $3, 15(%rdi), %xmm2, %xmm2
  vpslld $8, %xmm1, %xmm1
  vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
  vpor %xmm2, %xmm1, %xmm1
  vpor %xmm1, %xmm0, %xmm0
  vmovdqu %xmm0, (%rsi)

  movl  (%rdi), %eax
  movl  4(%rdi), %ecx
  movl  8(%rdi), %edx
  movbel        %eax, (%rsi)
  movbel        %ecx, 4(%rsi)
  movl  12(%rdi), %ecx
  movbel        %edx, 8(%rsi)
  movbel        %ecx, 12(%rsi)

Differential Revision: https://reviews.llvm.org/D78997
2020-05-07 15:04:37 -04:00
Sanjay Patel 62ea77ec02 [SLP] add test for constant expression fake of load-combine pattern; NFC
This is a reduction of the test that caused D78997 to be reverted.
2020-05-07 15:04:37 -04:00
Hans Wennborg c54c6ee1a7 Revert "[SLP] add another bailout for load-combine patterns"
It caused asserts building Chromium, see discussion on
https://reviews.llvm.org/D78997

This reverts commit 86dfbc676e.
2020-05-07 16:31:52 +02:00
Sanjay Patel 666c61db79 [VectorCombine] add tests for insert into arbitrary constant; NFC
Goes with D79452.
2020-05-07 10:27:25 -04:00
Sjoerd Meijer ae45b4dbe7 Recommit "[LV] Induction Variable does not remain scalar under tail-folding."
With 3 llvm regr tests fixed/updated that I had missed.
2020-05-07 11:52:20 +01:00
Sjoerd Meijer 20d67ffeae Revert "[LV] Induction Variable does not remain scalar under tail-folding."
This reverts commit 617aa64c84.

while I investigate buildbot failures.
2020-05-07 09:29:56 +01:00
Sjoerd Meijer 617aa64c84 [LV] Induction Variable does not remain scalar under tail-folding.
If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.

Thanks to Ayal Zaks for the direction how to fix this.

Differential Revision: https://reviews.llvm.org/D78911
2020-05-07 09:15:23 +01:00
Whitney Tsang 0a52401ad6 [LoopUnrollAndJam] Changed safety checks to consider more than 2-levels
loop nest.

Summary: As discussed in https://reviews.llvm.org/D73129.

Example
Before unroll and jam:

for
  A
  for
    B
    for
      C
    D
  E
After unroll and jam (currently):

for
  A
  A'
  for
    B
    for
      C
    D
    B'
    for
      C'
    D'
  E
  E'
After unroll and jam (Ideal):

for
  A
  A'
  for
    B
    B'
    for
      C
      C'
    D
    D'
  E
  E'
This is the first patch to change unroll and jam to work in the ideal
way.
This patch change the safety checks needed to make sure is safe to
unroll and jam in the ideal way.

Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto
Reviewed By: Meinersbur
Subscribers: fhahn, hiraditya, zzheng, llvm-commits, anhtuyen, prithayan
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D76132
2020-05-06 21:47:44 +00:00
zoecarver 1998e796e9 Revert "Mark values as trivially dead when their only use is a start or end lifetime intrinsic."
This reverts commit 95aa28cc8f.
2020-05-06 11:07:22 -07:00
zoecarver 95aa28cc8f Mark values as trivially dead when their only use is a start or end lifetime intrinsic.
Summary:
If the only use of a value is a start or end lifetime intrinsic then mark the intrinsic as trivially dead. This should allow for that value to then be removed as well.

Currently, this only works for allocas, globals, and arguments.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79355
2020-05-06 10:58:08 -07:00
Sanjay Patel 2058c98715 [InstCombine] limit bitcast+insertelement transform to x86 MMX type
This is unusual for the general case because we are replacing
1 instruction with 2.

Splitting from a potential conflicting transform in D79171
2020-05-06 13:12:36 -04:00
Sanjay Patel e3eb297deb [VectorCombine] add tests for possible scalarization; NFC 2020-05-06 09:58:27 -04:00
Johannes Doerfert 094137a6c6 [Attributor][NFC] Avoid dependences on known information 2020-05-05 23:14:23 -05:00
Christopher Tetreault 855e02e799 [SVE] Fix invalid usage of getNumElements() in InstCombineMulDivRem
Summary:
getLogBase2 tries to iterate over the number of vector elements. Since
the number of elements of a scalable vector is unknown at compile time,
we must return null if the input type is scalable.

Identified by test LLVM.Transforms/InstCombine::nsw.ll

Reviewers: efriedma, fpetrogalli, kmclaughlin, spatel

Reviewed By: efriedma, fpetrogalli

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79197
2020-05-05 15:19:01 -07:00
Sanjay Patel a954b8a363 [ValueTracking] fix CannotBeNegativeZero() to disregard 'nsz' FMF
The 'nsz' flag is different than 'nnan' or 'ninf' in that it does not create poison.
Make that explicit in the LangRef and fix ValueTracking analysis that misinterpreted
the definition.

This manifests as bugs in InstSimplify shown in the test diffs and as discussed in
PR45778:
https://bugs.llvm.org/show_bug.cgi?id=45778

Differential Revision: https://reviews.llvm.org/D79422
2020-05-05 16:04:59 -04:00
Sanjay Patel 86dfbc676e [SLP] add another bailout for load-combine patterns
This builds on the or-reduction bailout that was added with D67841.
We still do not have IR-level load combining, although that could
be a target-specific enhancement for -vector-combiner.

The heuristic is narrowly defined to catch the motivating case from
PR39538:
https://bugs.llvm.org/show_bug.cgi?id=39538
...while preserving existing functionality.

That is, there's an unmodified test of pure load/zext/store that is
not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll.
That's the reason for the logic difference to require the 'or'
instructions. The chances that vectorization would actually help a
memory-bound sequence like that seem small, but it looks nicer with:

  vpmovzxwd	(%rsi), %xmm0
  vmovdqu	%xmm0, (%rdi)

rather than:

  movzwl	(%rsi), %eax
  movl	%eax, (%rdi)
  ...

In the motivating test, we avoid creating a vector mess that is
unrecoverable in the backend, and SDAG forms the expected bswap
instructions after load combining:

  movzbl (%rdi), %eax
  vmovd %eax, %xmm0
  movzbl 1(%rdi), %eax
  vmovd %eax, %xmm1
  movzbl 2(%rdi), %eax
  vpinsrb $4, 4(%rdi), %xmm0, %xmm0
  vpinsrb $8, 8(%rdi), %xmm0, %xmm0
  vpinsrb $12, 12(%rdi), %xmm0, %xmm0
  vmovd %eax, %xmm2
  movzbl 3(%rdi), %eax
  vpinsrb $1, 5(%rdi), %xmm1, %xmm1
  vpinsrb $2, 9(%rdi), %xmm1, %xmm1
  vpinsrb $3, 13(%rdi), %xmm1, %xmm1
  vpslld $24, %xmm0, %xmm0
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpslld $16, %xmm1, %xmm1
  vpor %xmm0, %xmm1, %xmm0
  vpinsrb $1, 6(%rdi), %xmm2, %xmm1
  vmovd %eax, %xmm2
  vpinsrb $2, 10(%rdi), %xmm1, %xmm1
  vpinsrb $3, 14(%rdi), %xmm1, %xmm1
  vpinsrb $1, 7(%rdi), %xmm2, %xmm2
  vpinsrb $2, 11(%rdi), %xmm2, %xmm2
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpinsrb $3, 15(%rdi), %xmm2, %xmm2
  vpslld $8, %xmm1, %xmm1
  vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
  vpor %xmm2, %xmm1, %xmm1
  vpor %xmm1, %xmm0, %xmm0
  vmovdqu %xmm0, (%rsi)

  movl	(%rdi), %eax
  movl	4(%rdi), %ecx
  movl	8(%rdi), %edx
  movbel	%eax, (%rsi)
  movbel	%ecx, 4(%rsi)
  movl	12(%rdi), %ecx
  movbel	%edx, 8(%rsi)
  movbel	%ecx, 12(%rsi)

Differential Revision: https://reviews.llvm.org/D78997
2020-05-05 12:44:38 -04:00
Jay Foad 22829ab5fa [InstCombine] Allow denormal C in pow(C,y) -> exp2(log2(C)*y)
We check that C is finite and strictly positive, but there's no need to
check that it's normal too. exp2 should be just as accurate on denormals
as pow is.

Differential Revision: https://reviews.llvm.org/D79413
2020-05-05 16:25:48 +01:00
Jay Foad 47f5066553 Precommit new test cases for D79413 [InstCombine] Allow denormal C in pow(C,y) -> exp2(log2(C)*y) 2020-05-05 16:08:09 +01:00
Sam Parker f35ccfa2af [NFC] Update tests
Run the update script on a couple of tests.
2020-05-05 15:28:40 +01:00
Jay Foad fa2783d79a [InstCombine] Remove hasOneUse check for pow(C,x) -> exp2(log2(C)*x)
I don't think there's any good reason not to do this transformation when
the pow has multiple uses.

Differential Revision: https://reviews.llvm.org/D79407
2020-05-05 14:46:08 +01:00
Simon Pilgrim 5c91aa6603 [InstCombine] Fold or(zext(bswap(x)),shl(zext(bswap(y)),bw/2)) -> bswap(or(zext(x),shl(zext(y), bw/2))
This adds a general combine that can be used to fold:

  or(zext(OP(x)), shl(zext(OP(y)),bw/2))
-->
  OP(or(zext(x), shl(zext(y),bw/2)))

Allowing us to widen 'concat-able' style or+zext patterns - I've just set this up for BSWAP but we could use this for other similar ops (BITREVERSE for instance).

We already do something similar for bitop(bswap(x),bswap(y)) --> bswap(bitop(x,y))

Fixes PR45715

Reviewed By: @lebedev.ri

Differential Revision: https://reviews.llvm.org/D79041
2020-05-05 12:30:10 +01:00
Sergey Dmitriev f637334df9 [CallGraphUpdater] Removed references to calles when deleting function
Summary: Otherwise we can get unaccounted references to call graph nodes.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79382
2020-05-04 18:59:47 -07:00
Simon Pilgrim 940061438e [InstCombine] Fold (mul(abs(x),abs(x))) -> (mul(x,x)) (PR39476)
This patch adds support for discarding integer absolutes (abs + nabs variants) from self-multiplications.

ABS Alive2: http://volta.cs.utah.edu:8080/z/rwcc8W
NABS Alive2: http://volta.cs.utah.edu:8080/z/jZXUwQ

This is an InstCombine version of D79304 - I'm not sure yet if we'll need that after this.

Reviewed By: @lebedev.ri and @xbolva00

Differential Revision: https://reviews.llvm.org/D79319
2020-05-04 15:21:52 +01:00
Jay Foad e737847b8f [SLC] Allow llvm.pow(x,2.0) -> x*x etc even if no pow() lib func
optimizePow does not create any new calls to pow, so it should work
regardless of whether the pow library function is available. This allows
it to optimize the llvm.pow intrinsic on targets with no math library.

Based on a patch by Tim Renouf.

Differential Revision: https://reviews.llvm.org/D68231
2020-05-04 10:54:07 +01:00
Simon Pilgrim 8e9a8dc185 [InstCombine] Add tests showing failure to fold mul(abs(x),abs(x)) -> mul(x,x) (PR39476)
Includes abs() and nabs() variants
2020-05-04 10:24:18 +01:00
Jay Foad 6c42814a26 Precommit test updates for D68231. 2020-05-04 09:55:59 +01:00
Johannes Doerfert 95e0d28b71 [Attributor] Remember only necessary dependences
Before we eagerly put dependences into the QueryMap as soon as we
encountered them (via `Attributor::getAAFor<>` or
`Attributor::recordDependence`). Now we will wait to see if the
dependence is useful, that is if the target is not already in a fixpoint
state at the end of the update. If so, there is no need to record the
dependence at all.

Due to the abstraction via `Attributor::updateAA` we will now also treat
the very first update (during attribute creation) as we do subsequent
updates.

Finally this resolves the problematic usage of QueriedNonFixAA.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 554675 (389245/s)
temporary memory allocations: 101574 (71280/s)
peak heap memory consumption: 28.46MB
peak RSS (including heaptrack overhead): 116.26MB
total memory leaked: 269.10KB
```

After:
```
calls to allocation functions: 512465 (345559/s)
temporary memory allocations: 98832 (66643/s)
peak heap memory consumption: 22.54MB
peak RSS (including heaptrack overhead): 106.58MB
total memory leaked: 269.10KB
```

Difference:
```
calls to allocation functions: -42210 (-727758/s)
temporary memory allocations: -2742 (-47275/s)
peak heap memory consumption: -5.92MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
2020-05-03 22:01:51 -05:00
Johannes Doerfert 231026a508 [Attributor] Inititialize "value attributes" w/ must-be-executed-context info
Attributes that only depend on the value (=bit pattern) can be
initialized from uses in the must-be-executed-context (MBEC). We did use
`AAComposeTwoGenericDeduction` and `AAFromMustBeExecutedContext` before
to do this for some positions of these attributes but not for all. This
was fairly complicated and also problematic as we did run it in every
`updateImpl` call even though we only use known information. The new
implementation removes `AAComposeTwoGenericDeduction`* and
`AAFromMustBeExecutedContext` in favor of a simple interface
`AddInformation::fromMBEContext(...)` which we call from the
`initialize` methods of the "value attribute" `Impl` classes, e.g.
`AANonNullImpl:initialize`.

There can be two types of test changes:
  1) Artifacts were we miss some information that was known before a
     global fixpoint was reached and therefore available in an update
     but not at the beginning.
  2) Deduction for values we did not derive via the MBEC before or which
     were not found as the `AAFromMustBeExecutedContext::updateImpl` was
     never invoked.

* An improved version of AAComposeTwoGenericDeduction can be found in
  D78718. Once we find a new use case that implementation will be able
  to handle "generic" AAs better.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 468428 (328952/s)
temporary memory allocations: 77480 (54410/s)
peak heap memory consumption: 32.71MB
peak RSS (including heaptrack overhead): 122.46MB
total memory leaked: 269.10KB
```

After:
```
calls to allocation functions: 554720 (351310/s)
temporary memory allocations: 101650 (64376/s)
peak heap memory consumption: 28.46MB
peak RSS (including heaptrack overhead): 116.75MB
total memory leaked: 269.10KB
```

Difference:
```
calls to allocation functions: 86292 (556722/s)
temporary memory allocations: 24170 (155935/s)
peak heap memory consumption: -4.25MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D78719
2020-05-03 21:41:22 -05:00
Johannes Doerfert 2f97b8b891 [Attributor][NFC] Proactively ask for `nocapure` on call site arguments
This minimizes test noise later on and is in line with other attributes
we derive proactively.
2020-05-03 21:38:06 -05:00
Sergey Dmitriev 0f70f73308 [Attributor] Bitcast constant to the returned value type if it has different type
Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79277
2020-05-03 11:46:13 -07:00
Nikita Popov 46ee652c70 Revert "[InstSimplify] Remove known bits constant folding"
This reverts commit 08556afc54.

This breaks some AMDGPU tests.
2020-05-03 20:45:10 +02:00
Nikita Popov 08556afc54 [InstSimplify] Remove known bits constant folding
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.

On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):

    instsimplify.NumKnownBits: 216
    instsimplify.NumKnownBitsComputed: 13828375
    valuetracking.NumKnownBitsComputed: 45860806

Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)

There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.

Differential Revision: https://reviews.llvm.org/D79294
2020-05-03 20:26:58 +02:00
Hongtao Yu 911e06f5eb [ICP] Handling must tail calls in indirect call promotion
Per the IR convention, a musttail call must precede a ret with an optional bitcast. This was violated by the indirect call promotion optimization which could result an IR like:

    ; <label>:2192:
      br i1 %2198, label %2199, label %2201, !dbg !226012, !prof !229483

    ; <label>:2199:                                   ; preds = %2192
      musttail call fastcc void @foo(i8* %2195), !dbg !226012
      br label %2202, !dbg !226012

    ; <label>:2201:                                   ; preds = %2192
      musttail call fastcc void %2197(i8* %2195), !dbg !226012
      br label %2202, !dbg !226012

    ; <label>:2202:                                   ; preds = %605, %2201, %2199
      ret void, !dbg !229485

This is being fixed in this change where the return statement goes together with the promoted indirect call. The code generated is like:

    ; <label>:2192:
      br i1 %2198, label %2199, label %2201, !dbg !226012, !prof !229483

    ; <label>:2199:                                   ; preds = %2192
      musttail call fastcc void @foo(i8* %2195), !dbg !226012
      ret void, !dbg !229485

    ; <label>:2201:                                   ; preds = %2192
      musttail call fastcc void %2197(i8* %2195), !dbg !226012
      ret void, !dbg !229485

Differential Revision: https://reviews.llvm.org/D79258
2020-05-03 10:42:22 -07:00
Sanjay Patel 682f0b366b [InstCombine] use select-of-constants with set/clear bit mask patterns
Cond ? (X & ~C) : (X | C) --> (X & ~C) | (Cond ? 0 : C)
Cond ? (X | C) : (X & ~C) --> (X & ~C) | (Cond ? C : 0)

The select-of-constants form results in better codegen.
There's an existing test diff that shows a transform that
results in an extra IR instruction, but that's an existing
problem.

This is motivated by code seen in LLVM itself - see PR37581:
https://bugs.llvm.org/show_bug.cgi?id=37581

define i8 @src(i8 %x, i8 %C, i1 %b)  {
  %notC = xor i8 %C, -1
  %and = and i8 %x, %notC
  %or = or i8 %x, %C
  %cond = select i1 %b, i8 %or, i8 %and
  ret i8 %cond
}

define i8 @tgt(i8 %x, i8 %C, i1 %b)  {
  %notC = xor i8 %C, -1
  %and = and i8 %x, %notC
  %mul = select i1 %b, i8 %C, i8 0
  %or = or i8 %mul, %and
  ret i8 %or
}

http://volta.cs.utah.edu:8080/z/Vt2WVm

Differential Revision: https://reviews.llvm.org/D78880
2020-05-03 09:44:43 -04:00
Nikita Popov 7c649b58f0 [InstCombine] Duplicate some InstSimplify tests (NFC)
Duplicate some tests in preparation for D79294.
2020-05-03 12:49:36 +02:00
Nikita Popov 60e9ee16b4 [MergeFuncs] Don't merge shufflevectors with different masks
When the shufflevector mask operand was converted into special
instruction data, the FunctionComparator was not updated to
account for this. As such, MergeFuncs will happily merge
shufflevectors with different masks.

This fixes https://bugs.llvm.org/show_bug.cgi?id=45773.

Differential Revision: https://reviews.llvm.org/D79261
2020-05-02 10:21:14 +02:00
Sanjay Patel 7fa150203f [InstCombine] fix miscompile from multi-use cttz/ctlz transform
PR45762:
https://bugs.llvm.org/show_bug.cgi?id=45762
2020-05-01 13:52:24 -04:00
Sanjay Patel 43b0e446fb [InstCombine] add test for faulty cttz fold (PR45762); NFC 2020-05-01 13:52:23 -04:00
Simon Pilgrim 4548e62ca4 [InstCombine] Additional 'concat of ORs' BSWAP/BITREVERSE tests for D79041 2020-05-01 18:05:24 +01:00
Sanjay Patel 57f0eed98d [InstSimplify] allow insertelement-with-undef fold if poison-safe
The more general fold was not poison-safe, so it was removed:
rG5486e00
...but it is ok to have this transform if analysis can determine
the vector contains no poison. The test shows a simple example
of that: constant integer elements are not poison.
2020-05-01 10:34:29 -04:00
Sanjay Patel c79a366ec0 [InstSimplify] update test; NFC
Missed this test diff when committing:
rG5486e00dc3
2020-05-01 10:06:56 -04:00
Sanjay Patel 5486e00dc3 [InstSimplify] remove poison-unsafe insertelement of undef value
PR45481:
https://bugs.llvm.org/show_bug.cgi?id=45481

SDAG has an identical transform to this, so there's little
chance of any real-world impact. OTOH, that means we are
effectively sweeping the bug out of sight because poison
exists in codegen too.
2020-05-01 09:22:05 -04:00
Sanjay Patel 5013a788f8 [InstCombine] adjust tests for pow(); NFC
D68231 would change this, but the existing test
doesn't cover what was probably intended (a libcall test).
2020-05-01 08:42:51 -04:00
Nikita Popov b74c6d2c9d [InlineFunction] Disable emission of alignment assumptions by default
In D74183 clang started emitting alignment for sret parameters
unconditionally. This caused a 1.5% compile-time regression on
tramp3d-v4. The reason is that we now generate many instance of IR like

    %ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
    %maskedptr = and i64 %ptrint, 3
    %maskcond = icmp eq i64 %maskedptr, 0
    tail call void @llvm.assume(i1 %maskcond)

to preserve the alignment information during inlining. Based on IR
analysis, these assumptions also regress optimization. The attached
phase ordering test case illustrates two issues: One are instruction
count based optimization heuristics, which are affected by the four
additional instructions of the assumption. The other is blocking of
SROA due to ptrtoint casts (PR45763).

We already encountered the same problem in Rust, where we (unlike
Clang) generally prefer to emit alignment information absolutely
everywhere it is available. We were only able to do this after
hardcoding -preserve-alignment-assumptions-during-inlining=false,
because we were seeing significant optimization and compile-time
regressions otherwise.

This patch disables -preserve-alignment-assumptions-during-inlining
by default, because we should not be punishing people for adding
more alignment annotations.

Once the assume bundle work shakes out and we can represent (and use)
alignment assumptions using assume bundles, it should be possible to
re-enable this with reduced overhead.

Differential Revision: https://reviews.llvm.org/D76886
2020-04-30 23:12:54 +02:00
Masoud Ataei b4934ae44c [VFDatabase] Testsuite for scalar functions are vector functions with VF =1
Fixing test suite of the committed PR: https://reviews.llvm.org/D78054.
I am proposing to remove the PowerPC target triple in the test suite.

Reviewed by: @jsji, @fpetrogalli

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D79124
2020-04-30 15:47:21 -04:00
Kirill Naumov 0383253cdf [InlineCost] Addressing a very strict assert check in CostAnnotationWriter::emitInstructionAnnot
The assert checks that every instruction must be annotated by this point while it is not
necessary. If the inlining process was interrupted because the threshold was reached, the rest
of the instructions would not be annotated which triggers the assert.
The added test shows the situation in which it can happen.
This is a recommit as the original commit fail due to the absence of REQUIRES: assert in the test.

Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D79107
2020-04-30 15:38:36 +00:00
Sanjay Patel 4a065a72ef [InstCombine] add tests for bitcast+inselt; NFC 2020-04-30 09:11:29 -04:00
Sanjay Patel 35fe2814cf [InstCombine] update auto-generated test checks; NFC 2020-04-30 08:39:02 -04:00
Sanjay Patel 2cfeaf3b2d [InstCombine] add tests for FP->int->FP->FP casting; NFC 2020-04-30 07:41:28 -04:00
Evgeniy Brevnov 3e68a66704 [BPI][NFC] Reuse post dominantor tree from analysis manager when available
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.

Reviewers: skatkov, taewookoh, yrouban

Reviewed By: skatkov

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78987
2020-04-30 11:31:03 +07:00
Kirill Naumov 0fa793e798 Revert "[InlineCost] Addressing a very strict assert check in CostAnnotationWriter::emitInstructionAnnot"
This reverts commit 66947d05fd.
2020-04-29 22:00:51 +00:00
Kirill Naumov 66947d05fd [InlineCost] Addressing a very strict assert check in CostAnnotationWriter::emitInstructionAnnot
The assert checks that every instruction must be annotated by this point while it is not
necessary. If the inlining process was interrupted because the threshold was reached, the rest
of the instructions would not be annotated which triggers the assert.
The added test shows the situation in which it can happen.

Reviewed-By: mtrofin
Diff: https://reviews.llvm.org/D79107
2020-04-29 20:44:10 +00:00
Anh Tuyen Tran c7878ad231 [VFDatabase] Scalar functions are vector functions with VF =1
Summary:
Return scalar function when VF==1. The new trivial mapping scalar --> scalar when VF==1 to prevent false positive for "isVectorizable" query.

Author: masoud.ataei (Masoud Ataei)

Reviewers: Whitney (Whitney Tsang), fhahn (Florian Hahn), pjeeva01 (Jeeva P.), fpetrogalli (Francesco Petrogalli), rengolin (Renato Golin)

Reviewed By: fpetrogalli (Francesco Petrogalli)

Subscribers: hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D78054
2020-04-29 17:20:37 +00:00
Simon Pilgrim 090cae8491 [TTI] Add DemandedElts to getScalarizationOverhead
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.

This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.

This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.

A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.

Reviewed By: @craig.topper

Differential Revision: https://reviews.llvm.org/D78216
2020-04-29 12:00:38 +01:00
Florian Hahn e89379856a Recommit "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)."
The crash that caused the original revert has been fixed in
a3c964a278. I also added a reduced version of the crash reproducer.

This reverts the revert commit 2107af9ccf.
2020-04-29 11:40:39 +01:00
Florian Hahn e018b8bbb0 [DSE,MSSA] Add multi-path tests with readnone throwing calls. 2020-04-29 10:30:05 +01:00
Simon Pilgrim 751a554f25 [InstCombine] Add PR45715 test case 2020-04-28 21:53:59 +01:00
Roman Lebedev a0004358a8
[InstCombine] Negator: 'or' with no common bits set is just 'add'
In `InstCombiner::visitAdd()`, we have
```
  // A+B --> A|B iff A and B have no bits set in common.
  if (haveNoCommonBitsSet(LHS, RHS, DL, &AC, &I, &DT))
    return BinaryOperator::CreateOr(LHS, RHS);
```
so we should handle such `or`'s here, too.
2020-04-28 19:16:32 +03:00
Roman Lebedev a5f22f2b0e
[NFC][InstCombine] Tests for negation of 'or' with no common bits set 2020-04-28 19:16:31 +03:00
Krzysztof Parzyszek 25a4b1904c Handle part-word LL/SC in atomic expansion pass
Differential Revision: https://reviews.llvm.org/D77213
2020-04-28 10:07:39 -05:00
Sanjay Patel 7a8c226ba8 [SLP] add test for partially vectorized bswap (PR39538); NFC 2020-04-27 17:29:27 -04:00
Sanjay Patel 54fe6c9599 [InstCombine] add tests for set/clear masked bits; NFC 2020-04-27 15:55:45 -04:00
Craig Topper 5eff75d86a [X86][CostModel] Improve costs for fp_to_uint/fp_to_sint for vXi8/vXi16/v2i32 results.
Differential Revision: https://reviews.llvm.org/D78893
2020-04-27 10:35:15 -07:00
Wei Mi 10b57ca690 [ProfileSummary] Add partial profile annotation on IR.
Profile and profile summary are usually read only once and then annotated
on IR. The profile summary metadata on IR should include the value of the
newly added partial profile flag, so that compilation phase like thinlto
postlink can get the full set of profile information.

Differential Revision: https://reviews.llvm.org/D78310
2020-04-27 08:34:15 -07:00
Ayal Zaks a3c964a278 [LV] Fix recording of BranchTakenCount for FoldTail
When folding tail, branch taken count is computed during initial VPlan execution
and recorded to be used by the compare computing the loop's mask. This recording
should directly set the State, instead of reusing Value2VPValue mapping which
serves original Values present prior to vectorization.
The branch taken count may be a constant Value, which may be used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028

Differential Revision: https://reviews.llvm.org/D78847
2020-04-26 20:13:10 +03:00
Sanjay Patel 3f10f1a5c7 [InstCombine] updated test comments; NFC
As suggested in review for:
rG4abab5c5ca7b
2020-04-26 11:11:00 -04:00
Florian Hahn 7d57d22baa [SCCP] Support ranges for loads and stores.
Integer ranges can be used for loaded/stored values. Note that widening
can be disabled for loads/stores, as we only rely on instructions that
cause continued increases to ranges to be widened (like binary
operators).

Reviewers: efriedma, mssimpso, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78433
2020-04-26 13:16:47 +01:00
Florian Hahn c1c5c47e64 [SCCP] Add load/store test for integer ranges. 2020-04-26 13:16:47 +01:00
Sergey Dmitriev 67aed1469b [Attributor] Do not set 'returned' attribute for arguments that cannot be bitcasted to function result
Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78828
2020-04-25 09:49:40 -07:00
Sanjay Patel 4abab5c5ca [InstCombine] generalize canonicalization of masked equality comparisons
(X | MaskC) == C --> (X & ~MaskC) == C ^ MaskC
  (X | MaskC) != C --> (X & ~MaskC) != C ^ MaskC

We have more analyis for 'and' patterns and already lean this way
in the existing code, so this should be neutral or better in IR.

If this does not do as well in codegen, the problem already exists
and we should fix that based on target costs/heuristics.

http://volta.cs.utah.edu:8080/z/oP3ecL

define void @src(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
  %or = or i8 %x, %OrC
  %eq = icmp eq i8 %or, %C
  store i1 %eq, i1* %p0

  %ne = icmp ne i8 %or, %C
  store i1 %ne, i1* %p1
  ret void
}

define void @tgt(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
  %NotOrC = xor i8 %OrC, -1
  %a = and i8 %x, %NotOrC
  %NewC = xor i8 %C, %OrC
  %eq = icmp eq i8 %a, %NewC
  store i1 %eq, i1* %p0

  %ne = icmp ne i8 %a, %NewC
  store i1 %ne, i1* %p1
  ret void
}
2020-04-25 11:31:57 -04:00
Florian Hahn 46a04940e8 [DSE] Add stat for remaining stores after DSE.
Using the existing NumFastStores statistic can be misleading when
comparing the impact of DSE patches.

For example, consider the case where a store gets removed from a
function before it is inlined into another function. A less
powerful DSE might only remove the store from functions it has
been inlined into, which will result in more stores being removed, but
no difference in the actual number of stores after DSE.

The new stat provides the absolute number of stores surviving after
DSE.

Reviewers: dmgreen, bryant, asbirlea, jfb

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D78830
2020-04-25 16:12:55 +01:00
Sanjay Patel 9193644f77 [InstCombine] add tests for icmp with bitmask logic op; NFC 2020-04-25 10:57:29 -04:00
Juneyoung Lee f5677fe700 [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison look into more constants/instructions
Summary:
This patch helps isGuaranteedNotToBeUndefOrPoison look into more constants and instructions (bitcast/alloca/gep/fcmp).

To deal with bitcast, Depth is added to isGuaranteedNotToBeUndefOrPoison.

This patch is splitted from https://reviews.llvm.org/D75808.

Checked with Alive2

Reviewers: reames, jdoerfert

Reviewed By: jdoerfert

Subscribers: sanwou01, spatel, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76010
2020-04-25 23:29:54 +09:00
Florian Hahn 82ce334727 [ValueLattice] Merging unknown with empty CR is unknown.
Currently an unknown/undef value is marked as overdefined when merged
with an empty range. An empty range can occur in unreachable/dead code.
When merging the new unknown state (= no value known yet) with an empty
range, there still isn't any information about the value yet and we can
stay in unknown.

This gives a few nice improvements on the number of instructions removed
by IPSCCP:
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.IPNumInstRemoved

Program                                        base     patch    diff
 test-suite...rks/FreeBench/mason/mason.test     3.00   6.00 100.0%
 test-suite...nchmarks/McCat/18-imp/imp.test     3.00   5.00 66.7%
 test-suite...C/CFP2000/179.art/179.art.test     2.00   3.00 50.0%
 test-suite...ijndael/security-rijndael.test     2.00   3.00 50.0%
 test-suite...ks/Prolangs-C/agrep/agrep.test    40.00  58.00 45.0%
 test-suite...ce/Applications/Burg/burg.test    26.00  37.00 42.3%
 test-suite...cCat/03-testtrie/testtrie.test     3.00   4.00 33.3%
 test-suite...Source/Benchmarks/sim/sim.test    29.00  36.00 24.1%
 test-suite.../Applications/spiff/spiff.test     9.00  11.00 22.2%
 test-suite...s/FreeBench/neural/neural.test     5.00   6.00 20.0%
 test-suite...pplications/treecc/treecc.test    66.00  79.00 19.7%
 test-suite...langs-C/football/football.test    85.00 101.00 18.8%
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test    90.00 105.00 16.7%
 test-suite...oxyApps-C++/miniFE/miniFE.test    37.00  43.00 16.2%
 test-suite...rks/FreeBench/pifft/pifft.test    26.00  30.00 15.4%
 test-suite...lications/sqlite3/sqlite3.test   481.00  548.00  13.9%
 test-suite...marks/7zip/7zip-benchmark.test   4875.00 5522.00 13.3%
 test-suite.../CINT2000/176.gcc/176.gcc.test   1117.00 1197.00  7.2%
 test-suite...0.perlbench/400.perlbench.test   1618.00 1732.00  7.0%

Reviewers: efriedma, nikic, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78667
2020-04-25 13:43:34 +01:00
Tyker e5f8a77c19 [AssumeBundles] Refactor asssume builder
Summary:
refactor assume bulider for the next patch.
the assume builder now generate only one assume per attribute kind and per value they are on. to do this it takes the highest. this is desirable because currently, for all attributes the higest value is the most valuable.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78013
2020-04-25 13:43:52 +02:00
Ehud Katz 64249f177e [CodeExtractor] Fix extraction of a value used only by intrinsics outside of region
We should only skip `lifetime` and `dbg` intrinsics when searching for users.
Other intrinsics are legit users that can't be ignored.

Without this fix, the testcase would result in an invalid IR. `memcpy`
will have a reference to the, now, external value (local to the
extracted loop function).

Fix PR42194

Differential Revision: https://reviews.llvm.org/D78749
2020-04-25 11:44:47 +03:00
Craig Topper e4a9190ad7 [X86][ArgumentPromotion] Allow Argument Promotion if caller and callee disagree on 512-bit vectors support if the arguments are scalar.
If one of caller/callee has disabled ZMM registers due to
prefer-vector-width=256, we were previously
disabling argument promotion as the ABI might be incompatible since
one side will split 512-bit vectors in this case.

But if we can see that the types are all scalar this shouldn't be
a problem.

This patch assumes that pointer element type reflects the type that
the argument will be promoted to.

Differential Revision: https://reviews.llvm.org/D78770
2020-04-24 15:47:02 -07:00
Tyker 42431da895 [AssumeBundles] Use assume bundles in isKnownNonZero
Summary: Use nonnull and dereferenceable from an assume bundle in isKnownNonZero

Reviewers: jdoerfert, nikic, lebedev.ri, reames, fhahn, sstefan1

Reviewed By: jdoerfert

Subscribers: fhahn, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76149
2020-04-24 20:41:51 +02:00
Sanjay Patel 238f00f6d3 [InstCombine] regenerate test checks; NFC
Values named 'tmp' can cause problems for the auto-generated check script.
2020-04-24 13:51:13 -04:00
Mircea Trofin c3770c5d6d [llvm][NFC] Factor out inlining pipeline as a module pipeline.
Summary:
This simplifies testing in scenarios where we want to set up module-wide
analyses for inlining. The patch enables treating inlining and its
function cleanups, as a module pass. The alternative would be for tests
to describe the pipeline, which is tedious and adds maintenance
overhead.

Reviewers: davidxl, dblaikie, jdoerfert, sstefan1

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78512
2020-04-24 09:24:12 -07:00
Sanjay Patel e4175ff525 [InstCombine] intersect FMF when reassociating FP min/max intrinsics
As discussed in PR45478:
https://bugs.llvm.org/show_bug.cgi?id=45478
...propagating FMF from the outer (second) call is not correct,
so intersect them instead.
I suspect we could do better (see TODO comment), but mismatched
FMF is probably too rare to care about.

Differential Revision: https://reviews.llvm.org/D78631
2020-04-24 12:14:03 -04:00
Max Kazantsev 9cd4debd5a [LoopVectorize] Preserve CFG analyses if CFG wasn't modified
One of transforms the loop vectorizer makes is LCSSA formation. In some cases it
is the only transform it makes. We should not drop CFG analyzes if only LCSSA was
formed and no actual CFG changes was made.

We should think of expanding this logic to other passes as well, and maybe make
it a part of PM framework.

Reviewed By: Florian Hahn
Differential Revision: https://reviews.llvm.org/D78360
2020-04-24 17:22:24 +07:00