Commit Graph

18804 Commits

Author SHA1 Message Date
Sami Tolvanen 4474958d3a ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-22 10:01:55 -07:00
Joseph Huber 03d7e61c87 [OpenMP] Internalize functions in OpenMPOpt to improve IPO passes
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102824
2021-06-22 12:38:10 -04:00
Joseph Huber 6fc51c9f7d [OpenMP] Replace GPU globalization calls with shared memory in the middle-end
Summary:
The changes introduced in D97680 create a simpler interface to code that needs
to be globalized. This interface is used to simplify the globalization calls in
the middle end. We can check any globalization call that is only called by a
single thread in the team and replace it with a static shared memory buffer.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97818
2021-06-22 11:55:44 -04:00
Nikita Popov e790d3667e [OpaquePtr] Handle addrspacecasts in InstCombine
This adds support for addrspace casts involving opaque pointers to
InstCombine, as well as the isEliminableCastPair() helper
(otherwise the assertion failure would just move there).

Add PointerType::hasSameElementTypeAs() to hide the element type
details.

Differential Revision: https://reviews.llvm.org/D104668
2021-06-22 17:45:30 +02:00
Jingu Kang 873ff5a728 [SimpleLoopUnswich] Fixa a bug on ComputeUnswitchedCost with partial unswitch
There was a bug from cost calculation for partially invariant unswitch.

The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.

Differential Revision: https://reviews.llvm.org/D103816
2021-06-22 16:18:00 +01:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Rosie Sumpter b2f48cc914 [SLP][AArch64] Add SLP vectorizer tests for XOR and AND reductions. NFC
These regression tests show missed SLP vectorization opportunities,
which will be fixed in a future commit (see:
https://reviews.llvm.org/D104538).

Differential Revision: https://reviews.llvm.org/D104708
2021-06-22 15:16:02 +01:00
Nikita Popov e638a290f7 [ConstantFold] Delay fetching pointer element type
Don't do this while stipping pointer casts, instead fetch it at
the end. This improves compatibility with opaque pointers for the
case where the base object is not opaque.
2021-06-22 15:51:00 +02:00
Nikita Popov 87bdde4962 [ConstantFold] Skip bitcast -> GEP transform for opaque pointers
Same as with the InstCombine transform, this is not possible for
bitcasts involving opaque pointers, as GEP preserves opaqueness.
2021-06-22 15:50:55 +02:00
Florian Hahn d17798823c
[SCEV] Retain AddExpr flags when subtracting a foldable constant.
Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2.

But we can retain flags under certain conditions:

* Adding a smaller constant is NUW if the original AddExpr was NUW.

* Adding a constant with the same sign and small magnitude is NSW, if the
  original AddExpr was NSW.

This can improve results after using `SimplifyICmpOperands`, which may
subtract one in order to use stricter predicates, as is the case for
`isKnownPredicate`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104319
2021-06-22 11:27:51 +01:00
Max Kazantsev 4c4f1ae93e Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks"
Patch was reverted due to a bug that existed before it and was exposed
by it. Returning after the underlying bug has been fixed.

Differential Revision: https://reviews.llvm.org/D103959
2021-06-22 12:28:46 +07:00
Max Kazantsev 575253887b [LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically
Two predecessors break the further logic, and the loop may come to the
opt in non-canonicalized state.
2021-06-22 12:20:55 +07:00
Eli Friedman 8f3d16905d [ScalarEvolution] Ensure backedge-taken counts are not pointers.
A backedge-taken count doesn't refer to memory; returning a pointer type
is nonsense. So make sure we always return an integer.

The obvious way to do this would be to just convert the operands of the
icmp to integers, but that doesn't quite work out at the moment:
isLoopEntryGuardedByCond currently gets confused by ptrtoint operations.
So we perform the ptrtoint conversion late for lt/gt operations.

The test changes are mostly innocuous. The most interesting changes are
more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to
i64)) + %ptr)". This is expected: we can't fold this to zero because we
need to preserve the pointer base.

The call to isLoopEntryGuardedByCond in howFarToZero is less precise
because of ptrtoint operations; this shows up in the function
pr46786_c26_char in ptrtoint.ll. Fixing it here would require more
complex refactoring.  It should eventually be fixed by future
improvements to isImpliedCond.

See https://bugs.llvm.org/show_bug.cgi?id=46786 for context.

Differential Revision: https://reviews.llvm.org/D103656
2021-06-21 16:24:16 -07:00
Roman Lebedev 4cf74469a0
[NFC][SimplifyCFG] Add basic test for debuginfo preservation of `ret` tail merging 2021-06-21 23:56:54 +03:00
Roman Lebedev 3e98b88797
[NFC][SimplifyCFG] Fix tests to use FileCheck instead of grep 2021-06-21 23:56:54 +03:00
Alexey Bataev c5bbc737e8 [SLP][NFC]Rename functions in the tests, NFC. 2021-06-21 13:37:12 -07:00
Nikita Popov 39796e1ad0 Reapply [InstCombine] Don't try converting opaque pointer bitcast to GEP
Reapplied without changes -- this was reverted together with an
underlying patch.

-----

Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
2021-06-21 22:15:56 +02:00
Nikita Popov e2c2124a4b Reapply [InstCombine] Extract bitcast -> gep transform
Relative to the original patch, an InstCombine test has been
added to show a previously missed pattern, and the Coroutine
test that resulted in the revert has been regenerated.

-----

Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. This previously
happened for the isSized() check, which skipped folds like
distributing a bitcast over a select.
2021-06-21 22:03:15 +02:00
Nikita Popov 403792f91e [InstCombine] Add test for bitcast of unsized pointer (NFC)
The bitcast should get folded into the select, but currently isn't
due to an incorrect early bailout.
2021-06-21 22:03:15 +02:00
Nikita Popov 6922ab73a5 Revert "[InstCombine] Extract bitcast -> gep transform"
This reverts commit d9f5d7b959.
This reverts commit 5780611d7e.

This causes a failure in Coroutine tests.
2021-06-21 21:34:17 +02:00
Alexey Bataev 908b753661 [SLP]Improve vectorization of PHI instructions.
Perform better analysis when trying to vectorize PHIs.
1. Do not try to vectorize vector PHIs.
2. Do deeper analysis for more profitable nodes for the vectorization.

Before we just tried to vectorize the PHIs of the same type. Patch
improves this and tries to vectorize PHIs with incoming values which
come from the same basic block, have the same and/or alternative
opcodes.

It allows to save the compile time and provides better vectorization
results in general.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103638
2021-06-21 12:26:24 -07:00
Nikita Popov 5780611d7e [InstCombine] Don't try converting opaque pointer bitcast to GEP
Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
2021-06-21 21:24:50 +02:00
Philip Reames 0c09e5bd74 Split a test for ease of auto update 2021-06-21 11:02:26 -07:00
Jacob Hegna f86d1f99b3 Remove ML inlining model artifacts.
They are not conducive to being stored in git. Instead, we autogenerate
mock model artifacts for use in tests. Production models can be
specified with the cmake flag LLVM_INLINER_MODEL_PATH.

LLVM_INLINER_MODEL_PATH has two sentinel values:
 - download, which will download the most recent compatible model.
 - autogenerate, which will autogenerate a "fake" model for testing the
 model uptake infrastructure.

Differential Revision: https://reviews.llvm.org/D104251
2021-06-21 17:38:09 +00:00
Nathan Chancellor f52666985d
Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks"
This reverts commit bb1dc876eb.

This patch causes an assertion failure when building an arm64 defconfig
Linux kernel.

See https://reviews.llvm.org/D103959 for a link to the original bug
report and a reduced reproducer.
2021-06-21 10:18:55 -07:00
Rosie Sumpter 2251f33bef [SLP][AArch64] Add SLP vectorizer regression test. NFC
This test is for a missed SLP vectorizer opportunity, reported here
https://bugs.llvm.org/show_bug.cgi?id=44593. This is due to a cost
modelling issue with vector reduction intrinsics which will be
fixed in a future commit (see https://reviews.llvm.org/D104538).
2021-06-21 16:31:00 +01:00
Sanjay Patel 64b2676ca8 [InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms
Building on:
4c44b02d87
...and adding handling for the extra operand in these intrinsics.

This pattern is discussed in:
https://llvm.org/PR50140
2021-06-21 11:04:12 -04:00
Sjoerd Meijer 071dbaec87 [FuncSpec] Add minsize test. NFC. 2021-06-21 15:21:09 +01:00
Florian Hahn 05bb969014
[LoopIdiom] Add test case that involves adds with flags and zero exts.
Test coverage to ensure D104319 does not introduce a regression here.
2021-06-21 12:10:58 +01:00
Nikita Popov acefe0eaaf [Mem2Reg] Regenerate test checks (NFC) 2021-06-21 11:06:28 +02:00
Nikita Popov 80e0424b2c [Mem2Reg] Use poison for unreachable cases
Use poison instead of undef for cases dealing with unreachable
code. This still leaves the more interesting case of "load from
uninitialized memory" as undef.
2021-06-21 10:54:13 +02:00
Nikita Popov 00a88a81d2 [Mem2Reg] Regenerate test checks (NFC) 2021-06-21 10:47:59 +02:00
Juneyoung Lee c038845f58 [InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified
This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified.

Resolves llvm.org/pr48975.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D96663
2021-06-21 17:39:05 +09:00
Sjoerd Meijer 342bbb7832 [FuncSpec] Don't specialise functions with NoDuplicate instructions.
getSpecializationCost was returning INT_MAX for a case when specialisation
shouldn't happen, but this wasn't properly checked if specialisation was
forced.

Differential Revision: https://reviews.llvm.org/D104461
2021-06-21 09:02:11 +01:00
Max Kazantsev 3f2ff7cc8c [Test] Add some tests showing room for optimization exploiting undef and UB 2021-06-21 13:11:46 +07:00
Max Kazantsev bb1dc876eb [LoopDeletion] Handle Phis with similar inputs from different blocks
This patch lifts the requirement to have the only incoming live block
for Phis. There can be multiple live blocks if the same value comes to
phi from all of them.

Differential Revision: https://reviews.llvm.org/D103959
Reviewed By: nikic, lebedev.ri
2021-06-21 11:37:06 +07:00
Juneyoung Lee ce192ced2b [InstCombine] Use poison constant to represent the result of unreachable instrs
This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.

This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104602
2021-06-21 09:58:44 +09:00
Dmitri Gribenko ffa252e8ce [GCOVProfiling][test] Ensure that 'opt' drops any files in a temp directory 2021-06-20 22:48:35 +02:00
Nikita Popov 1ae266f452 [LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.

The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.

Differential Revision: https://reviews.llvm.org/D102982
2021-06-20 20:58:26 +02:00
David Green a24b02193a [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-06-20 17:03:30 +01:00
Sanjay Patel 4c44b02d87 [InstCombine] fold ctpop-of-select with 1 or more constant arms
The general pattern is mentioned in:
https://llvm.org/PR50140
...but we need to do a bit more to handle intrinsics with extra operands
like ctlz/cttz.
2021-06-20 11:28:45 -04:00
Sanjay Patel 240acb0cff [InstCombine] avoid infinite loops with select folds of constant expressions
This pair of transforms was added recently with:
8591640379

And could lead to conflicting folds:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399
2021-06-20 09:46:25 -04:00
Roman Lebedev c5b7335dc8
[SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has it's address taken
Same as with HoistThenElseCodeToIf() (ad87761925).
2021-06-20 12:37:14 +03:00
Roman Lebedev ad87761925
[SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has it's address taken
This problem is exposed by D104598, after it tail-merges `ret` in
`@test_inline_constraint_S_label`, the verifier would start complaining
`invalid operand for inline asm constraint 'S'`.

Essentially, taking address of a block is mismodelled in IR.
It should probably be an explicit instruction, a first one in block,
that isn't identical to any other instruction of the same type,
so that it can't be hoisted.
2021-06-20 12:18:15 +03:00
Juneyoung Lee 09e8c0d5aa [InstSimplify] icmp poison, X -> poison
This adds a simple transformation from icmp with poison constant to poison.
Comparing poison with something else is poison, so this is okay.

https://alive2.llvm.org/ce/z/e8iReb
https://alive2.llvm.org/ce/z/q4MurY
2021-06-20 15:39:07 +09:00
Fangrui Song 8ea2a58a2e [llvm-profdata] Make diagnostics consistent with the (no capitalization, no period) style
The format is currently inconsistent. Use the https://llvm.org/docs/CodingStandards.html#error-and-warning-messages style.

And add `error:` or `warning:` to CHECK lines wherever appropriate.
2021-06-19 14:54:25 -07:00
Sanjay Patel 328b21a338 [InstCombine][test] add tests for select-of-bit-manip; NFC 2021-06-19 12:34:32 -04:00
Liqiang Tao 671a87104b [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-19 10:17:32 +08:00
Guozhi Wei 575ba6f425 [InstCombine] Don't transform code if DoTransform is false
In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case.

Differential Revision: https://reviews.llvm.org/D104567
2021-06-18 18:01:34 -07:00
Fangrui Song 3307240f05 [InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:

* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code

The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.

Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-18 17:01:17 -07:00