Commit Graph

27868 Commits

Author SHA1 Message Date
Sanjay Patel 153da08a6c [InstCombine] hoist min/max intrinsics above select with constant op
This is an extension of the handling for unary intrinsics and
follows the logic that we use for binary ops.

We don't canonicalize to min/max intrinsics yet, but this might
help unlock other folds seen in D98152.
2021-06-27 10:02:23 -04:00
Nikita Popov 81fcdae68c [MemCpyOpt] Support opaque pointers 2021-06-27 15:52:38 +02:00
Nikita Popov a9129f8964 [LoadStoreVectorizer] Support opaque pointers
There are remaining redundant bitcasts.
2021-06-27 15:42:16 +02:00
Florian Hahn f1a6430272
[VPlan] Track both incoming values for first-order recurrence phis.
This patch updates VPWidenPHI recipes for first-order recurrences to
also track the incoming value from the back-edge. Similar to D99294,
which did the same for reductions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104197
2021-06-27 14:29:35 +01:00
Florian Hahn 7f36981977
[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC).
Suggested in D104197, avoids the early exit.
2021-06-26 13:13:25 +01:00
Andrew Browne 45f6d5522f [DFSan] Change shadow and origin memory layouts to match MSan.
Previously on x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  | application memory |
  +--------------------+ 0x700000008000 (kAppAddr)
  |                    |
  |       unused       |
  |                    |
  +--------------------+ 0x300000000000 (kUnusedAddr)
  |       origin       |
  +--------------------+ 0x200000008000 (kOriginAddr)
  |       unused       |
  +--------------------+ 0x200000000000
  |   shadow memory    |
  +--------------------+ 0x100000008000 (kShadowAddr)
  |       unused       |
  +--------------------+ 0x000000010000
  | reserved by kernel |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem & ~0x600000000000
  SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow

Now for x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  |    application 3   |
  +--------------------+ 0x700000000000
  |      invalid       |
  +--------------------+ 0x610000000000
  |      origin 1      |
  +--------------------+ 0x600000000000
  |    application 2   |
  +--------------------+ 0x510000000000
  |      shadow 1      |
  +--------------------+ 0x500000000000
  |      invalid       |
  +--------------------+ 0x400000000000
  |      origin 3      |
  +--------------------+ 0x300000000000
  |      shadow 3      |
  +--------------------+ 0x200000000000
  |      origin 2      |
  +--------------------+ 0x110000000000
  |      invalid       |
  +--------------------+ 0x100000000000
  |      shadow 2      |
  +--------------------+ 0x010000000000
  |    application 1   |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem ^ 0x500000000000
  SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D104896
2021-06-25 17:00:38 -07:00
Nikita Popov fdd4c199a1 Revert "[InstCombine] Make indexed compare fold opaque ptr compatible"
This reverts commit 5cb20ef8a2.

Assertion failures with this patch were reported on
https://reviews.llvm.org/rG5cb20ef8a235, revert for now.
2021-06-26 00:32:59 +02:00
Eli Friedman 8d5bf0709d [NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion
The implementation is identical, but it makes the semantics a bit more
obvious.
2021-06-25 14:43:13 -07:00
Juneyoung Lee 1605593440 [SimplifyLibCalls] Fix memchr opt to use CreateLogicalAnd
This fixes a bug at LibCallSimplifier::optimizeMemChr which does the following transformation:

```
// memchr("\r\n", C, 2) != nullptr -> (1 << C & ((1 << '\r') | (1 << '\n')))
// != 0
//   after bounds check.
```

As written above, a bounds check on C (whether it is less than integer bitwidth) is done before doing `1 << C` otherwise 1 << C will overflow.
If the bounds check is false, the result of (1 << C & ...) must not be used at all, otherwise the result of shift (which is poison) will contaminate the whole results.
A correct way to encode this is `select i1 (bounds check), (1 << C & ...), false`  because select does not allow the unused operand to contaminate the result.
However, this optimization was introducing `and (bounds check), (1 << C & ...)` which cannot do that.

The bug was found from compilation of this C++ code: https://reviews.llvm.org/rG2fd3037ac615#1007197

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104901
2021-06-26 05:59:35 +09:00
Joseph Huber 5ccb7424fa [OpenMP] Change OpenMPOpt to check openmp metadata
The metadata added in D102361 introduces a module flag that we can check
to determine if the module was compiled with `-fopenmp` enables. We can
now check for the precense of this instead of scanning the call graph
for OpenMP runtime functions.

Depends on D102361

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102423
2021-06-25 16:34:22 -04:00
Hongtao Yu 3638085ff0 [Coroutines] Define __coro_frame_ty in function scope
Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside in its parent type scope.

We were seeing a type defined in a local scope causing trouble to the dwarf emitter where a context is required to be a funciton scope, a namespace or a global scope.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D104937
2021-06-25 12:33:20 -07:00
Florian Hahn cc5ee857f9
[LV] Doxygenize VectorizationFactor member comments (NFC).
Minor cleanup for follow-up patch.
2021-06-25 18:35:00 +01:00
Philip Reames 2cd23eb243 [instcombine] Fold overflow check using umulo to comparison
If we have a umul.with.overflow where the multiply result is not used and one of the operands is a constant, we can perform the overflow check cheaper with a comparison then by performing the multiply and extracting the overflow flag.

(Noticed when looking at the conditions SCEV emits for overflow checks.)

Differential Revision: https://reviews.llvm.org/D104665
2021-06-25 10:25:45 -07:00
Florian Hahn 91053e327c
[LV] Reflow comment for VectorizationCostTy (NFC). 2021-06-25 14:20:06 +01:00
Arthur Eubanks 1aa02b37e7 Revert "[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions."
This reverts commit 1eda5453f2.

Causes https://crbug.com/1223647:
Incompatible argument and return types for 'returned' attribute
  tail call void @llvm.memset.p0i8.i64(i8* noalias noundef returned writeonly align 1 dereferenceable(255) %arraydecay, i8 0, i64 255, i1 false), !dbg !985
2021-06-24 19:24:34 -07:00
Nikita Popov 5cb20ef8a2 [InstCombine] Make indexed compare fold opaque ptr compatible
Rather than relying on pointer type equality (which, for a change,
is silently incorrect with opaque pointers) check that the GEP
source element types match.
2021-06-24 22:33:01 +02:00
Arthur Eubanks 7110510eca [WPD] Don't optimize calls more than once
WPD currently assumes that there is a one to one correspondence between
type test assume sequences and virtual calls. However, with
-fstrict-vtable-pointers this may not be true. This ends up causing
crashes when we try to optimize a virtual call more than once (
applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()).

applySingleImplDevirt() actually didn't previous crash because it would
replace the devirtualized call with the same direct call. Adding an
assert that the call is indirect causes the corresponding test to crash
with the rest of the patch.

This makes Chrome successfully build with -fstrict-vtable-pointers + WPD.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104798
2021-06-24 13:28:09 -07:00
Nikita Popov 8e0ff44bf8 [InstCombine] Make varargs cast transform compatible with opaque ptrs
The whole transform can be dropped once we have fully transitioned
to opaque pointers (as it's purpose is to remove no-op pointer
casts). For now, make sure that it handles opaque pointers correctly.
2021-06-24 21:57:05 +02:00
Jonas Paulsson 1eda5453f2 [BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions.
- When emitting libcalls, do not only pass the calling convention from the
  function prototype but also the attributes.

- Do not pass attributes from e.g. libc memcpy to llvm.memcpy.

Review: Reid Kleckner, Eli Friedman, Arthur Eubanks

Differential Revision: https://reviews.llvm.org/D103992
2021-06-24 14:47:24 -05:00
Roman Lebedev d064182612
[SimplifyCFG] Tail-merging all blocks with `resume` terminator
Similar to what we already do for `ret` terminators.
As noted by @rnk, clang seems to already generate a single `ret`/`resume`,
so this isn't likely to cause widespread changes.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104849
2021-06-24 21:25:06 +03:00
Florian Hahn 833bdbe93c
[LV] Support sinking recipe in replicate region after another region.
This patch handles sinking a replicate region after another replicate
region. In that case, we can connect the sink region after the target
region. This properly handles the case for which an assertion has been
added in 337d765282.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D103514
2021-06-24 13:58:42 +01:00
Stephen Tozer adace79652 [DebugInfo] Enable variadic debug value salvaging
This patch enables the salvaging of debug values that may be calculated
from more than one SSA value, such as with binary operators that do not
use a constant argument. The actual functionality for this behaviour is
added in a previous commit (c7270567), but with the ability to actually
emit the resulting debug values switched off.

The reason for this is that the prior patch has been reverted several
times due to issues discovered downstream, some time after the actual
landing of the patch. The patch in question is rather large and touches
several widely used header files, and all issues discovered are more
related to the handling of variadic debug values as a whole rather than
the details of the patch itself. Therefore, to minimize the build time
impact and risk of conflicts involved in any potential future
revert/reapply of that patch, this significantly smaller patch (that
touches no header files) will instead be used as the capstone to enable
variadic debug value salvaging.

The review linked to this patch is mostly implemented by the previous
commit, c7270567, but also contains the changes in this patch.

Differential Revision: https://reviews.llvm.org/D91722
2021-06-24 13:16:29 +01:00
Roman Lebedev 9c4c2f2472
[SimplifyCFG] Tail-merging all blocks with `ret` terminator
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104597
2021-06-24 13:15:39 +03:00
Stephen Tozer c72705678c Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.

Differential Revision: https://reviews.llvm.org/D91722

This reverts commit 386b66b2fc.
2021-06-24 09:46:38 +01:00
Zequan Wu 9393894331 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This casues compiler crash: Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"'

This reverts commit e3d24b45b8.
2021-06-23 19:24:56 -07:00
Evgenii Stepanov 78f7e6d8d7 [hwasan] Respect llvm.asan.globals.
This enable no_sanitize C++ attribute to exclude globals from hwasan
testing, and automatically excludes other sanitizers' globals (such as
ubsan location descriptors).

Differential Revision: https://reviews.llvm.org/D104825
2021-06-23 18:37:00 -07:00
Nikita Popov 8321335fd8 [InstCombine] Use getFunctionType()
Avoid fetching pointer element type...
2021-06-23 20:28:34 +02:00
Sami Tolvanen e3d24b45b8 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

This relands commit 4474958d3a
with a fix to a use-of-uninitialized-value error that tripped
MemorySanitizer.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-23 10:56:13 -07:00
Kuter Dinel 5d44d56f7d [Attributor] Derive AAFunctionReachability attribute.
This attribute uses Attributor's internal 'optimistic' call graph
information to answer queries about function call reachability.

Functions can become reachable over time as new call edges are
discovered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104599
2021-06-23 20:43:10 +03:00
Nikita Popov 00d3f7cc3c [LAA] Make getPointersDiff() API compatible with opaque pointers
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.

The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.

Differential Revision: https://reviews.llvm.org/D104784
2021-06-23 18:44:34 +02:00
Datta Nagraj ad0085d338 [InstCombine] Eliminate casts to optimize ctlz operation
If a ctlz operation is performed on higher datatype and then
downcasted, then this can be optimized by doing a ctlz operation
on a lower datatype and adding the difference bitsize to the result
of ctlz to provide the same output:

https://alive2.llvm.org/ce/z/8uup9M

The original problem is shown in
https://llvm.org/PR50173

Differential Revision: https://reviews.llvm.org/D103788
2021-06-23 11:19:12 -04:00
Sanjay Patel 1e9b6b89a7 [InstCombine] convert FP min/max with negated op to fabs
This is part of improving floating-point patterns seen in:
https://llvm.org/PR39480

We don't require any FMF because the 2 potential corner cases
(-0.0 and NaN) are correctly handled without FMF:
1. -0.0 is treated as strictly less than +0.0 with
   maximum/minimum, so fabs/fneg work as expected.
2. +/- 0.0 with maxnum/minnum is indeterminate, so
   transforming to fabs/fneg is more defined.
3. The sign of a NaN may be altered by this transform,
   but that is allowed in the default FP environment.

If there are FMF, they are propagated from the min/max call to
one or both new operands which seems to agree with Alive2:
https://alive2.llvm.org/ce/z/bem_xC
2021-06-23 10:41:39 -04:00
Roman Lebedev ff4b1d379f
[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.

This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.

Other restrictions of the transform remain.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104598
2021-06-23 14:33:18 +03:00
Joe Ellis 3c4dbf6ea9 [Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics
With regards to overrunning, the langref (llvm/docs/LangRef.rst)
specifies:

   (llvm.experimental.vector.insert)
   Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1)
   must be valid ``vec`` indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

   (llvm.experimental.vector.extract)
   Elements ``idx`` through (``idx`` + num_elements(result_type) - 1)
   must be valid vector indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

For the non-mixed cases (e.g. inserting/extracting a scalable into/from
another scalable, or inserting/extracting a fixed into/from another
fixed), it is possible to statically check whether or not the above
conditions are met. This was previously missing from the verifier, and
if the conditions were found to be false, the result of the
insertion/extraction would be replaced with an undef.

With regards to invalid indices, the langref (llvm/docs/LangRef.rst)
specifies:

    (llvm.experimental.vector.insert)
    ``idx`` represents the starting element number at which ``subvec``
    will be inserted. ``idx`` must be a constant multiple of
    ``subvec``'s known minimum vector length.

    (llvm.experimental.vector.extract)
    The ``idx`` specifies the starting element number within ``vec``
    from which a subvector is extracted. ``idx`` must be a constant
    multiple of the known-minimum vector length of the result type.

Similarly, these conditions were not previously enforced in the
verifier. In some circumstances, invalid indices were permitted
silently, and in other circumstances, an undef was spawned where a
verifier error would have been preferred.

This commit adds verifier checks to enforce the constraints above.

Differential Revision: https://reviews.llvm.org/D104468
2021-06-23 10:33:22 +00:00
Max Kazantsev 842b4c83cb [LoopDeletion] Exploit undef Phi inputs when symbolically executing 1st iteration
Follow-up on Roman's idea expressed in D103959.
- If a Phi has undefined inputs from live blocks:
   - and no other inputs, assume it is undef itself;
   - and exactly one non-undef input, we can assume that all undefs are equal to this input.

Differential Revision: https://reviews.llvm.org/D104618
Reviewed By: lebedev.ri, nikic
2021-06-23 11:53:48 +07:00
Max Kazantsev b7d2c173eb [LSR] Filter out zero factors. PR50765
Zero factor leads to division by zero and failure of corresponding
assert as shown in PR50765. We should filter out such factors.

Differential Revision: https://reviews.llvm.org/D104702
Reviewed By: huihuiz, reames
2021-06-23 10:43:06 +07:00
Jon Roelofs 493d6928fe [Remarks] Make memsize remarks report as an analysis, not a missed opportunity.
Differential revision: https://reviews.llvm.org/D104078
2021-06-22 18:22:47 -07:00
Liqiang Tao a0d96fdd3a [llvm][Inliner] Make PriorityInlineOrder lazily updated
This patch makes PriorityInlineOrder lazily updated.
The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D104654
2021-06-23 08:59:53 +08:00
Joseph Huber 1cfdcae653 [Attributor] Fix AAExecutionDomain returning true on invalid states
This patch fixes a problem with the AAExecutionDomain attributor not
checking if it is in a valid state. This can cause it to incorrectly
return that a block is executed in a single threaded context after the
attributor failed for any reason.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103186
2021-06-22 18:12:43 -04:00
Joseph Huber 44feacc736 [OpenMP] Change remaining globalization from an analysis remark to missed
After landing the globalization optimizations, the precense of globalization on
the device that was not put in shared or stack memory is a failed optimization
with performance consequences so it should indicate a missed remark.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104735
2021-06-22 16:52:06 -04:00
Nikita Popov 7bb7fa12e7 [OpaquePtr] Support changing load type in InstCombine
When the load type is changed to ptr, we need the load pointer type
to also be ptr, because it's not allowed to create a pointer to an
opaque pointer. This is achieved by adjusting the getPointerTo() API
to return an opaque pointer for an opaque pointer base type.

Differential Revision: https://reviews.llvm.org/D104718
2021-06-22 21:16:15 +02:00
Sami Tolvanen 33c9438f11 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 4474958d3a.

Breaks check-llvm on Mac.
2021-06-22 12:10:58 -07:00
Joseph Huber ca1560da72 [OpenMP][NFC] Add new optimizations to OpenMPOpt comment header
Summary:
Adds mentions to the new globalization optimizations added to the OpenMPOpt comment header.
2021-06-22 14:40:31 -04:00
Joseph Huber b54ccab509 [Attributor] Add an option to increase the max number of iterations
Right now the Attributor defaults to 32 fixed point iterations unless it is set
explicitly by a command line flag. This patch allows this to be configured when
the attributor instance is created. The maximum is then increased in OpenMPOpt
if the target is a kernel. This is because the globalization analysis can result
in larger iteration counts due to many dependent instances running at once.

Depends on D102444

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104416
2021-06-22 14:38:25 -04:00
Sanjay Patel b1f6ef92ec [InstCombine] reduce code duplication for FP min/max with casts fold; NFC 2021-06-22 14:15:04 -04:00
Joseph Huber 30e36c9b3c [Attributor] Add interface to emit remarks in Attributor
Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is availible then a remark will be emitted for the corresponding pass
name.

Depends on D102197

Reviewed By: sstefan1 thegameg

Differential Revision: https://reviews.llvm.org/D102444
2021-06-22 14:12:46 -04:00
Joseph Huber 7d69da71dd [OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls
Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.

Depends on D97818

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102197
2021-06-22 13:23:05 -04:00
Sami Tolvanen 4474958d3a ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-22 10:01:55 -07:00
Joseph Huber 03d7e61c87 [OpenMP] Internalize functions in OpenMPOpt to improve IPO passes
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102824
2021-06-22 12:38:10 -04:00
Joseph Huber 6fc51c9f7d [OpenMP] Replace GPU globalization calls with shared memory in the middle-end
Summary:
The changes introduced in D97680 create a simpler interface to code that needs
to be globalized. This interface is used to simplify the globalization calls in
the middle end. We can check any globalization call that is only called by a
single thread in the team and replace it with a static shared memory buffer.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97818
2021-06-22 11:55:44 -04:00
Nikita Popov e790d3667e [OpaquePtr] Handle addrspacecasts in InstCombine
This adds support for addrspace casts involving opaque pointers to
InstCombine, as well as the isEliminableCastPair() helper
(otherwise the assertion failure would just move there).

Add PointerType::hasSameElementTypeAs() to hide the element type
details.

Differential Revision: https://reviews.llvm.org/D104668
2021-06-22 17:45:30 +02:00
Jingu Kang 873ff5a728 [SimpleLoopUnswich] Fixa a bug on ComputeUnswitchedCost with partial unswitch
There was a bug from cost calculation for partially invariant unswitch.

The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.

Differential Revision: https://reviews.llvm.org/D103816
2021-06-22 16:18:00 +01:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Max Kazantsev 4c4f1ae93e Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks"
Patch was reverted due to a bug that existed before it and was exposed
by it. Returning after the underlying bug has been fixed.

Differential Revision: https://reviews.llvm.org/D103959
2021-06-22 12:28:46 +07:00
Max Kazantsev 575253887b [LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically
Two predecessors break the further logic, and the loop may come to the
opt in non-canonicalized state.
2021-06-22 12:20:55 +07:00
Nikita Popov 39796e1ad0 Reapply [InstCombine] Don't try converting opaque pointer bitcast to GEP
Reapplied without changes -- this was reverted together with an
underlying patch.

-----

Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
2021-06-21 22:15:56 +02:00
Nikita Popov e2c2124a4b Reapply [InstCombine] Extract bitcast -> gep transform
Relative to the original patch, an InstCombine test has been
added to show a previously missed pattern, and the Coroutine
test that resulted in the revert has been regenerated.

-----

Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. This previously
happened for the isSized() check, which skipped folds like
distributing a bitcast over a select.
2021-06-21 22:03:15 +02:00
Nikita Popov 6922ab73a5 Revert "[InstCombine] Extract bitcast -> gep transform"
This reverts commit d9f5d7b959.
This reverts commit 5780611d7e.

This causes a failure in Coroutine tests.
2021-06-21 21:34:17 +02:00
Nikita Popov 862313cf59 [LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI)
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.

Differential Revision: https://reviews.llvm.org/D104590
2021-06-21 21:34:17 +02:00
Alexey Bataev 908b753661 [SLP]Improve vectorization of PHI instructions.
Perform better analysis when trying to vectorize PHIs.
1. Do not try to vectorize vector PHIs.
2. Do deeper analysis for more profitable nodes for the vectorization.

Before we just tried to vectorize the PHIs of the same type. Patch
improves this and tries to vectorize PHIs with incoming values which
come from the same basic block, have the same and/or alternative
opcodes.

It allows to save the compile time and provides better vectorization
results in general.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103638
2021-06-21 12:26:24 -07:00
Nikita Popov 5780611d7e [InstCombine] Don't try converting opaque pointer bitcast to GEP
Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
2021-06-21 21:24:50 +02:00
Nikita Popov d9f5d7b959 [InstCombine] Extract bitcast -> gep transform
Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. There is
already one isSized() check that could run into this issue,
thus this change is not strictly NFC.
2021-06-21 21:24:50 +02:00
Nikita Popov a969bdc56f [InstCombine] Remove unnecessary addres space check (NFC)
It's not possible to bitcast between different address spaces,
and this is ensured by the IR verifier. As such, this bitcast to
addrspacecast canonicalization can never be hit.
2021-06-21 20:11:39 +02:00
Nathan Chancellor f52666985d
Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks"
This reverts commit bb1dc876eb.

This patch causes an assertion failure when building an arm64 defconfig
Linux kernel.

See https://reviews.llvm.org/D103959 for a link to the original bug
report and a reduced reproducer.
2021-06-21 10:18:55 -07:00
Sanjay Patel 198b79caae [InstCombine] move bitmanipulation-of-select folds
This is no outwardly-visible-difference-intended,
but it is obviously better to have all transforms
for an intrinsic housed together since we already
have helper functions in place.

It is also potentially more efficient to zap a
simple pattern match before trying to do expensive
computeKnownBits() calls.
2021-06-21 11:32:16 -04:00
Sanjay Patel 64b2676ca8 [InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms
Building on:
4c44b02d87
...and adding handling for the extra operand in these intrinsics.

This pattern is discussed in:
https://llvm.org/PR50140
2021-06-21 11:04:12 -04:00
Nikita Popov 80e0424b2c [Mem2Reg] Use poison for unreachable cases
Use poison instead of undef for cases dealing with unreachable
code. This still leaves the more interesting case of "load from
uninitialized memory" as undef.
2021-06-21 10:54:13 +02:00
Juneyoung Lee c038845f58 [InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified
This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified.

Resolves llvm.org/pr48975.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D96663
2021-06-21 17:39:05 +09:00
Sjoerd Meijer 342bbb7832 [FuncSpec] Don't specialise functions with NoDuplicate instructions.
getSpecializationCost was returning INT_MAX for a case when specialisation
shouldn't happen, but this wasn't properly checked if specialisation was
forced.

Differential Revision: https://reviews.llvm.org/D104461
2021-06-21 09:02:11 +01:00
Max Kazantsev bb1dc876eb [LoopDeletion] Handle Phis with similar inputs from different blocks
This patch lifts the requirement to have the only incoming live block
for Phis. There can be multiple live blocks if the same value comes to
phi from all of them.

Differential Revision: https://reviews.llvm.org/D103959
Reviewed By: nikic, lebedev.ri
2021-06-21 11:37:06 +07:00
Juneyoung Lee ce192ced2b [InstCombine] Use poison constant to represent the result of unreachable instrs
This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.

This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104602
2021-06-21 09:58:44 +09:00
Nikita Popov 1ae266f452 [LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.

The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.

Differential Revision: https://reviews.llvm.org/D102982
2021-06-20 20:58:26 +02:00
David Green a24b02193a [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-06-20 17:03:30 +01:00
Sanjay Patel 4c44b02d87 [InstCombine] fold ctpop-of-select with 1 or more constant arms
The general pattern is mentioned in:
https://llvm.org/PR50140
...but we need to do a bit more to handle intrinsics with extra operands
like ctlz/cttz.
2021-06-20 11:28:45 -04:00
Sanjay Patel 240acb0cff [InstCombine] avoid infinite loops with select folds of constant expressions
This pair of transforms was added recently with:
8591640379

And could lead to conflicting folds:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399
2021-06-20 09:46:25 -04:00
Roman Lebedev c5b7335dc8
[SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has it's address taken
Same as with HoistThenElseCodeToIf() (ad87761925).
2021-06-20 12:37:14 +03:00
Roman Lebedev ad87761925
[SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has it's address taken
This problem is exposed by D104598, after it tail-merges `ret` in
`@test_inline_constraint_S_label`, the verifier would start complaining
`invalid operand for inline asm constraint 'S'`.

Essentially, taking address of a block is mismodelled in IR.
It should probably be an explicit instruction, a first one in block,
that isn't identical to any other instruction of the same type,
so that it can't be hoisted.
2021-06-20 12:18:15 +03:00
Nikita Popov 1bd4085e0b [LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.

Differential Revision: https://reviews.llvm.org/D104487
2021-06-19 09:25:57 +02:00
Liqiang Tao 671a87104b [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-19 10:17:32 +08:00
Guozhi Wei 575ba6f425 [InstCombine] Don't transform code if DoTransform is false
In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case.

Differential Revision: https://reviews.llvm.org/D104567
2021-06-18 18:01:34 -07:00
Fangrui Song 3307240f05 [InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:

* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code

The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.

Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-18 17:01:17 -07:00
Hongtao Yu bd52495518 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Nikita Popov 3308205ae9 [LoopUnroll] Simplify optimization remarks
Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.

Differential Revision: https://reviews.llvm.org/D104482
2021-06-18 23:47:03 +02:00
Nick Desaulniers bef2992861 [GCOVProfiling] don't profile Fn's w/ noprofile attribute
Similar to D104475, the Linux kernel would like to avoid compiler
generated code in certain functions. The no_profile function
attribute can be used in C to generate the the noprofile fn attr in IR.
Respect that from GCOVProfiling.

Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104257
2021-06-18 13:58:34 -07:00
Andrew Browne 14407332de [DFSan] Cleanup code for platforms other than Linux x86_64.
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.

Reviewed By: gbalats, stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104481
2021-06-18 11:21:46 -07:00
Liqiang Tao 93183a41b9 Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder" 2021-06-18 18:52:00 +08:00
Max Kazantsev de92287cf8 [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3)
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.

This implementation uses InstSimplify. SCEV version was rejected due to high
compile time impact.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: nikic
2021-06-18 17:31:57 +07:00
Haojian Wu 3f5d53a525 [Attributor] Fix UB behavior on uninitalized bool variables.
Found by ASAN.
2021-06-18 11:49:42 +02:00
Daniil Seredkin 6643e51d79 [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.

Differential Revision: https://reviews.llvm.org/D104193
2021-06-18 16:28:06 +07:00
Liqiang Tao a740b707d1 [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-18 16:55:38 +08:00
Max Kazantsev fa5eb22ad4 [NFC] Assert non-zero factor before division
This is to ensure that zero denominator leads to controlled
assertion failure rather than UB.
2021-06-18 15:50:50 +07:00
Haojian Wu 7670938bba [Attributor] Don't print the call-graph in a hard-coded file.
This looks like not a practical pattern in our codebase (it could fail
in some sandbox environement).

Instead we print it via standard output, and it is controled by the
-attributor-print-call-graph, this follows a similiar pattern of attributor-print-dep.
2021-06-18 09:38:07 +02:00
Daniil Seredkin 6de741de08 Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)"
This reverts commit 31053338c9.
2021-06-18 14:21:02 +07:00
Daniil Seredkin 31053338c9 [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.

Differential Revision: https://reviews.llvm.org/D104193
2021-06-18 14:12:00 +07:00
Fangrui Song 5798be8458 Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF"
This reverts commit 76d0747e08.

If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.

Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.

```
group section [   66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
   [Index]    Name
   [   67]   __llvm_prf_cnts
   [   68]   __llvm_prf_vals
   [   69]   __llvm_prf_data
   [   70]   .rela__llvm_prf_data
```
2021-06-17 23:38:17 -07:00
Johannes Doerfert 30c9d68ad9 [Attributor][FIX] Arguments of unknown functions can be undef
This should fix PR50683. The wrong assumption was that we
could always know what the callee is when we replace a call site
argument with undef. We wanted to know that to remove the `noundef`
that might be attached to the argument. Since no callee means we
did the propagation on the caller site, there is no need to remove
an attribute. It is only needed if we replace all uses and therefore
pass `undef` instead of the value that was passed in otherwise.
2021-06-18 01:07:53 -05:00
Johannes Doerfert 666dc6f126 [Attributor] Use a centralized value simplification interface
To allow outside AAs that simplify values we need to ensure all value
simplification goes through the Attributor, not AAValueSimplify (or any
of the other AAs we have already like AAPotentialValues). This patch
also introduces an interface for the outside AAs to register
simplification callbacks for an IRPosition. To make this work as
expected we have to pass IRPositions instead of Values in
AAValueSimplify, which makes sense by itself.
2021-06-18 01:07:53 -05:00
Johannes Doerfert d9194b6efb [Attributor] Introduce a helper do deal with constant type mismatches
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.

Differential Revision: https://reviews.llvm.org/D103856
2021-06-18 01:07:52 -05:00
Johannes Doerfert 9959eee001 [Attributor] Make sure Heap2Stack works properly on a GPU target
If the target stack is not accessible between different running
"threads" we have to make sure not to create allocas for mallocs
that might be used by multiple "threads". The "use check" is
sufficient to prevent this but if we apply the "free check" we have
to make sure the pointer is not communicated to others before
the free is reached.

Differential Revision: https://reviews.llvm.org/D98608
2021-06-18 01:07:52 -05:00
Johannes Doerfert 9a23e673ca [OpenMP][NFC] Expose AAExecutionDomain and rename its getter
The initial use for AAExecutionDomain was to determine if a single
thread executes a block. While this is sometimes informative most
of the time, and for other reasons, we actually want to know if it
is the "initial thread". Thus, the thread that started execution on
the current device. The deduction needs to be adjusted in a follow
up as the methods we use right not are looking for the OpenMP thread
id which is resets whenever a thread enters a parallel region. What
we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and
equivalent functions.
2021-06-18 01:07:52 -05:00
Johannes Doerfert 8d7bace3b5 [Attributor][NFC] AAReachability is currently stateless, don't invalidate it
We invalidated AAReachabilityImpl directly which is not helpful and
confusing as we still used it regardless. We now avoid invalidating it
(not needed anyway) and add checks for the state. This has by itself no
actual effect but prepares for later extensions.
2021-06-18 01:07:51 -05:00
George Balatsouras c6b5a25eeb [dfsan] Replace dfs$ prefix with .dfsan suffix
The current naming scheme adds the `dfs$` prefix to all
DFSan-instrumented functions.  This breaks mangling and prevents stack
trace printers and other tools from automatically demangling function
names.

This new naming scheme is mangling-compatible, with the `.dfsan`
suffix being a vendor-specific suffix:
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure

With this fix, demangling utils would work out-of-the-box.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104494
2021-06-17 22:42:47 -07:00
Xun Li 3522167efd [Coroutine] Properly deal with byval and noalias parameters
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
Previous attempts can be found in D104007 and D101980.
A lot of discussions can be found in those two patches.
To summarize the bug:
When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly.
However, in some cases we find that arguments are still directly used:
When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and latter uses will lead to memory use-after-free.
This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly.
For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutiune as noalias. This can be taken care of in CoroEarly pass.
For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer.

Differential Revision: https://reviews.llvm.org/D104184
2021-06-17 19:06:10 -07:00
Roman Lebedev 84eeb82888
[NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining unswitching failure
It's not prohibitively verbose, and allows easier understanding
why certain unswitching ultimately wasn't performed.
2021-06-18 00:46:04 +03:00
Kuter Dinel eaf1b6810c [Attributor] Derive AACallEdges attribute
This attribute computes the optimistic live call edges using the attributor
liveness information. This attribute will be used for deriving a
inter-procedural function reachability attribute.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104059
2021-06-18 03:29:22 +03:00
Andrew Browne 39295e92f7 Revert "[DFSan] Cleanup code for platforms other than Linux x86_64."
This reverts commit 8441b993bd.

Buildbot failures.
2021-06-17 14:19:18 -07:00
Fangrui Song 76d0747e08 [InstrProfiling] Make __profd_ unconditionally private for ELF
For ELF, since all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisified, we can make __profd_
unconditionally private.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-17 14:16:54 -07:00
Craig Topper 99e95856fb [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp.
This pass emits a floating point compare and a conditional branch,
but if strictfp is enabled we don't emit a constrained compare
intrinsic.

The backend also won't expand the readonly sqrt call this pass inserts
to a sqrt instruction under strictfp. So we end up with 2 libcalls as
seen here. https://godbolt.org/z/oax5zMEWd

Fix these things by disabling the pass.

Differential Revision: https://reviews.llvm.org/D104479
2021-06-17 14:15:12 -07:00
Andrew Browne 8441b993bd [DFSan] Cleanup code for platforms other than Linux x86_64.
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.

Reviewed By: gbalats, stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104481
2021-06-17 14:08:40 -07:00
Nikita Popov f7c54c4603 [LoopUnroll] Fold all exits based on known trip count/multiple
Fold all exits based on known trip count/multiple information from
SCEV. Previously only the latch exit or the single exit were folded.

This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple
entirely: They're still used to a) decide whether runtime unrolling
should be performed and b) for ORE remarks. However, the core
unrolling logic is independent of them now.

Differential Revision: https://reviews.llvm.org/D104203
2021-06-17 20:58:34 +02:00
Roman Lebedev 37dfc467ac
[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg
This really isn't talking about vectors in general,
but only about either fixed or scalable vectors,
and it's pretty confusing to see it state
that there aren't any vectors :)
2021-06-17 21:07:34 +03:00
hyeongyukim 69b0ed9a0a [InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered as Eli say when sext(idx) * ElementSize overflows.

```
   // assume that GV is an array of 4-byte elements
   GEP = gep GV, 0, Idx // this is accessing Idx * 4
   L = load GEP
   ICI = icmp eq L, value
 =>
   ICI = icmp eq Idx, NewIdx
```

The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.

Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.

This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.

```
   ...
 =>
   Idx' = and Idx, 0x3F..FF
   ICI = icmp eq Idx', NewIdx
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D99481
2021-06-17 19:46:17 +09:00
Sjoerd Meijer dcd23d875a [FuncSpec] Don't specialise functions with attribute NoDuplicate.
Differential Revision: https://reviews.llvm.org/D104378
2021-06-17 10:32:29 +01:00
Florian Hahn 80a403348b
[VPlan] Support PHIs as LastInst when inserting scalars in ::get().
At the moment, we create insertelement instructions directly after
LastInst when inserting scalar values in a vector in
VPTransformState::get.

This results in invalid IR when LastInst is a phi, followed by another
phi. In that case, the new instructions should be inserted just after
the last PHI node in the block.

At the moment, I don't think the problematic case can be triggered, but
it can happen once predicate regions are merged and multiple
VPredInstPHI recipes are in the same block (D100260).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104188
2021-06-17 09:36:44 +01:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Sjoerd Meijer 49ab3b1735 [FuncSpec] Statistics
Adds some bookkeeping for collecting the number of specialised functions and a
test for that.

Differential Revision: https://reviews.llvm.org/D104102
2021-06-16 09:11:51 +01:00
Evgeniy Brevnov 96cded5b79 [SLP] Incorrect handling of external scalar values
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103954
2021-06-16 13:27:36 +07:00
Andrew Browne e652d99169 [DFSan][NFC] Fix shadowing variable name. 2021-06-15 22:58:22 -07:00
Rong Xu 82a0bb1afc [SampleFDO] Place the discriminator flag variable into the used list.
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.

This variable might be GC'ed by the linker since it is not explicitly
reference. I initially added the var to the use list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.

Here I add the variable to the use list in IR level's AddDiscriminators
pass. The machine level code is still keep in the case IR's
AddDiscriminators is not invoked. If this is the case, this just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emit.

Differential Revision: https://reviews.llvm.org/D103988
2021-06-15 21:51:04 -07:00
Chuanqi Xu 86906304d8 [FuncSpec] Use std::pow instead of operator^
The original implementation calculating UserBonus uses operator ^, which means XOR in C++
language.
At the first glance of reviewing, I thought it should be power, my bad.
It doesn't make sense to use XOR here. So I believe it should be a
carelessness as I made.

Test Plan: check-all

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D104282
2021-06-16 10:13:21 +08:00
Andrew Browne af93157625 [DFSan] Handle landingpad inst explicitly as zero shadow.
Before this change, DFSan was relying fallback cases when getting origin
address.

Differential Revision: https://reviews.llvm.org/D104266
2021-06-15 18:28:20 -07:00
Vitaly Buka 6478ef61b1 [asan] Remove Asan, Ubsan support of RTEMS and Myriad
Differential Revision: https://reviews.llvm.org/D104279
2021-06-15 12:59:05 -07:00
Roman Lebedev e52364532a
[NewPM] Remove SpeculateAroundPHIs pass
Addition of this pass has been botched.
There is no particular reason why it had to be sold as an inseparable part
of new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so it's effects weren't measured.

Which means, some of the turnoil of the new-pm transition
are actually likely regressions due to this pass.

Likewise, there has been a number of post-commit feedback
(post new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the half year past, the pass authors (google) still haven't found time to respond to any of that.

Hereby it is proposed to backout the pass from the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.

Furthermore, neither google nor facebook reports any perf changes
from this change, so i'm dropping the pass completely.
It can always be re-reverted should/if anyone want to pick it up again.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D104099
2021-06-15 20:35:55 +03:00
Andrew Litteken 2c21278e74 [IROutliner] Adding DebugInfo handling for IR Outlined Functions
This adds support for functions outlined by the IR Outliner to be
recognized by the debugger.  The expected behavior is that it will
skip over the instructions included in that section.  This is due to the
fact that we can not say which of the original locations the
instructions originated from.

These functions will show up in the call stack, but you cannot step
through them.

Reviewers: paquette, vsk, djtodoro

Differential Revision: https://reviews.llvm.org/D87302
2021-06-15 10:57:08 -05:00
Florian Hahn f7fc8927c0
[LoopDeletion] Check for irreducible cycles when deleting loops.
Loops with irreducible cycles may loop infinitely. Those cannot be
removed, unless the loop/function is marked as mustprogress.

Also discussed in D103382.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104238
2021-06-15 12:56:12 +01:00
Neil Henning 1540da3b78 ABI breaking changes fixes.
This commit mostly just replaces bad uses of `NDEBUG` with uses of
`LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI
breaking changes (normally extra struct elements in headers).

Differential Revision: https://reviews.llvm.org/D104216
2021-06-15 11:08:13 +01:00
Vitaly Buka b8919fb0ea [NFC][sanitizer] clang-format some code 2021-06-14 18:05:22 -07:00
Huihui Zhang 1c096bf09f [SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE.
Currently, Loop strengh reduce is not handling loops with scalable stride very well.

Take loop vectorized with scalable vector type <vscale x 8 x i16> for instance,
(refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added).

Memory accesses are incremented by "16*vscale", while induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula
i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)}". So that addrec register reg({0,+,(8*vscale)})
can be reused among Address and ICmpZero LSRUses to enable optimal solution selection.

This patch allow LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y",
and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR
is missing candidate formula with proper scaled factor to leverage target scaled-index
addressing mode.

Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable
vector. But allow simple valid scale to pass through.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103939
2021-06-14 16:42:34 -07:00
Matt Morehouse b87894a1d2 [HWASan] Enable globals support for LAM.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104265
2021-06-14 14:20:44 -07:00
Sanjay Patel 8591640379 [InstCombine] add DeMorgan folds for logical ops in select form
We canonicalized to these select patterns (poison-safe logic)
with D101191, so we need to reduce 'not' ops when possible
as we would with 'and'/'or' instructions.

This is shown in a secondary example in:
https://llvm.org/PR50389

https://alive2.llvm.org/ce/z/BvsESh
2021-06-14 12:54:35 -04:00
Florian Hahn 96ca03493a
[VectorCombine] Limit scalarization to non-poison indices for now.
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
2021-06-14 16:40:14 +01:00
Jeroen Dobbelaere bb8ce25e88 Intrinsic::getName: require a Module argument
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.

This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.

Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99173
2021-06-14 14:52:29 +02:00
Aditya Kumar dcbbc69cc5 Calculate getTerminator only when necessary
Differential Revision: https://reviews.llvm.org/D104202
2021-06-13 20:16:07 -07:00
Simon Pilgrim 9efe89d82f BoundsChecking.cpp - tidy implicit header dependencies. NFCI.
We don't use <vector> but we do use std::pair (<utility>)
2021-06-13 17:08:15 +01:00
Simon Pilgrim 56541d1377 GVN.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
Simon Pilgrim c14fd171fe LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
Sanjay Patel afd44bb6f2 [InstCombine] fold ctlz/cttz of bool types
https://alive2.llvm.org/ce/z/tX4pUT
2021-06-13 08:26:40 -04:00
Simon Pilgrim 2477b498f2 ArgumentPromotion.cpp - remove unused <string> include. NFCI. 2021-06-13 13:03:47 +01:00
Simon Pilgrim b013c58e82 VPlanSLP.cpp - tidy implicit header dependencies. NFCI.
We don't use std::string and std::vector, but we do use std::pair and std::max.
2021-06-13 12:37:17 +01:00
Xun Li fae7debadc [CHR] Don't run ControlHeightReduction if any BB has address taken
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In computed goto pattern, there are usually a list of basic blocks that are all targets of indirectbr instruction, and each basic block also has address taken and stored in a variable.
CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and clonved version of all basic blocks in the list.
However these basic blocks will not have their addresses taken and stored anywhere. So latter SimplifyCFG pass will simply remove all tehse cloned basic blocks, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contains BBs with addresses taken.
Added a few test cases.

Reviewed By: aeubanks, wenlei, hoy

Differential Revision: https://reviews.llvm.org/D103867
2021-06-12 10:29:53 -07:00
Kevin Athey 1d22596b2f [sanitizer] Remove numeric values from -asan-use-after-return flag. (NFC)
for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104152
2021-06-11 15:14:51 -07:00
Kevin Athey e0b469ffa1 [clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang.
Also:
  - add driver test (fsanitize-use-after-return.c)
  - add basic IR test (asan-use-after-return.cpp)
  - (NFC) cleaned up logic for generating table of __asan_stack_malloc
    depending on flag.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104076
2021-06-11 12:07:35 -07:00
Valery N Dmitriev 94a07c79cf [SLP][NFC] Fix condition that was supposed to save a bit of compile time.
It was found by chance revealing discrepancy between comment (few lines above),
the condition and how re-ordering of instruction is done inside the if statement
it guards. The condition was always evaluated to true.

Differential Revision: https://reviews.llvm.org/D104064
2021-06-11 10:08:55 -07:00
Adam Nemet e0efebb8eb [Matrix] In transpose opts, handle a^t * a^t
Without the fix the testcase crashes because we remove the same instruction
twice.

Differential Revision: https://reviews.llvm.org/D104127
2021-06-11 09:29:43 -07:00
Alexey Bataev a010d4230e [SLP]Allow reordering of insertelements.
After we added support for non-ordered insertelements, we can allow
their reordering.

Differential Revision: https://reviews.llvm.org/D104057
2021-06-11 08:47:41 -07:00
Matt Morehouse 0867edfc64 [HWASan] Add basic stack tagging support for LAM.
Adds the basic instrumentation needed for stack tagging.

Currently does not support stack short granules or TLS stack histories,
since a different code path is followed for the callback instrumentation
we use.

We may simply wait to support these two features until we switch to
a custom calling convention.

Patch By: xiangzhangllvm, morehouse

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102901
2021-06-11 08:21:17 -07:00
Alexey Bataev 74af4bb1f4 [SLP]Remove unnecessary UndefValue in CreateShuffle.
No need to use UndefValue in CreateShuffle call.

Differential Revision: https://reviews.llvm.org/D104113
2021-06-11 08:08:30 -07:00
Sjoerd Meijer 9907746f5d Move Function Specialization to its correct location. NFC.
As a follow up of rGc4a0969b9c14, and as part of D104102, move it to
the IPO transformations directory.
2021-06-11 15:00:10 +01:00
Sanjay Patel 602ab24833 [SimplifyCFG] avoid crash on degenerate loop
The problematic code pattern in the test is based on:
https://llvm.org/PR50638

If the IfCond is itself the phi that we are trying to remove,
then the loop around line 2835 can end up with something like:
%cmp = select i1 %cmp, i1 false, i1 true

That can then lead to a use-after-free and assert (although
I'm still not seeing that locally in my release + asserts build).

I think this can only happen with unreachable code.

Differential Revision: https://reviews.llvm.org/D104063
2021-06-11 09:37:06 -04:00
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Roman Lebedev 20542b47d6
[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper
This results in slightly more optimistic alignments in some cases
2021-06-11 12:47:10 +03:00
Roman Lebedev abc0e0125c
[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function 2021-06-11 12:47:09 +03:00
Simon Pilgrim 5e6bfb661e [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Sjoerd Meijer c4a0969b9c Function Specialization Pass
This adds a function specialization pass to LLVM. Constant parameters
like function pointers and constant globals are propagated to the callee by
specializing the function.

This is a first version with a number of limitations:
- The pass is off by default, so needs to be enabled on the command line,
- It does not handle specialization of recursive functions,
- It does not yet handle constants and constant ranges,
- Only 1 argument per function is specialised,
- The cost-model could be further looked into, and perhaps related,
- We are not yet caching analysis results.

This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan.
More recently this was also discussed on the list, see:

https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html.

The motivation for this work is that function specialisation often comes up as
a reason for performance differences of generated code between LLVM and GCC,
which has this enabled by default from optimisation level -O3 and up. And while
this certainly helps a few cpu benchmark cases, this also triggers in real
world codes and is thus a generally useful transformation to have in LLVM.

Function specialisation has great potential to increase compile-times and
code-size.  The summary from some investigations with this patch is:
- Compile-time increases for short compile jobs is high relatively, but the
  increase in absolute numbers still low.
- For longer compile-jobs, the extra compile time is around 1%, and very much
  in line with GCC.
- It is difficult to blame one thing for compile-time increases: it looks like
  everywhere a little bit more time is spent processing more functions and
  instructions.
- But the function specialisation pass itself is not very expensive; it doesn't
  show up very high in the profile of the optimisation passes.

The goal of this work is to reach parity with GCC which means that eventually
we would like to get this enabled by default. But first we would like to address
some of the limitations before that.

Differential Revision: https://reviews.llvm.org/D93838
2021-06-11 09:11:29 +01:00
Qiu Chaofan 2670c7dd5b [VectorCombine] Fix alignment in single element store
This fixes the concern in single element store scalarization that the
alignment of new store may be larger than it should be. It calculates
the largest alignment if index is constant, and a safe one if not.

Reviewed By: lebedev.ri, spatel

Differential Revision: https://reviews.llvm.org/D103419
2021-06-11 10:28:15 +08:00
Slava Nikolaev 119965865c LoadStoreVectorizer: support different operand orders in the add sequence match
First we refactor the code which does no wrapping add sequences
match: we need to allow different operand orders for
the key add instructions involved in the match.

Then we use the refactored code trying 4 variants of matching operands.

Originally the code relied on the fact that the matching operands
of the two last add instructions of memory index calculations
had the same LHS argument. But which operand is the same
in the two instructions is actually not essential, so now we allow
that to be any of LHS or RHS of each of the two instructions.
This increases the chances of vectorization to happen.

Reviewed By: volkan

Differential Revision: https://reviews.llvm.org/D103912
2021-06-10 16:31:35 -07:00
Andy Kaylor 41555eaf65 Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA
SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store.

Added a LIT test to catch one case.

Patch by Mark Mendell

Differential Revision: https://reviews.llvm.org/D103254
2021-06-10 15:47:03 -07:00
Joachim Meyer 4f01122c3f [LV] Parallel annotated loop does not imply all loads can be hoisted.
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
2021-06-10 23:37:57 +02:00
Philip Reames 7629b2a09c [LI] Add a cover function for checking if a loop is mustprogress [nfc]
Essentially, the cover function simply combines the loop level check and the function level scope into one call.  This simplifies several callers and is (subjectively) less error prone.
2021-06-10 13:37:32 -07:00
Philip Reames b6ee5f2b1d Move code for checking loop metadata into Analysis [nfc]
I need the mustprogress loop metadata in ScalarEvolution and it makes sense to keep all the accessors for quering loop metadate together.
2021-06-10 13:01:22 -07:00
Alexey Bataev a893b44187 [SLP]Disable scheduling of insertelements.
There is no need to schedule insertelement instructions. The compiler
did not schedule them before it started support their vectorization and
it should not do it after. We pre-schedule them manually when finding
a build vector sequence.
Disabling scheduling of insertelement instructions improves compile
time and vectorization of the very large basic blocks by saving
scheduling budget for other instructions.

Differential Revision: https://reviews.llvm.org/D104026
2021-06-10 10:25:26 -07:00
Keith Smiley 026170d17d Fix range-loop-analysis warning
```
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis]
  for (const auto VF : VFCandidates) {
                  ^
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying
  for (const auto VF : VFCandidates) {
       ^~~~~~~~~~~~~~~
                  &
1 warning generated.
```

Differential Revision: https://reviews.llvm.org/D103970
2021-06-10 08:39:54 -07:00
Caroline Concatto 1ad52105eb [InstCombine] Add fold for extracting known elements from a stepvector
This patch allows folding stepvector + extract to the lane when the lane is
lower than the minimum size of the scalable vector. This fold is possible
because lane X of a stepvector is also X!
For instance, extracting element 3 of a <vscale x 4 x i64>stepvector is 3.

Differential Revision: https://reviews.llvm.org/D103153
2021-06-10 13:36:57 +01:00
Simon Pilgrim b01d393fc0 Fix MSVC int64_t -> uint64_t "narrowing conversion" warning. 2021-06-10 10:55:24 +01:00
Jon Roelofs f8f1c9c389 Annotate memcpy's of globals with info about the src/dst
Differential revision: https://reviews.llvm.org/D103994
2021-06-09 18:11:08 -07:00
Joseph Huber 4c9471581f [Attributor] Set floating point loads and stores as nofree in AANoFreeFloating
Summary:
The current implementation of AANoFreeFloating will incorrectly list floating
point loads and stores as may-free. This prevents other attributor instances
like HeapToStack from pushing some allocations to the stack.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103975
2021-06-09 16:16:37 -04:00
Leonard Chan 314c049142 [compiler-rt][hwasan] Decouple use of the TLS global for getting the shadow base and using the frame record feature
This allows for using the frame record feature (which uses __hwasan_tls)
independently from however the user wants to access the shadow base, which
prior was only usable if shadow wasn't accessed through the TLS variable or ifuncs.

Frame recording can be explicitly set according to ShadowMapping::WithFrameRecord
in ShadowMapping::init. Currently, it is only enabled on Fuchsia and if TLS is
used, so this should mimic the old behavior.

Added an extra case to prologue.ll that covers this new case.

Differential Revision: https://reviews.llvm.org/D103841
2021-06-09 12:55:19 -07:00
LemonBoy d3faef6eef [SROA] Avoid splitting loads/stores with irregular type
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D99435
2021-06-09 16:36:58 +02:00
Alexey Bataev a0086add2e [SLP]Improve gathering of scalar elements.
1. Better sorting of scalars to be gathered. Trying to insert
   constants/arguments/instructions-out-of-loop at first and only then
   the instructions which are inside the loop. It improves hoisting of
   invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103458
2021-06-09 05:23:21 -07:00
Nico Weber 205cde63c7 Revert "[SROA] Avoid splitting loads/stores with irregular type"
This reverts commit 905f4eb537.
Breaks check-llvm on most (all?) bots, see https://reviews.llvm.org/D99435
2021-06-09 06:32:58 -04:00
LemonBoy 905f4eb537 [SROA] Avoid splitting loads/stores with irregular type
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D99435
2021-06-09 11:48:20 +02:00
Jingu Kang 8eee02020b [LoopBoundSplit] Ignore phi node which is not scevable
There was a bug in LoopBoundSplit. The pass should ignore phi node which is not
scevable.

Differential Revision: https://reviews.llvm.org/D103913
2021-06-09 09:44:36 +01:00
Kevin Athey af8c59e06d Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always).
In addition:
  - optionally add global flag to capture compile intent for UAR:
    __asan_detect_use_after_return_always.
    The global is a SANITIZER_WEAK_ATTRIBUTE.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D103304
2021-06-08 14:39:06 -07:00
Sanjay Patel d2012d965d [InstCombine] fix nsz (fast-math) propagation from fneg-of-select
As discussed in the post-commit comments for:
3cdd05e519

It seems to be safe to propagate all flags from the final fneg
except for 'nsz' to the new select:
https://alive2.llvm.org/ce/z/J_APDc

nsz has unique FMF semantics: it is not poison, it is only
"insignificant" in the calculation according to the LangRef.
2021-06-08 17:04:30 -04:00
David Green 297088d1ad Revert "[DSE] Remove stores in the same loop iteration"
Apparently non-dead stores are being removed, as noted in D100464.

This reverts commit 222aeb4d51.
2021-06-08 21:23:08 +01:00
Hans Wennborg 386b66b2fc Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.

This reverts commit 36ec97f76a.

The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.

Reverting to unbreak people's builds until that can be addressed.

This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d8.
2021-06-08 14:54:08 +02:00
maekawatoshiki 09e92c607c [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Also, a crash problem on legacy pass manager is fixed.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-06-08 20:30:02 +09:00
Simon Pilgrim 596004a947 MemCpyOptimizer.cpp - hasUndefContentsMSSA - Pass DataLayout by reference. NFCI. 2021-06-08 10:41:02 +01:00
Kerry McLaughlin 14eeccfe9a [LoopVectorize] Don't use strict reductions when reordering is allowed
If the `-enable-strict-reductions` flag is set to true, then currently we will
always choose to vectorize the loop with strict in-order reductions. This is
not necessary where we allow the reordering of FP operations, such as
when loop hints are passed via metadata.

This patch moves useOrderedReductions so that we can also check whether
loop hints allow reordering, in which case we should use the default
behaviour of vectorizing with unordered reductions.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103814
2021-06-08 10:39:29 +01:00
George Balatsouras 5b4dda550e [dfsan] Add full fast8 support
Complete support for fast8:
- amend shadow size and mapping in runtime
- remove fast16 mode and -dfsan-fast-16-labels flag
- remove legacy mode and make fast8 mode the default
- remove dfsan-fast-8-labels flag
- remove functions in dfsan interface only applicable to legacy
- remove legacy-related instrumentation code and tests
- update documentation.

Reviewed By: stephan.yichao.zhao, browneee

Differential Revision: https://reviews.llvm.org/D103745
2021-06-07 17:20:54 -07:00
Arthur Eubanks 47211fa889 Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry"
Needs to be discussed more.

This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
2021-06-07 16:07:44 -07:00
Nikita Popov 8fdd7c2ff1 [LoopUnroll] Clamp unroll count to MaxTripCount
Unrolling with more iterations than MaxTripCount is pointless, as
those iterations can never be executed. As such, we clamp ULO.Count
to MaxTripCount if it is known. This means we no longer need to
consider iterations after MaxTripCount for exit folding, and the
CompletelyUnroll flag becomes independent of ULO.TripCount.

Differential Revision: https://reviews.llvm.org/D103748
2021-06-07 21:08:42 +02:00
Philip Reames c880d5e583 [RS4GC] Treat inttoptr as base pointer
This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message.

inttoptr for a non-integral address space is currently ill defined in the LangRef.  Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled.  Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons.

First, as a simple consistency argument.  We're apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them.  Having a lack of constant folding trigger a crash during lowering is non-ideal.

Second, and more fundementally, the optimizer is allowed to insert undefined constructs in unreachable code.  At the same time, we can't assume that dynamically dead code is always pruned before lowering.  As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths.  We need the lowering to not crash.  The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash.

Differential Revision: https://reviews.llvm.org/D103492
2021-06-07 10:27:23 -07:00
Sanjay Patel 4675beaa21 [InstCombine] intersect nsz and ninf fast-math-flags (FMF) for fneg(fdiv) fold
https://alive2.llvm.org/ce/z/3KPvih

https://llvm.org/PR49654
2021-06-07 13:22:49 -04:00
Sanjay Patel 519e98cd9a [InstCombine] refactor match clauses; NFC
We need to adjust the FMF propagation on at least
one of these transforms as discussed in:
https://llvm.org/PR49654
...so this should make it easier to intersect flags.
2021-06-07 13:22:49 -04:00
Florian Hahn 1465e7770b
[VPlan] Print successors of VPRegionBlocks.
The non-DOT printing does not include the successors of VPregionBlocks.
This patch use the same style for printing successors as for
VPBasicBlock.

I think the printing of successors could be a bit improved further, as
at the moment it is hard to ensure a check line matches all successors.
But that can be done as follow-up.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D103515
2021-06-07 17:57:21 +01:00
Fraser Cormack ae3f6de3a8 [InstCombine] Support negation of scalable-vector splats
This patch is an extension of D103421. It allows the InstCombiner to
generate the negated form of integer scalable-vector splats. It can
technically handle fixed-length vectors too but those are completely
covered by the preceding logic.

This enables extra combining opportunities for scalable vector types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103801
2021-06-07 15:14:00 +01:00
Daniil Seredkin 7736c1936a [InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math
If FP reassociation (fast-math) is allowed, then LLVM is free to do the
following transformation pow(x, y) * pow(x, z) -> pow(x, y + z).
This patch adds this transformation and tests for it.
See more https://bugs.llvm.org/show_bug.cgi?id=47205

It handles two cases

1. When operands of fmul are different instructions

%4 = call reassoc float @llvm.pow.f32(float %0, float %1)
%5 = call reassoc float @llvm.pow.f32(float %0, float %2)
%6 = fmul reassoc float %5, %4
-->
%3 = fadd reassoc float %1, %2
%4 = call reassoc float @llvm.pow.f32(float %0, float %3)

2. When operands of fmul are the same instruction

%4 = call reassoc float @llvm.pow.f32(float %0, float %1)
%5 = fmul reassoc float %4, %4
-->
%3 = fadd reassoc float %1, %1
%4 = call reassoc float @llvm.pow.f32(float %0, float %3)

Differential Revision: https://reviews.llvm.org/D102574
2021-06-07 08:08:05 -04:00
Liqiang Tao 4a0de622c3 [llvm] Add interface to order inlining
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order,
e.g. use queue or priority queue.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D103315
2021-06-07 18:27:55 +08:00
Jingu Kang a2a0ac42ab [SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV
This pass transforms loops that contain a conditional branch with induction
variable. For example, it transforms left code to right code:

                             newbound = min(n, c)
 while (iv < n) {            while(iv < newbound) {
   A                           A
   if (iv < c)                 B
     B                         C
   C                         }
 }                           if (iv != n) {
                               while (iv < n) {
                                 A
                                 C
                               }
                             }

Differential Revision: https://reviews.llvm.org/D102234
2021-06-07 10:55:25 +01:00
Florian Hahn 23c2f2e6b2
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103255
2021-06-07 10:47:52 +01:00
maekawatoshiki 0a9d079931 Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit 2165360003.

To fix the crash problem in legacy pass manager
2021-06-07 01:26:47 +09:00
Simon Pilgrim 9ced408fe9 SimplifyCFG.cpp - remove dead early-return code added at rGcc63203908da. NFCI.
We've already checked that ScanIdx == 0 a few lines above.
2021-06-06 14:15:11 +01:00
Liqiang Tao 48252d7570 Revert "[llvm] Add interface to order inlining" 2021-06-06 14:45:03 +08:00
Liqiang Tao 478dc47292 [llvm] Add interface to order inlining
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order, i.e. use queue or priority queue.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D103315
2021-06-06 12:03:02 +08:00
Roman Lebedev e350494fb0
[NFC] Promote willNotOverflow() / getStrengthenedNoWrapFlagsFromBinOp() from IndVars into SCEV proper
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to loose some optimization potential.
2021-06-05 12:17:51 +03:00
Nikita Popov db45746821 [LoopUnroll] Separate peeling from unrolling
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.

When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.

This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a susequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.

In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.

Differential Revision: https://reviews.llvm.org/D103362
2021-06-05 10:32:00 +02:00
Vitaly Buka e3258b0894 Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always)."
Windows is still broken.

This reverts commit 927688a4cd.
2021-06-05 00:39:50 -07:00
Kevin Athey 927688a4cd Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always).
In addition:
  - optionally add global flag to capture compile intent for UAR:
    __asan_detect_use_after_return_always.
    The global is a SANITIZER_WEAK_ATTRIBUTE.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D103304
2021-06-05 00:26:10 -07:00
Fangrui Song 06e7de795b Fix some -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build 2021-06-04 23:34:43 -07:00