by `getTerminator()` calls instead be declared as `Instruction`.
This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).
llvm-svn: 344502
This is the last interesting usage in all of LLVM's headers. The
remaining usages in headers are the core typesystem bits (Core.h,
instruction types, and InstVisitor) and as the return of
`BasicBlock::getTerminator`. The latter is the big remaining API point
that I'll remove after mass updates to user code.
llvm-svn: 344501
This requires updating a number of .cpp files to adapt to the new API.
I've just systematically updated all uses of `TerminatorInst` within
these files te `Instruction` so thta I won't have to touch them again in
the future.
llvm-svn: 344498
LLVM APIs. There weren't very many.
We still have the instruction visitor, and APIs with TerminatorInst as
a return type or an output parameter.
llvm-svn: 344494
Landing this as a separate part of https://reviews.llvm.org/D50480, being a
seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count
without remainder under opt for size).
llvm-svn: 344483
This is part of the missing IR-level folding noted in D52912.
This should be ok as a canonicalization because the new shuffle mask can't
be any more complicated than the existing shuffle mask. If there's some
target where the shorter vector shuffle is not legal, it should just end up
expanding to something like the pair of shuffles that we're starting with here.
Differential Revision: https://reviews.llvm.org/D53037
llvm-svn: 344476
interleave-group
The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.
Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53011
llvm-svn: 344472
Summary:
GetOrCreateFunctionComdat is currently used in SanitizerCoverage,
where it's defined. I'm planing to use it in HWASAN as well,
so moving it into a common location.
NFC
Reviewers: morehouse
Reviewed By: morehouse
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53218
llvm-svn: 344433
Summary:
Linking with the /OPT:REF linker flag when building COFF files causes
the linker to strip SanitizerCoverage's constructors. Prevent this by
giving the constructors WeakODR linkage and by passing the linker a
directive to include sancov.module_ctor.
Include a test in compiler-rt to verify libFuzzer can be linked using
/OPT:REF
Reviewers: morehouse, rnk
Reviewed By: morehouse, rnk
Subscribers: rnk, morehouse, hiraditya
Differential Revision: https://reviews.llvm.org/D52119
llvm-svn: 344391
Summary:
Otherwise, at least on Mac, the linker eliminates unused symbols which
causes libFuzzer to error out due to a mismatch of the sizes of coverage tables.
Issue in Chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=892167
Reviewers: morehouse, kcc, george.karpenkov
Reviewed By: morehouse
Subscribers: kubamracek, llvm-commits
Differential Revision: https://reviews.llvm.org/D53113
llvm-svn: 344345
Summary:
We have two copies of createPrivateGlobalForString (in asan and in esan).
This change merges them into one. NFC
Reviewers: vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53178
llvm-svn: 344314
This patch ports the legacy pass manager to the new one to take advantage of
the benefits of the new PM. This involved moving a lot of the declarations for
`AddressSantizer` to a header so that it can be publicly used via
PassRegistry.def which I believe contains all the passes managed by the new PM.
This patch essentially decouples the instrumentation from the legacy PM such
hat it can be used by both legacy and new PM infrastructure.
Differential Revision: https://reviews.llvm.org/D52739
llvm-svn: 344274
InstCombine keeps a worklist and assumes that optimizations don't
eraseFromParent() the instruction, which SimplifyLibCalls violates. This change
adds a new callback to SimplifyLibCalls to let clients specify their own hander
for erasing actions.
Differential Revision: https://reviews.llvm.org/D52729
llvm-svn: 344251
This is the umin alternative to the umax code from rL344237. We use
DeMorgans law on the umax case to bring us to the same thing on umin,
but using countLeadingOnes, not countLeadingZeros.
Differential Revision: https://reviews.llvm.org/D53036
llvm-svn: 344239
Use the demanded bits of umax(A,C) to prove we can just use A so long as the
lowest non-zero bit of DemandMask is higher than the highest non-zero bit of C
Differential Revision: https://reviews.llvm.org/D53033
llvm-svn: 344237
We assign indices sequentially for seen instructions, so we can just use
a vector and push back the seen instructions. No need for using a
DenseMap.
Reviewers: hsaito, rengolin, nadav, dcaballe
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D53089
llvm-svn: 344233
We can avoid doing some unnecessary work by skipping debug instructions
in a few loops. It also helps to ensure debug instructions do not
prevent vectorization, although I do not have any concrete test cases
for that.
Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk
Reviewed By: rengolin, dcaballe
Differential Revision: https://reviews.llvm.org/D53091
llvm-svn: 344232
Summary:
Right now there is no hit counter on the line of function.
So the idea is add the line of the function to all the lines covered by the entry block.
Tests in compiler-rt/profile will be fixed in another patch: https://reviews.llvm.org/D49854
Reviewers: marco-c, davidxl
Reviewed By: marco-c
Subscribers: sylvestre.ledru, llvm-commits
Differential Revision: https://reviews.llvm.org/D49853
llvm-svn: 344228
There is a transform that may replace `lshr (x+1), 1` with `lshr x, 1` in case
if it can prove that the result will be the same. However the initial instruction
might have an `exact` flag set, and it now should be dropped unless we prove
that it may hold. Incorrectly set `exact` attribute may then produce poison.
Differential Revision: https://reviews.llvm.org/D53061
Reviewed By: sanjoy
llvm-svn: 344223
This can be used to preserve profiling information across codebase
changes that have widespread impact on mangled names, but across which
most profiling data should still be usable. For example, when switching
from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI,
or even from a 32-bit to a 64-bit build.
The user can provide a remapping file specifying parts of mangled names
that should be treated as equivalent (eg, std::__1 should be treated as
equivalent to std::__cxx11), and profile data will be treated as
applying to a particular function if its name is equivalent to the name
of a function in the profile data under the provided equivalences. See
the documentation change for a description of how this is configured.
Remapping is supported for both sample-based profiling and instruction
profiling. We do not support remapping indirect branch target
information, but all other profile data should be remapped
appropriately.
Support is only added for the new pass manager. If someone wants to also
add support for this for the old pass manager, doing so should be
straightforward.
This is the LLVM side of Clang r344199.
Reviewers: davidxl, tejohnson, dlj, erik.pilkington
Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51249
llvm-svn: 344200
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).
This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.
llvm-svn: 344186
I've added a new test case that causes the scalarizer to try and use
dead-and-erased values - caused by the basic blocks not being in
domination order within the function. To fix this, instead of iterating
through the blocks in function order, I walk them in reverse post order.
Differential Revision: https://reviews.llvm.org/D52540
llvm-svn: 344128
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D52887
llvm-svn: 344120
There are places where we need to merge multiple LocationSizes of
different sizes into one, and get a sensible result.
There are other places where we want to optimize aggressively based on
the value of a LocationSizes (e.g. how can a store of four bytes be to
an area of storage that's only two bytes large?)
This patch makes LocationSize hold an 'imprecise' bit to note whether
the LocationSize can be treated as an upper-bound and lower-bound for
the size of a location, or just an upper-bound.
This concludes the series of patches leading up to this. The most recent
of which is r344108.
Fixes PR36228.
Differential Revision: https://reviews.llvm.org/D44748
llvm-svn: 344114
This is the second in a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context. The first change being r344012.
Since I was requested to do all of this with post-commit review, this is
about as small as I can make this patch.
This patch makes LocationSize into an actual type that wraps a uint64_t;
users are required to call getValue() in order to get the size now. If
the LocationSize has an Unknown size (e.g. if LocSize ==
MemoryLocation::UnknownSize), getValue() will assert.
This also adds DenseMap specializations for LocationInfo, which required
taking two more values from the set of values LocationInfo can
represent. Hence, heavy users of multi-exabyte arrays or structs may
observe slightly lower-quality code as a result of this change.
The intent is for getValue()s to be very close to a corresponding
hasValue() (which is often spelled `!= MemoryLocation::UnknownSize`).
Sadly, small diff context appears to crop that out sometimes, and the
last change in DSE does require a bit of nonlocal reasoning about
control-flow. :/
This also removes an assert, since it's now redundant with the assert in
getValue().
llvm-svn: 344013
This is one of a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context.
Since I was requested to do all of this with post-commit review, this is
about as small as I can make it (beyond committing changes to these few
files separately, but they're incredibly similar in spirit, so...)
On its own, this change doesn't make a great deal of sense. I plan on
having a follow-up Real Soon Now(TM) to make the bits here make more
sense. :)
In particular, the next change in this series is meant to make
LocationSize an actual type, which you have to call .getValue() on in
order to get at the uint64_t inside. Hence, this change refactors code
so that:
- we only need to call the soon-to-come getValue() once in most cases,
and
- said call to getValue() happens very closely to a piece of code that
checks if the LocationSize has a value (e.g. if it's != UnknownSize).
llvm-svn: 344012
In r339636 the alias analysis rules were changed with regards to tail calls
and byval arguments. Previously, tail calls were assumed not to alias
allocas from the current frame. This has been updated, to not assume this
for arguments with the byval attribute.
This patch aligns TailCallElim with the new rule. Tail marking can now be
more aggressive and mark more calls as tails, e.g.:
define void @test() {
%f = alloca %struct.foo
call void @bar(%struct.foo* byval %f)
ret void
}
define void @test2(%struct.foo* byval %f) {
call void @bar(%struct.foo* byval %f)
ret void
}
define void @test3(%struct.foo* byval %f) {
%agg.tmp = alloca %struct.foo
%0 = bitcast %struct.foo* %agg.tmp to i8*
%1 = bitcast %struct.foo* %f to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
call void @bar(%struct.foo* byval %agg.tmp)
ret void
}
The problematic case where a byval parameter is captured by a call is still
handled correctly, and will not be marked as a tail (see PR7272).
llvm-svn: 343986
Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even it is not prevailing.
IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.
By not dead stripping them, We will convert these symbols to
available_externally linkage as a result of non-prevailing and eventually
dropping them after inlining.
I modified cache-prevailing.ll to use linkonce linkage as it is
testing whether cache prevailing bit is effective or not, not
we should treat linkonce_odr alive or not
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D52893
llvm-svn: 343970
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:
- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).
This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).
Differential Revision: https://reviews.llvm.org/D52087
llvm-svn: 343962
Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.
Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
%t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
%t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Output:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space.
```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```
I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.
Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52294
llvm-svn: 343956
At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.
The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.
This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.
Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.
Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito
llvm-svn: 343954
Call getOperandInfo() instead of using (near) duplicated code in
LoopVectorizationCostModel::getInstructionCost().
This gets the OperandValueKind and OperandValueProperties values for a Value
passed as operand to an arithmetic instruction.
getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but
is now instead a public member.
Review: Florian Hahn
https://reviews.llvm.org/D52883
llvm-svn: 343852
Summary:
At some point in the past the recursion in DominatesMergePoint used to pass null for AggressiveInsts as part of the recursion. It no longer does this. So there is no way for AggressiveInsts to be null.
This passes it by reference and removes the null check to make this explicit.
Reviewers: efriedma, reames
Reviewed By: efriedma
Subscribers: xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D52575
llvm-svn: 343828
We established the (unfortunately complicated) rules for UB/poison
propagation with vector ops in:
D48893
D48987
D49047
It's clear from the affected tests that we are potentially creating
poison where none existed before the transforms. For add/sub/mul,
the answer is simple: just drop the flags because the extra undef
vector lanes are generally more valuable for analysis and codegen.
llvm-svn: 343819
Summary:
The llvm::SimplifyCFG function creates a SimplifyCFGOpt object and calls run on it. There were numerous places reached from this run function that called back out llvm::SimplifyCFG which would create another SimplifyCFGOpt object. This is an inefficient use of stack space at minimum. We are also not passing along the LoopHeaders pointer passed into the outer llvm::SimplifyCFG call. So if its not null we lose it on the first recursion and get nullptr from there on.
This patch adds an outer loop around the main BasicBlock simplifying code and adds a flag to the SimplifyCFGOpt class that can be set by to request another iteration. I don't think we can iterate based just on the change flag alone since some of the simplifications delete a basic block entirely leaving nothing to iterate on.
Reviewers: bogner, eli.friedman, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52760
llvm-svn: 343816
This is a follow-up to rL343482 / D52439.
This was a pattern that initially caused the commit to be reverted because
the transform requires a bitcast as shown here.
llvm-svn: 343794
We're a long way from D50992 and D51553, but this is where we have to start.
We weren't back-propagating undefs into binop constant values for anything but
add/sub/mul/and/or/xor.
This is likely because we have to be careful about not introducing UB/poison
with div/rem/shift. But I suspect we already are getting the poison part wrong
for add/sub/mul (although it may not be possible to expose the bug currently
because we use SimplifyDemandedVectorElts from a limited set of opcodes).
See the discussion/implementation from D48987 and D49047.
This patch just enables functionality for FP ops because those do not have
UB/poison potential.
llvm-svn: 343727
1. Fix include ordering.
2. Improve variable name (width is bitwidth not number-of-elements).
3. Add local Opcode variable to reduce code duplication.
llvm-svn: 343694
Modified the testcases to use both pass managers
Use single commandline flag for both pass managers.
Differential Revision: https://reviews.llvm.org/D52708
Reviewers: sebpop, tejohnson, brzycki, SirishP
Reviewed By: tejohnson, brzycki
llvm-svn: 343662
This is an attempt to get out of a local-minimum that instcombine currently
gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a
and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at
least neutral. This involves using IsFreeToInvert, which has been expanded a
little to include selects that can be easily inverted.
This is trying to fix PR35875, using the ideas from Sanjay. It is a large
improvement to one of our rgb to cmy kernels.
Differential Revision: https://reviews.llvm.org/D52177
llvm-svn: 343569
getNumUses is linear in the number of uses. Since we're looking for a specific use count, we can use hasNUses which will stop as soon as it determines there are more than N uses instead of walking all of them.
llvm-svn: 343550
This reverts commit r342387 as it's showing significant performance
regressions in a number of benchmarks. Followed up with the
committer and original thread with an example and will get performance
numbers before recommitting.
llvm-svn: 343522
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.)
This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type.
The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger.
Reviewers: lebedev.ri, majnemer, spatel
Reviewed By: lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52494
llvm-svn: 343486
This was originally committed at rL343407, but reverted at
rL343458 because it crashed trying to handle a case where
the destination type is FP. This version of the patch adds
a check for that possibility. Tests added at rL343480.
Original commit message:
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.
In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.
Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343482
This caused Chromium builds to fail with "Illegal Trunc" assertion.
See https://crbug.com/890723 for repro.
> This transform is requested for the backend in:
> https://bugs.llvm.org/show_bug.cgi?id=39016
> ...but I figured it was worth doing in IR too, and it's probably
> easier to implement here, so that's this patch.
>
> In the simplest case, we are just truncating a scalar value. If the
> extract index doesn't correspond to the LSBs of the scalar, then we
> have to shift-right before the truncate. Endian-ness makes this tricky,
> but hopefully the ASCII-art helps visualize the transform.
>
> Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343458
There are a few leftovers in rL343163 which span two lines. This commit
changes these llvm::sort(C.begin(), C.end, ...) to llvm::sort(C, ...)
llvm-svn: 343426
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.
In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.
Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343407
As noted in post-commit comments for D52548, the limitation on
increasing vector length can be applied by opcode.
As a first step, this patch only allows insertelement to be
widened because that has no logical downsides for IR and has
little risk of pessimizing codegen.
This may cause PR39132 to go into hiding during a full compile,
but that bug is not fixed.
llvm-svn: 343406
Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: mehdi_amini, modocache, llvm-commits
Differential Revision: https://reviews.llvm.org/D51642
llvm-svn: 343336
InstCombine would propagate shufflevector insts that had wider output vectors onto
predecessors, which would sometimes push undef's onto the divisor of a div/rem and
result in bad codegen.
I've fixed this by just banning propagating shufflevector back if the result of
the shufflevector is wider than the input vectors.
Patch by: @sheredom (Neil Henning)
Differential Revision: https://reviews.llvm.org/D52548
llvm-svn: 343329
This patch turns LoopInterchange into a loop pass. It now only
considers top-level loops and tries to move the innermost loop to the
optimal position within the loop nest. By only looking at top-level
loops, we might miss a few opportunities the function pass would get
(e.g. if we have a loop nest of 3 loops, in the function pass
we might process loops at level 1 and 2 and move the inner most loop to
level 1, and then we process loops at levels 0, 1, 2 and interchange
again, because we now have a different inner loop). But I think it would
be better to handle such cases by picking the best inner loop from the
start and avoid re-visiting the same loops again.
The biggest advantage of it being a function pass is that it interacts
nicely with the other loop passes. Without this patch, there are some
performance regressions on AArch64 with loop interchanging enabled,
where no loops were interchanged, but we missed out on some other loop
optimizations.
It also removes the SimplifyCFG run. We are just changing branches, so
the CFG should not be more complicated, besides the additional 'unique'
preheaders this pass might create.
Reviewers: chandlerc, efriedma, mcrosier, javed.absar, xbolva00
Reviewed By: xbolva00
Differential Revision: https://reviews.llvm.org/D51702
llvm-svn: 343308
When C is not zero and infinites are not allowed (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.
E.g.:
foo(double X) {
if ((-2.0 / X) <= 0) ...
}
=>
foo(double X) {
if (X >= 0) ...
}
Patch by: @marels (Martin Elshuber)
Differential Revision: https://reviews.llvm.org/D51942
llvm-svn: 343228
Summary:
Add a dominance check to ensure that the possible devirtualizable
call is actually dominated by the type test/checked load intrinsic being
analyzed. With PGO, after indirect call promotion is performed during
the compile step, followed by inlining, we may have a type test in the
promoted and inlined sequence that allows an indirect call in that
sequence to be devirtualized. That indirect call (inserted by inlining
after promotion) will share the same vtable pointer as the fallback
indirect call that cannot be devirtualized.
Before this patch the code was incorrectly devirtualizing the fallback
indirect call.
See the new test and the example described there for more details.
Reviewers: pcc, vitalybuka
Subscribers: mehdi_amini, Prazek, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D52514
llvm-svn: 343226
This patch extends LoopInterchange to move LCSSA to the right place
after interchanging. This is required for LoopInterchange to become a
function pass.
An alternative to the manual moving of the PHIs, we could also re-form
the LCSSA phis for a set of interchanged loops, but that's more
expensive.
Reviewers: efriedma, mcrosier, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D52154
llvm-svn: 343132
This reverts commit bd7b44f35ee9fbe365eb25ce55437ea793b39346.
Reland r342994: disabled the optimization and explicitly enable it in test.
-mllvm -consthoist-min-num-to-rebase<unsigned>=0
[ConstHoist] Do not rebase single (or few) dependent constant
If an instance (InsertionPoint or IP) of Base constant A has only one or few
rebased constants depending on it, do NOT rebase. One extra ADD instruction is
required to materialize each rebased constant, assuming A and the rebased have
the same materialization cost.
Differential Revision: https://reviews.llvm.org/D52243
llvm-svn: 343053
Summary:
We are overly conservative in loop vectorizer with respect to stores to loop
invariant addresses.
More details in https://bugs.llvm.org/show_bug.cgi?id=38546
This is the first part of the fix where we start with vectorizing loop invariant
values to loop invariant addresses.
This also includes changes to ORE for stores to invariant address.
Reviewers: anemet, Ayal, mkuper, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50665
llvm-svn: 343028
If an instance (InsertionPoint or IP) of Base constant A has only one or few
rebased constants depending on it, do NOT rebase. One extra ADD instruction is
required to materialize each rebased constant, assuming A and the rebased have
the same materialization cost.
Differential Revision: https://reviews.llvm.org/D52243
llvm-svn: 342994
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=33026
...has no shuffles now. This kind of pattern may occur during
vectorization when targets have lumpy ISAs like SSE/AVX.
llvm-svn: 342988
In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder,
similar to how it is already done in the llvm::UnrollLoop.
The compiler would crash if this function is called with a malformed loop.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D51486
llvm-svn: 342958
Summary:
Display a list of recent stack frames (not a stack trace!) when
tag-mismatch is detected on a stack address.
The implementation uses alignment tricks to get both the address of
the history buffer, and the base address of the shadow with a single
8-byte load. See the comment in hwasan_thread_list.h for more
details.
Developed in collaboration with Kostya Serebryany.
Reviewers: kcc
Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D52249
llvm-svn: 342923
Summary:
Display a list of recent stack frames (not a stack trace!) when
tag-mismatch is detected on a stack address.
The implementation uses alignment tricks to get both the address of
the history buffer, and the base address of the shadow with a single
8-byte load. See the comment in hwasan_thread_list.h for more
details.
Developed in collaboration with Kostya Serebryany.
Reviewers: kcc
Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D52249
llvm-svn: 342921
We can handle patterns where the elements have different
sizes, so refactoring ahead of trying to add another blob
within these clauses.
llvm-svn: 342918
'width' of a vector usually refers to the bit-width.
https://bugs.llvm.org/show_bug.cgi?id=39016
shows a case where we could extend this fold to handle
a case where the number of elements in the bitcasted
vector is not equal to the resulting value.
llvm-svn: 342902
DeadArgElim pass marks unused function arguments as ‘undef’ without updating
existing dbg.values referring to it. As a consequence the debug info
metadata in the final executable was wrong.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D51968
llvm-svn: 342871
Follow-up to rL342324 (D52059):
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814
This is an easier and more powerful solution than adding pattern matching for a few
special cases in the backend. The potential danger with this transform in IR is that
the condition value can get separated from the select, and the backend might not be
able to make a blendv out of it again.
llvm-svn: 342806
Summary: This restores the combine that was reverted in r341883. The infinite loop from the failing test no longer occurs due to changes from r342163.
Reviewers: spatel, dmgreen
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52070
llvm-svn: 342797
Support for vectorizing loops with secondary floating-point induction
variables was added in r276554. A primary integer IV is still required
for vectorization to be done. If an FP IV was found, but no integer IV
was found at all (primary or secondary), the attempt to vectorize still
went forward, causing a compiler-crash. This change abandons that
attempt when no integer IV is found. (Vectorizing FP-only cases like
this, rather than bailing out, is discussed as possible future work
in D52327.)
See PR38800 for more information.
Differential Revision: https://reviews.llvm.org/D52327
llvm-svn: 342786
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.
As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll
Differential Revision: https://reviews.llvm.org/D52221
llvm-svn: 342722
Summary:
his code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types.
clang part of this patch: D51752
Subscribers: dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51751
llvm-svn: 342709
Summary:
AvailableExternal was not handled in isDiscardableIfUnused when isDiscardableIfUnused
was added in r158476. Till it was handled in r247044. This is a NFC.
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52319
llvm-svn: 342684
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Made getName helper to return std::string (instead of StringRef initially) to fix
asan builtbot failures on CGSCC tests.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342664
Summary:
Before removing basic blocks that ipsccp has considered as dead
all uses of the basic block label must be removed. That is done
by calling ConstantFoldTerminator on the users. An exception
is when the branch condition is an undef value. In such
scenarios ipsccp is using some internal assumptions regarding
which edge in the control flow that should remain, while
ConstantFoldTerminator don't know how to fold the terminator.
The problem addressed here is related to ConstantFoldTerminator's
ability to rewrite a 'switch' into a conditional 'br'. In such
situations ConstantFoldTerminator returns true indicating that
the terminator has been rewritten. However, ipsccp treated the
true value as if the edge to the dead basic block had been
removed. So the code for resolving an undef branch condition
did not trigger, and we ended up with assertion that there were
uses remaining when deleting the basic block.
The solution is to resolve indeterminate branches before the
call to ConstantFoldTerminator.
Reviewers: efriedma, fhahn, davide
Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52232
llvm-svn: 342632
Summary:
Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.
The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.
A following patch in clang will add this information when generating IR (https://reviews.llvm.org/D49916).
Reviewers: marco-c, davidxl, vsk, javed.absar, rnk
Reviewed By: rnk
Subscribers: eraman, xur, danielcdh, aprantl, rnk, dblaikie, #debug-info, vsk, llvm-commits, sylvestre.ledru
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D49915
llvm-svn: 342631
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342597
Summary:
Same as to D52146.
`((1 << y)+(-1))` is simply non-canoniacal version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl
We can not canonicalize it due to the extra uses. But we can handle it here.
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52147
llvm-svn: 342547
Summary:
Two folds are happening here:
1. https://rise4fun.com/Alive/oaFX
2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4
This change doesn't just add the handling for eq/ne predicates,
it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work,
so **all** the 16 fold variants* are immediately supported.
I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already proven
for the general case of constant with all-ones in low bits. So as long as
the mask produces all-ones in low bits, i'm pretty sure the fold is valid.
But required, i can re-prove, let me know.
* eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.
https://bugs.llvm.org/show_bug.cgi?id=38123https://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52146
llvm-svn: 342546
Summary:
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342544
This is still unsafe for long double, we will transform things into tanl
even if tanl is for another type. But that's for someone else to fix.
llvm-svn: 342542
When SimplifyCFG changes the PHI node into a select instruction, the debug information becomes ambiguous. It causes the debugger to display wrong variable value.
Differential Revision: https://reviews.llvm.org/D51976
llvm-svn: 342527
Summary:
Currently only the first function in the module is checked to
see if it has remarks enabled. If that first function is a declaration,
remarks will be incorrectly skipped. Change to look for the first
non-empty function.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51556
llvm-svn: 342477
Summary:
Adds LLVMAddUnifyFunctionExitNodesPass to expose
createUnifyFunctionExitNodesPass to the C and OCaml APIs.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52212
llvm-svn: 342476
A piece of logic in rewriteLoopExitValues has a weird check on number of
users which allowed an unprofitable transform in case if an instruction has
more than 6 users.
Differential Revision: https://reviews.llvm.org/D51404
Reviewed By: etherzhhb
llvm-svn: 342444
This was checking the hardcoded address space 0 for the stack.
Additionally, this should be checking for legality with
the adjusted alignment, so defer the alignment check.
Also try to split if the unaligned access isn't allowed.
llvm-svn: 342442
Summary:
EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result.
Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA.
Adding some tests to track this and a basic verify for future potential failures.
Reviewers: george.burgess.iv, gberry
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D51960
llvm-svn: 342422
Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912
has been fixed by rL342055.
Precommit testing performed:
* Overnight runs of csmith comparing the output between programs
compiled with gvn-hoist enabled/disabled.
* Bootstrap builds of clang with UbSan/ASan configurations.
llvm-svn: 342387
Summary:
If the sub doesn't overflow in the original type we can move it above the sext/zext.
This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52075
llvm-svn: 342335
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814
If this works, it's an easier and more powerful solution than adding pattern matching
for a few special cases in the backend. The potential danger with this transform in IR
is that the condition value can get separated from the select, and the backend might
not be able to make a blendv out of it again. I don't think that's too likely, but
I've kept this patch minimal with a 'TODO', so we can test that theory in the wild
before expanding the transform.
Differential Revision: https://reviews.llvm.org/D52059
llvm-svn: 342324
Similar to rL342278:
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
D52075 should be able to use this code too rather than
duplicating all of the logic.
llvm-svn: 342292
The patch saves a function offset table which maps function name index to the
offset of its function profile to the start of the binary profile. By using
the function offset table, for those function profiles which will not be used
when compiling a module, the profile reader does't have to read them. For
profile size around 10~20M, it saves ~10% compile time.
Differential Revision: https://reviews.llvm.org/D51863
llvm-svn: 342283
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
llvm-svn: 342278
The test used to fail with an invalid phi node: the two predecessors were outlined
and the SSA representation was left invalid. The patch adds the exit block to the
cold region.
llvm-svn: 342277
remove duplicate entries from isSingleEntrySingleExit: the Entry block is
already added by the loop over the dominance frontier.
Remove the heuristic from isOutlineCandidate that a region is too small when it
only contains a basic block. With this change we now grow regions starting from
a block and we continue adding to the ValidColdRegion. Check the heuristic just
before code generation.
llvm-svn: 342276
Also fix a problem in forward propagation:
const TerminatorInst *TI = It->getTerminator();
was set outside the while loop that iterates over It.
llvm-svn: 342275
As preparation for LoopInterchange becoming a loop pass, it needs to
preserve ScalarEvolution. Even though interchanging should not change
the trip count of the loop, it modifies loop entry, latch and exit
blocks.
I added -verify-scev to some loop interchange tests, but the verification does
not catch problems caused by missing invalidation of SE in loop interchange, as
the trip counts themselves do not change. So there might be potential to
make the SE verification covering more stuff in the future.
Reviewers: mkazantsev, efriedma, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D52026
llvm-svn: 342209
Summary:
[VPlan] Implement vector code generation support for simple outer loops.
Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:
- force vector code generation using explicit vectorize_width
- add conservative early returns in cost model and other places for VPlanNativePath
- add code for setting up outer loop inductions
- support for widening non-induction PHIs that can result from inner loops and uniform conditional branches
- support for generating uniform inner branches
We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.
Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal
Reviewed By: fhahn, hsaito
Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D50820
llvm-svn: 342197
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 342186
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA
We do need to have two `switch()`'es like this,
to not mismatch the swappable predicates.
https://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52001
llvm-svn: 342173
This adds DebugCounter support to the PartiallyInlineLibCalls pass,
which should make debugging/automated bisection easier in the future.
Patch by Zhizhou Yang!
Differential Revision: https://reviews.llvm.org/D50093
llvm-svn: 342172
This allows the xor to be removed completely.
This might help with recomitting r341674, but seems good regardless.
Coincidentally fixes PR38915.
Differential Revision: https://reviews.llvm.org/D51964
llvm-svn: 342163
I accidentally committed this diff with rL342147 because
I had applied D51964. We probably do need those checks,
but D51964 has tests and more discussion/motivation,
so they should be re-added with that patch.
llvm-svn: 342149
I don't have a test case for this, but it's motivated by
the discussion in D51964, and I've added TODO comments for
the better fix - move simplifications into instsimplify
because that's more efficient and reduces risk of infinite
loops in instcombine caused by transforms trying to do the
opposite folds.
In this case, we know that the transform that tries to move
'not' through min/max can be fooled by the multiple uses
of a value in another min/max, so try to squash the
foldSPFofSPF() patterns first.
llvm-svn: 342147
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable.)
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjYhttps://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51985
llvm-svn: 342067
Fix for https://bugs.llvm.org/show_bug.cgi?id=38912.
In GVNHoist::computeInsertionPoints() we iterate over the Value
Numbers and calculate the Iterated Dominance Frontiers without
clearing the IDFBlocks vector first. IDFBlocks ends up accumulating
an insane number of basic blocks, which bloats the compilation time
of SemaChecking.cpp with ubsan enabled.
Differential Revision: https://reviews.llvm.org/D51980
llvm-svn: 342055
Previously the alignment on the newly created switch table data was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it to 16
bytes. This causes unnecessary code bloat.
Differential Revision: https://reviews.llvm.org/D51800
llvm-svn: 342039
Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use
them for VPlan outside LoopVectorize.cpp
Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00
Reviewed By: rengolin, xbolva00
Differential Revision: https://reviews.llvm.org/D49488
llvm-svn: 342027
Summary:
The InductionDescriptor and RecurrenceDescriptor classes basically analyze the IR to identify the respective IVs. So, it is better to have them in the "Analysis" directory instead of the "Transforms" directory.
The rationale for this is to make the Induction and Recurrence descriptor classes available for analysis passes. Currently including them in an analysis pass produces link error (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124456.html).
Induction and Recurrence descriptors are moved from Transforms/Utils/LoopUtils.h|cpp to Analysis/IVDescriptors.h|cpp.
Reviewers: dmgreen, llvm-commits, hfinkel
Reviewed By: dmgreen
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D51153
llvm-svn: 342016
Name: op_ugt_sum
%a = add i8 %x, %y
%r = icmp ugt i8 %x, %a
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
Name: sum_ult_op
%a = add i8 %x, %y
%r = icmp ult i8 %a, %x
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
https://rise4fun.com/Alive/ZRxI
AFAICT, this doesn't interfere with any add-saturation patterns
because those have >1 use for the 'add'. But this should be
better for IR analysis and codegen in the basic cases.
This is another fold inspired by PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 342004
This reverts rL341954.
The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been
failing with timeouts at stage2 clang/ubsan:
[3065/3073] Linking CXX executable bin/lld
command timed out: 1200 seconds without output running python
../sanitizer_buildbot/sanitizers/buildbot_selector.py,
attempting to kill
llvm-svn: 342001
These are the folds in Alive;
Name: xor_ult
Pre: isPowerOf2(-C1)
%xor = xor i8 %x, C1
%r = icmp ult i8 %xor, C1
=>
%r = icmp ugt i8 %x, ~C1
Name: xor_ugt
Pre: isPowerOf2(C1+1)
%xor = xor i8 %x, C1
%r = icmp ugt i8 %xor, C1
=>
%r = icmp ugt i8 %x, C1
https://rise4fun.com/Alive/Vty
The ugt case in its simplest form was already handled by DemandedBits,
but that's not ideal as shown in the multi-use test.
I'm not sure if these are all of the symmetrical folds, but I adjusted
the existing code for one of the folds to try to show the similarities.
There's no obvious connection, but this is another preliminary step
for PR14613...
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 341997
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 341987
Summary:
Update MemorySSA in old LoopUnswitch pass.
Actual dependency and update is disabled by default.
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D45301
llvm-svn: 341984
I noticed that we were not back-propagating undef lanes to shuffle masks when we have a
shuffle that reduces the vector width. This is part of investigating/solving PR38691:
https://bugs.llvm.org/show_bug.cgi?id=38691
The DAG equivalent was proposed with:
D51696
Differential Revision: https://reviews.llvm.org/D51433
llvm-svn: 341981
Right now, the counters are added in regards of the number of successors
for a given BasicBlock: it's good when we've only 1 or 2 successors (at
least with BranchInstr). But in the case of a switch statement, the
BasicBlock after switch has several predecessors and we need know from
which BB we're coming from.
So the idea is to revert what we're doing: add a PHINode in each block
which will select the counter according to the incoming BB. They're
several pros for doing that:
- we fix the "switch" bug
- we remove the function call to "__llvm_gcov_indirect_counter_increment"
and the lookup table stuff
- we replace by PHINodes, so the optimizer will probably makes a better
job.
Patch by calixte!
Differential Revision: https://reviews.llvm.org/D51619
llvm-svn: 341977
For vectors, getPrimitiveSizeInBits returns the full vector width. This code should using the element size for vectors. This could be fixed by calling getScalarSizeInBits, but its even easier to just get it from the APInt we're checking.
Differential Revision: https://reviews.llvm.org/D51938
llvm-svn: 341971
There are 2 cases when we create PHI nodes:
* For the result of the call that was duplicated in the split blocks.
Those PHI nodes should have the debug location of the call.
* For values produced before the call. Those instructions need to be
duplicated in the split blocks and the PHI nodes should have the
debug locations of those instructions.
Fixes PR37962.
Reviewers: junbuml, gbedwell, vsk
Reviewed By: junbuml
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D51919
llvm-svn: 341970
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 341951
The presence of readnone and an access range attribute (argmemonly,
inaccessiblememonly, inaccessiblemem_or_argmemonly) is considered an
error by the verifier. This seems strict but also not wrong. This
patch makes sure function attribute detection will remove all access
range attributes for readnone functions.
llvm-svn: 341927
Currently we re-use cached info from sub loops or traverse them
to populate AliasSetTracker. But after that we traverse all basic blocks
from the current loop. This is redundant work.
All what we need is traversing the all basic blocks from the loop except
those which are used to get the data from the cache.
This should improve compile time only.
Reviewers: mkazantsev, reames, kariddi, anna
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51715
llvm-svn: 341896
IndVarSimplify's design is somewhat odd in the way how it reports that
some transform has made a change. It has a `Changed` field which can
be set from within any function, which makes it hard to track whether or
not it was set properly after a transform was made. It leads to oversights
in setting this flag where needed, see example in PR38855.
This patch removes the `Changed` field, turns it into a local and unifies
the signatures of all relevant transform functions to return boolean value
which designates whether or not this transform has made a change.
Differential Revision: https://reviews.llvm.org/D51850
Reviewed By: skatkov
llvm-svn: 341893
I'd made exactly this same change before, but it appears to have been accidentally reverted in another change. (I'm assuming accidental since it was without comment or test case, and in an unrelated change.)
llvm-svn: 341892
Summary:
Revert min/max changes in rL341674 dues to high compile times causing timeouts (PR38897).
Checking in to unblock failing builds. Patch available for post-commit review and re-revert once resolved.
Working on a smaller reproducer for PR38897.
Reviewers: craig.topper, spatel
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D51897
llvm-svn: 341883
Before tagging a function with coldcc make sure the target supports cold calling
convention. Without this patch HotColdSplitting pass fails on aarch64 with:
fatal error: error in backend: Unsupported calling convention.
llvm-svn: 341838
There were two combines not covered by the check before now, neither of which
actually differed from normal in the benefit analysis.
The most recent seems to be because it was just added at the top of the
function (naturally). The older is from way back in 2008 (r46687) when we just
didn't put those checks in so routinely, and has been diligently maintained
since.
llvm-svn: 341831
When GVN propagates an equality by replacing one value with another it also
needs to invalidate the cached information for the value being replaced.
Differential Revision: https://reviews.llvm.org/D51218
llvm-svn: 341820
Currently, `rewriteFirstIterationLoopExitValues` does not set Changed flag even if it makes
changes in the IR. There is no clear evidence that it can cause a crash, but it
looks highly suspicious and likely invalid.
Differential Revision: https://reviews.llvm.org/D51779
Reviewed By: skatkov
llvm-svn: 341779
Currently, `sinkUnusedInvariants` does not set Changed flag even if it makes
changes in the IR. There is no clear evidence that it can cause a crash, but it
looks highly suspicious and likely invalid.
Differential Revision: https://reviews.llvm.org/D51777
Reviewed By: skatkov
llvm-svn: 341777
Summary:
Move InductionDescriptor::transform() routine from LoopUtils to its only uses in LoopVectorize.cpp.
Specifically, the function is renamed as InnerLoopVectorizer::emitTransformedIndex().
This is a child to D51153.
Reviewers: dmgreen, llvm-commits
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D51837
llvm-svn: 341776
Summary:
Block splitting is done with either identical edges being merged, or not.
Only critical edges can be split without merging identical edges based on an option.
Teach the memoryssa updater to take this into account: for the same edge between two blocks only move one entry from the Phi in Old to the new Phi in New.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D51563
llvm-svn: 341709
shuf (sel (shuf NarrowCond, undef, WideMask), X, Y), undef, NarrowMask) -->
sel NarrowCond, (shuf X, undef, NarrowMask), (shuf Y, undef, NarrowMask)
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38691
...is the last regression test. In that case, we're just left with the narrow select.
Note that if we do create new shuffles, they use the existing extraction identity mask,
so there's no danger that this transform creates arbitrary shuffles.
Differential Revision: https://reviews.llvm.org/D51496
llvm-svn: 341708
Summary:
Do away with demangling. It wasn't really necessary.
Declared some local functions to be static.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51740
llvm-svn: 341681
If the ~X wasn't able to simplify above the max/min, we might be able to simplify it by moving it below the max/min.
I had to modify the ~(min/max ~X, Y) transform to prevent getting stuck in a loop when we saw the new ~(max/min X, ~Y) before the ~Y had been folded away to remove the new not.
Differential Revision: https://reviews.llvm.org/D51398
llvm-svn: 341674
Fix a latent bug in loop vectorizer which generates incorrect code for
memory accesses that are executed conditionally. As pointed in review,
this bug definitely affects uniform loads and may affect conditional
stores that should have turned into scatters as well).
The code gen for conditionally executed uniform loads on architectures
that support masked gather instructions is broken.
Without this patch, we were unconditionally executing the *conditional*
load in the vectorized version.
This patch does the following:
1. Uniform conditional loads on architectures with gather support will
have correct code generated. In particular, the cost model
(setCostBasedWideningDecision) is fixed.
2. For the recipes which are handled after the widening decision is set,
we use the isScalarWithPredication(I, VF) form which is added in the
patch.
3. Fix the vectorization cost model for scalarization
(getMemInstScalarizationCost): implement and use isPredicatedInst to
identify *all* predicated instructions, not just scalar+predicated. So,
now the cost for scalarization will be increased for maskedloads/stores
and gather/scatter operations. In short, we should be choosing the
gather/scatter in place of scalarization on archs where it is
profitable.
4. We needed to weaken the assert in useEmulatedMaskMemRefHack.
Reviewers: Ayal, hsaito, mkuper
Differential Revision: https://reviews.llvm.org/D51313
llvm-svn: 341673
Find cold blocks based on profile information (or optionally with static analysis).
Forward propagate profile information to all cold-blocks.
Outline a cold region.
Set calling conv and prof hint for the callsite of the outlined function.
Worked in collaboration with: Sebastian Pop <s.pop@samsung.com>
Differential Revision: https://reviews.llvm.org/D50658
llvm-svn: 341669
If OtherOpT or OtherOpF have scalar types and the condition is a vector,
we would create an invalid select.
Reviewers: spatel, john.brawn, mssimpso, craig.topper
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D51781
llvm-svn: 341666
Currently eliminateInstructions only returns true if any instruction got
replaced. In the test case for this patch, we eliminate the trivially
dead calls, for which eliminateInstructions not do a replacement and the
function is not marked as changed, which is why the inliner crashes
while traversing the call graph.
Alternatively we could also change eliminateInstructions to return true
in case we mark instructions for deletion, but that's slightly more code
and doing it at the place where the replacement happens seems safer.
Fixes PR37517.
Reviewers: davide, mcrosier, efriedma, bjope
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D51169
llvm-svn: 341651
Introduce the -msan-kernel flag, which enables the kernel instrumentation.
The main differences between KMSAN and MSan instrumentations are:
- KMSAN implies msan-track-origins=2, msan-keep-going=true;
- there're no explicit accesses to shadow and origin memory.
Shadow and origin values for a particular X-byte memory location are
read and written via pointers returned by
__msan_metadata_ptr_for_load_X(u8 *addr) and
__msan_store_shadow_origin_X(u8 *addr, uptr shadow, uptr origin);
- TLS variables are stored in a single struct in per-task storage. A call
to a function returning that struct is inserted into every instrumented
function before the entry block;
- __msan_warning() takes a 32-bit origin parameter;
- local variables are poisoned with __msan_poison_alloca() upon function
entry and unpoisoned with __msan_unpoison_alloca() before leaving the
function;
- the pass doesn't declare any global variables or add global constructors
to the translation unit.
llvm-svn: 341637