Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.
In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.
Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.
Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440
llvm-svn: 345723
Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.
Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.
This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".
There are still some relevant cases where code quality could be
improved, in particular:
- We often introduce redundant masks with EXEC. Ideally, we'd
have a generic computeKnownBits-like analysis to determine
whether masks are already masked by EXEC, so we can avoid this
masking both here and when lowering uniform control flow.
- The criterion we use to determine whether a def is observed
from outside a loop is conservative: it doesn't check whether
(loop) branch conditions are uniform.
Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53496
llvm-svn: 345719
Summary:
The optimization to early break out of loops if all threads are dead was
never fully implemented.
But the PHI node analyzing is actually causing a number of problems, so
remove all the extra code for it.
(This does actually regress code quality in a few places because it
ends up relying more heavily on phi's of i1, which we don't do a
great job with. However, since it fixes real bugs in the wild, we
should take this change. I have some prototype changes to improve
i1 lowering in general -- not just for control flow -- which should
help recover the code quality, I just need to make those changes
fit for general consumption. -- Nicolai)
Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53359
llvm-svn: 345718
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:
Foo(MI->getOperand(0).getImm()) == ExpectedVal;
Here, the immediate operand value at index zero is passed in input to function
Foo, and ExpectedVal is compared against the value returned by function Foo.
While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.
With this patch, we allow users to specify if a register/immediate operand value
needs to be passed in input to a function as part of the predicate check. Now,
register/immediate operand checks all derive from base class CheckOperandBase.
This patch also changes where TIIPredicate definitions are expanded by the
instructon info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).
This patch is a non functional change for the existing scheduling models.
In future, we will be able to use this richer syntax to better describe complex
scheduling predicates, and expose them to llvm-mca.
Differential Revision: https://reviews.llvm.org/D53880
llvm-svn: 345714
This removes the assertion that a copy of a moved-from SmallSetIterator
equals the original, which is illegal due to SmallSetIterator including
an instance of a standard `std::set` iterator.
C++ [iterator.requirements.general] states that comparing singular
iterators has undefined result:
> Iterators can also have singular values that are not associated with
> any sequence. [...] Results of most expressions are undefined for
> singular values; the only exceptions are destroying an iterator that
> holds a singular value, the assignment of a non-singular value to an
> iterator that holds a singular value, and, for iterators that satisfy
> the Cpp17DefaultConstructible requirements, using a value-initialized
> iterator as the source of a copy or move operation.
This assertion triggers the following error in the GNU C++ Library in
debug mode under EXPENSIVE_CHECKS:
/usr/include/c++/8.2.1/debug/safe_iterator.h:518:
Error: attempt to compare a singular iterator to a singular iterator.
Objects involved in the operation:
iterator "lhs" @ 0x0x7fff86420670 {
state = singular;
}
iterator "rhs" @ 0x0x7fff86420640 {
state = singular;
}
Patch by Eugene Sharygin.
Reviewers: fhahn, dblaikie, chandlerc
Reviewed By: fhahn, dblaikie
Differential Revision: https://reviews.llvm.org/D53793
llvm-svn: 345712
We need the install-liblldb-stripped target to depend on the
lldb-framework target in order for the installation to be guaranteed to
behave correctly, otherwise it's possible for the lldb-framework and
install-liblldb-stripped targets to run in parallel, resulting in
temporary or partially processed files being copied into the framework.
install-liblldb already depends on lldb-framework for this reason.
Differential Revision: https://reviews.llvm.org/D53917
llvm-svn: 345711
Our a16 support was only enabled for sample/gather and buffer
load/store, but not for image load/store operations (which take an i16
as the pixel index rather than a half).
Fix our isel lowering and add test cases to prove it out.
Differential Revision: https://reviews.llvm.org/D53750
llvm-svn: 345710
Calling it too early might cause dllimport to get inherited onto the
VarDecl before the initializer got attached. See the test case for an
example where this broke things.
llvm-svn: 345709
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.
There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.
It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.
Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb
llvm-svn: 345708
This adds the support for DW_FORM_addrx, DW_FORM_addrx1,
DW_FORM_addrx2, DW_FORM_addrx3, DW_FORM_addrx4 forms.
Differential revision: https://reviews.llvm.org/D53813
llvm-svn: 345706
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
Turns out it's not always possible to figure out whether an asm()
statement argument points to a valid memory region.
One example would be per-CPU objects in the Linux kernel, for which the
addresses are calculated using the FS register and a small offset in the
.data..percpu section.
To avoid pulling all sorts of checks into the instrumentation, we replace
actual checking/unpoisoning code with calls to
msan_instrument_asm_load(ptr, size) and
msan_instrument_asm_store(ptr, size) functions in the runtime.
This patch doesn't implement the runtime hooks in compiler-rt, as there's
been no demand in assembly instrumentation for userspace apps so far.
llvm-svn: 345702
Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue. These are used by the MCLayer to
populate the .xdata section.
Differential Revision: https://reviews.llvm.org/D50288
llvm-svn: 345701
There is no SCHED_IDLE semantic equivalent in BSD systems.
Reviewers: kadircet, sammccall
Revieweed By: sammccall
Differential Revision: https://reviews.llvm.org/D53922
llvm-svn: 345700
In the course of D51340, @takuto.ikuta discovered that Clang fails to put
dllexport/import attributes on static locals during template instantiation.
For regular functions, this happens in Sema::FinalizeDeclaration(), however for
template instantiations we need to do something in or around
TemplateDeclInstantiator::VisitVarDecl(). This patch does that, and extracts
the code to a utility function.
Differential Revision: https://reviews.llvm.org/D53870
llvm-svn: 345699
This patch introduces a concept of "frame recognizer" and "recognized frame". This should be an extensible mechanism that retrieves information about special frames based on ABI, arguments or other special properties of that frame, even without source code. A few examples where that could be useful could be 1) objc_exception_throw, where we'd like to get the current exception, 2) terminate_with_reason and extracting the current terminate string, 3) recognizing Objective-C frames and automatically extracting the receiver+selector, or perhaps all arguments (based on selector).
Differential Revision: https://reviews.llvm.org/D44603
llvm-svn: 345693
A ConstantExpr class represents a full expression that's in a context where a
constant expression is required. This class reflects the path the evaluator
took to reach the expression rather than the syntactic context in which the
expression occurs.
In the future, the class will be expanded to cache the result of the evaluated
expression so that it's not needlessly re-evaluated
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D53475
llvm-svn: 345692
This patch introduces a concept of "frame recognizer" and "recognized frame". This should be an extensible mechanism that retrieves information about special frames based on ABI, arguments or other special properties of that frame, even without source code. A few examples where that could be useful could be 1) objc_exception_throw, where we'd like to get the current exception, 2) terminate_with_reason and extracting the current terminate string, 3) recognizing Objective-C frames and automatically extracting the receiver+selector, or perhaps all arguments (based on selector).
Differential Revision: https://reviews.llvm.org/D44603
llvm-svn: 345686
This patch introduces a concept of "frame recognizer" and "recognized frame". This should be an extensible mechanism that retrieves information about special frames based on ABI, arguments or other special properties of that frame, even without source code. A few examples where that could be useful could be 1) objc_exception_throw, where we'd like to get the current exception, 2) terminate_with_reason and extracting the current terminate string, 3) recognizing Objective-C frames and automatically extracting the receiver+selector, or perhaps all arguments (based on selector).
Differential Revision: https://reviews.llvm.org/D44603
llvm-svn: 345678
For arguments, pass it indirectly, since the ABI doc says pretty clearly
that arguments larger than 8 bytes are passed indirectly. This makes
va_list handling easier, anyway.
When returning, GCC returns in XMM0, and we match them.
Fixes PR39492.
llvm-svn: 345676
The debug-use flag must be set exactly for uses on DBG_VALUEs. This is
so obvious that it can be trivially inferred while parsing. This will
reduce noise when printing while omitting an information that has little
value to the user.
The parser will keep recognizing the flag for compatibility with old
`.mir` files.
Differential Revision: https://reviews.llvm.org/D53903
llvm-svn: 345671