As a result of adding multiblock outlining, it became possible to outline the entirety of basic block, and branches that only pointed to the basic blocks contained in the outlined section. This means that there are no exit paths, and no return statement. There was a previous assertion from the older version of the outliner that explicitly made sure there was a return statement. This removes that assertion.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D120868
The prologue and epilogue emission were unbalanced in light of different
strategies of async frame context emission. Adjust the epilogue emission
to match the prologue emission. This makes the elision work properly as
well as the deployment based. Due to the fact that the epilogue always
was clearing a bit (which should not be set in the first place), the
client would not notice the behavioural issue unless the deployment
version was in effect.
This patch update the array value copy pass to support fir-array_amend
and fir.array_access.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld, schweitz
Differential Revision: https://reviews.llvm.org/D121300
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This is another fold generalized from D111530.
We can find a common source for a rotate operation hidden inside an 'or':
https://alive2.llvm.org/ce/z/9pV8hn
Deciding when this is profitable vs. a funnel-shift is tricky, but this
does not show any regressions: if a target has a rotate but it does not
have a funnel-shift, then try to form the rotate here. That is why we
don't have x86 test diffs for the scalar tests that are duplicated from
AArch64 ( 74a65e3834 ) - shld/shrd are available. That also makes it
difficult to show vector diffs - the only case where I found a diff was
on x86 AVX512 or XOP with i64 elements.
There's an additional check for a legal type to avoid a problem seen
with x86-32 where we form a 64-bit rotate but then it gets split
inefficiently. We might avoid that by adding more rotate folds, but
I didn't check to see what is missing on that path.
This gets most of the motivating patterns for AArch64 / ARM that are in
D111530.
We still need a couple of enhancements to setcc pattern matching with
rotate/funnel-shift to get the rest.
Differential Revision: https://reviews.llvm.org/D120933
Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
Flat can be merged with flat global since address cast is a no-op.
A combined memory operation needs to be promoted to flat.
Differential Revision: https://reviews.llvm.org/D120431
These compound predicates are not required, since we can use a
combination of setting the SubtargetPredicate (to a subtarget
predicate like isGFX940Plus) and OtherPredicates (to a list of feature
predicates like HasAtomicFaddInsts) instead. NFC.
Differential Revision: https://reviews.llvm.org/D121289
These tests don't seem specific to the debug mode, so it makes sense to
run them even when the debug mode is disabled. When we run with the debug
mode enabled, we'll get the out-of-bounds checking that this test seems
to be concerned with.
Differential Revision: https://reviews.llvm.org/D121241
Add `-fopenacc` flag to the `bbc` tool.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D121117
Add a new pass in the pre-ra AMDGPU scheduler to check if sinking trivially rematerializable defs that only has one use outside of the defining block will increase occupancy. If we can determine that occupancy can be increased, then rematerialize only the minimum amount of defs required to increase occupancy. Also re-schedule all regions that had occupancy matching the previous min occupancy using the new occupancy.
This is based off of the discussion in https://reviews.llvm.org/D117562.
The logic to determine the defs we should collect and determining if sinking would be beneficial is mostly the same. Main differences is that we are no longer limiting it to immediate defs and the def and use does not have to be part of a loop.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D119475
Add `-fopenmp` flag to the `bbc` tool.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: schweitz, awarzynski
Differential Revision: https://reviews.llvm.org/D121118
This patch intersects the fast math flags from the two fcmps instead
of dropping them.
I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed
to check out. With the other flags it told me "Couldn't prove the
correctness of the transformation". Not sure if I should just preserve
nnan and ninf?
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D121243
The code that computed the extent of a dimension of a
non-allocatable/non-automatic component array during
finalization had a reversed subtraction; fix, and
use variables to make the code a little more readable.
Differential Revision: https://reviews.llvm.org/D121163
Based on review comments in D97705 applied some code cleanups in
<charconv>. The header now uses a more recent libc++ style.
Reviewed By: Quuxplusone, #libc, philnik
Differential Revision: https://reviews.llvm.org/D121223
Ensure step-avoid-regexp logs are emitted in the case where the regex has no
capture groups.
Without this change, the log is printed only if the regex has at least one
capture group.
Another change is to the log message: the first capture group has been removed
from the message. There could be zero capture groups, and there could be two or
more capture groups.
Differential Revision: https://reviews.llvm.org/D119298
Add `IsAggregateType` to the SB API.
I'd like to use this from tests, and there are numerous other `Is<X>Type`
predicates on `SBType`.
Differential Revision: https://reviews.llvm.org/D121252
Single value phis won't be modeled in VPlan. If the phi only gets used
outside the loop, the current code misses the fact that the incoming
value is not dead. Update the code to also look through such phis to
check for outside users.
Fixes#54266
This patch adds support for:
* `-S` in Flang's compiler and frontend drivers,
and implements:
* `-emit-obj` in Flang's frontend driver and `-c` in Flang's compiler
driver (this is consistent with Clang).
(these options were already available before, but only as placeholders).
The semantics of these options in Clang and Flang are identical.
The `EmitObjAction` frontend action is renamed as `BackendAction`. This
new name more accurately reflects the fact that this action will
primarily run the code-gen/backend pipeline in LLVM. It also makes more
sense as an action implementing both `-emit-obj` and `-S` (originally,
it was just `-emit-obj`).
`tripleName` from FirContext.cpp is deleted and, when a target triple is
required, `mlir::LLVM::LLVMDialect::getTargetTripleAttrName()` is used
instead. In practice, this means that `fir.triple` is replaced with
`llvm.target_triple`. The former was effectively ignored. The latter is
used when lowering from the LLVM dialect in MLIR to LLVM IR (i.e. it's
embedded in the generated LLVM IR module). The driver can then re-use
it when configuring the backend. With this change, the LLVM IR files
generated by e.g. `tco` will from now on contain the correct target
triple.
The code-gen.f90 test is replaced with code-gen-x86.f90 and
code-gen-aarch64.f90. With 2 seperate files we can verify that
`--target` is correctly taken into account. LIT configuration is updated
to enable e.g.:
```
! REQUIRES: aarch64-registered-target
```
Differential Revision: https://reviews.llvm.org/D120568
This enables tests out of clang/unittests/Analysis/FlowSensitive to
use the testing support utilities.
Reviewed-by: ymandel, gribozavr2
Differential Revision: https://reviews.llvm.org/D121285
This patch moves the platform creation and selection logic into the
per-debugger platform lists. I've tried to keep functional changes to a
minimum -- the main (only) observable difference in this change is that
APIs, which select a platform by name (e.g.,
Debugger::SetCurrentPlatform) will not automatically pick up a platform
associated with another debugger (or no debugger at all).
I've also added several tests for this functionality -- one of the
pleasant consequences of the debugger isolation is that it is now
possible to test the platform selection and creation logic.
This is a product of the discussion at
<https://discourse.llvm.org/t/multiple-platforms-with-the-same-name/59594>.
Differential Revision: https://reviews.llvm.org/D120810
LIBCXX_ENABLE_ASSERTIONS does not have any relationship to the `assert`
macro -- it only controls assertions that are internal to the library.
Playing around with `NDEBUG` only muddies the picture further than it
already is.
Also, remove a failing assertion in the benchmarks. That assertion had
never been exercised because we defined `NDEBUG` manually, and it was
failing since we introduced the ability to generate a benchmark vector
with the Quicksort adversary ordering (which is obviously not sorted).
This was split off of https://llvm.org/D121123.
Differential Revision: https://reviews.llvm.org/D121244
This patch refactors the range checking function to make it compatible with all relocation types and supports range checking for R_RISCV_BRANCH. Moreover, it refactors the alignment check functions.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D117946
Embedded nul characters are still printed, and they don't terminate the
string. See also D111634.
Differential Revision: https://reviews.llvm.org/D120803
Our SIGTSTP handler was working, but that was mostly accidental.
The reason it worked is because lldb is multithreaded for most of its
lifetime and the OS is reasonably fast at responding to signals. So,
what happened was that the kill(SIGTSTP) which we sent from inside the
handler was delivered to another thread while the handler was still set
to SIG_DFL (which then correctly put the entire process to sleep).
Sometimes it happened that the other thread got the second signal after
the first thread had already restored the handler, in which case the
signal handler would run again, and it would again attempt to send the
SIGTSTP signal back to itself.
Normally it didn't take many iterations for the signal to be delivered
quickly enough. However, if you were unlucky (or were playing around
with pexpect) you could get SIGTSTP while lldb was single-threaded, and
in that case, lldb would go into an endless loop because the second
SIGTSTP could only be handled on the main thread, and only after the
handler for the first signal returned (and re-installed itself). In that
situation the handler would keep re-sending the signal to itself.
This patch fixes the issue by implementing the handler the way it
supposed to be done:
- before sending the second SIGTSTP, we unblock the signal (it gets
automatically blocked upon entering the handler)
- we use raise to send the signal, which makes sure it gets delivered to
the thread which is running the handler
This also means we don't need the SIGCONT handler, as our TSTP handler
resumes right after the entire process is continued, and we can do the
required work there.
I also include a test case for the SIGTSTP flow. It uses pexpect, but it
includes a couple of extra twists. Specifically, I needed to create an
extra process on top of lldb, which will run lldb in a separate process
group and simulate the role of the shell. This is needed because SIGTSTP
is not effective on a session leader (the signal gets delivered, but it
does not cause a stop) -- normally there isn't anyone to notice the
stop.
Differential Revision: https://reviews.llvm.org/D120320
This ensures that the user is aware that many commands will not work
correctly.
We print the warning only once (per module) to avoid spamming the user
with potentially thousands of error messages.
Differential Revision: https://reviews.llvm.org/D120892
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used the print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
Adds "frontend attributes" to the list of things to come talk about,
removes the extra timezone information to hopefully reduce confusion
about daylight savings time.