This patch is motivated by pr48057 where an output dependency is not detected
since loop interchange did not check a store instruction with itself.
Fixed that deficiency.
Reviewed By: bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D118102
Summary:
according to Jame's comment in the patch https://reviews.llvm.org/D112735
Created a NFC patch to remove global variable " std::vector<NMSymbol> SymbolList"
Reviewer : James Henderson,Fangrui Song
Differential Revision: https://reviews.llvm.org/D120687
RTBuilder.h has been moved in `flang/Optimizer/Builder/Runtime/RTBuilder.h`.
This duplicate is not necessary anymore.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D121317
This patch fixes the issue introduced in 14de0820e8 and D120089, that
if dynamic libraries are used, the `CUmodule` array could be overwritten.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121308
In SupportAndFAQ.rst, add blank lines before and after a bullet list and
sublist. This avoids an "Unepxected indentation" warning.
In Runtimes.rst, adjust the suggestion for setting LIBOMPTARGET_INFO.
The right shifts are not necessary as the bit mask values are already
correct.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D119595
Summary:
1. added helper function printObjectNamesInfo() to print out object file name, archive name, architecture name.
2. One small behaviors change.
in the function dumpMachOUniversalBinaryArchAll , delete the functionality:
if (moreThanOneArch)
outs() << "\n";
Reviewer : James Henderson,Fangrui Song
Differential Revision: https://reviews.llvm.org/D120357
The existing handling produced crash for test case (attached with patch).
Now the function transferSRADebugInfo is modified to
- Ignore the current variable if it starts after the current Fragment.
- Ignore the current variable if it ends before the current Fragment.
- Generate (!DIExpression()) if current variable completely fits the
current Fragment.
- Otherwise (as earlier), generate the DW_OP_LLVM_fragment in IR if current
Fragment partially defines current variable.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D121107
This patch adds more lowering and tests for character array assignment/copy.
This patch is part of the upstreaming effort from fir-dev branch.
Depends on D121300
Reviewed By: PeteSteinfeld, schweitz
Differential Revision: https://reviews.llvm.org/D121301
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Reflow the textual comment which preserves formatted output from
tooling. This makes the content legible again after the lldb source
code was reformatted with automated tooling.
As a result of adding multiblock outlining, it became possible to outline the entirety of basic block, and branches that only pointed to the basic blocks contained in the outlined section. This means that there are no exit paths, and no return statement. There was a previous assertion from the older version of the outliner that explicitly made sure there was a return statement. This removes that assertion.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D120868
The prologue and epilogue emission were unbalanced in light of different
strategies of async frame context emission. Adjust the epilogue emission
to match the prologue emission. This makes the elision work properly as
well as the deployment based. Due to the fact that the epilogue always
was clearing a bit (which should not be set in the first place), the
client would not notice the behavioural issue unless the deployment
version was in effect.
This patch update the array value copy pass to support fir-array_amend
and fir.array_access.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld, schweitz
Differential Revision: https://reviews.llvm.org/D121300
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This is another fold generalized from D111530.
We can find a common source for a rotate operation hidden inside an 'or':
https://alive2.llvm.org/ce/z/9pV8hn
Deciding when this is profitable vs. a funnel-shift is tricky, but this
does not show any regressions: if a target has a rotate but it does not
have a funnel-shift, then try to form the rotate here. That is why we
don't have x86 test diffs for the scalar tests that are duplicated from
AArch64 ( 74a65e3834 ) - shld/shrd are available. That also makes it
difficult to show vector diffs - the only case where I found a diff was
on x86 AVX512 or XOP with i64 elements.
There's an additional check for a legal type to avoid a problem seen
with x86-32 where we form a 64-bit rotate but then it gets split
inefficiently. We might avoid that by adding more rotate folds, but
I didn't check to see what is missing on that path.
This gets most of the motivating patterns for AArch64 / ARM that are in
D111530.
We still need a couple of enhancements to setcc pattern matching with
rotate/funnel-shift to get the rest.
Differential Revision: https://reviews.llvm.org/D120933
Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
Flat can be merged with flat global since address cast is a no-op.
A combined memory operation needs to be promoted to flat.
Differential Revision: https://reviews.llvm.org/D120431
These compound predicates are not required, since we can use a
combination of setting the SubtargetPredicate (to a subtarget
predicate like isGFX940Plus) and OtherPredicates (to a list of feature
predicates like HasAtomicFaddInsts) instead. NFC.
Differential Revision: https://reviews.llvm.org/D121289
These tests don't seem specific to the debug mode, so it makes sense to
run them even when the debug mode is disabled. When we run with the debug
mode enabled, we'll get the out-of-bounds checking that this test seems
to be concerned with.
Differential Revision: https://reviews.llvm.org/D121241
Add `-fopenacc` flag to the `bbc` tool.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D121117
Add a new pass in the pre-ra AMDGPU scheduler to check if sinking trivially rematerializable defs that only has one use outside of the defining block will increase occupancy. If we can determine that occupancy can be increased, then rematerialize only the minimum amount of defs required to increase occupancy. Also re-schedule all regions that had occupancy matching the previous min occupancy using the new occupancy.
This is based off of the discussion in https://reviews.llvm.org/D117562.
The logic to determine the defs we should collect and determining if sinking would be beneficial is mostly the same. Main differences is that we are no longer limiting it to immediate defs and the def and use does not have to be part of a loop.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D119475
Add `-fopenmp` flag to the `bbc` tool.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: schweitz, awarzynski
Differential Revision: https://reviews.llvm.org/D121118
This patch intersects the fast math flags from the two fcmps instead
of dropping them.
I poked at this a bunch with Alive2 for nnan and ninf flags and it seemed
to check out. With the other flags it told me "Couldn't prove the
correctness of the transformation". Not sure if I should just preserve
nnan and ninf?
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D121243
The code that computed the extent of a dimension of a
non-allocatable/non-automatic component array during
finalization had a reversed subtraction; fix, and
use variables to make the code a little more readable.
Differential Revision: https://reviews.llvm.org/D121163
Based on review comments in D97705 applied some code cleanups in
<charconv>. The header now uses a more recent libc++ style.
Reviewed By: Quuxplusone, #libc, philnik
Differential Revision: https://reviews.llvm.org/D121223
Ensure step-avoid-regexp logs are emitted in the case where the regex has no
capture groups.
Without this change, the log is printed only if the regex has at least one
capture group.
Another change is to the log message: the first capture group has been removed
from the message. There could be zero capture groups, and there could be two or
more capture groups.
Differential Revision: https://reviews.llvm.org/D119298
Add `IsAggregateType` to the SB API.
I'd like to use this from tests, and there are numerous other `Is<X>Type`
predicates on `SBType`.
Differential Revision: https://reviews.llvm.org/D121252
Single value phis won't be modeled in VPlan. If the phi only gets used
outside the loop, the current code misses the fact that the incoming
value is not dead. Update the code to also look through such phis to
check for outside users.
Fixes#54266
This patch adds support for:
* `-S` in Flang's compiler and frontend drivers,
and implements:
* `-emit-obj` in Flang's frontend driver and `-c` in Flang's compiler
driver (this is consistent with Clang).
(these options were already available before, but only as placeholders).
The semantics of these options in Clang and Flang are identical.
The `EmitObjAction` frontend action is renamed as `BackendAction`. This
new name more accurately reflects the fact that this action will
primarily run the code-gen/backend pipeline in LLVM. It also makes more
sense as an action implementing both `-emit-obj` and `-S` (originally,
it was just `-emit-obj`).
`tripleName` from FirContext.cpp is deleted and, when a target triple is
required, `mlir::LLVM::LLVMDialect::getTargetTripleAttrName()` is used
instead. In practice, this means that `fir.triple` is replaced with
`llvm.target_triple`. The former was effectively ignored. The latter is
used when lowering from the LLVM dialect in MLIR to LLVM IR (i.e. it's
embedded in the generated LLVM IR module). The driver can then re-use
it when configuring the backend. With this change, the LLVM IR files
generated by e.g. `tco` will from now on contain the correct target
triple.
The code-gen.f90 test is replaced with code-gen-x86.f90 and
code-gen-aarch64.f90. With 2 seperate files we can verify that
`--target` is correctly taken into account. LIT configuration is updated
to enable e.g.:
```
! REQUIRES: aarch64-registered-target
```
Differential Revision: https://reviews.llvm.org/D120568
This enables tests out of clang/unittests/Analysis/FlowSensitive to
use the testing support utilities.
Reviewed-by: ymandel, gribozavr2
Differential Revision: https://reviews.llvm.org/D121285
This patch moves the platform creation and selection logic into the
per-debugger platform lists. I've tried to keep functional changes to a
minimum -- the main (only) observable difference in this change is that
APIs, which select a platform by name (e.g.,
Debugger::SetCurrentPlatform) will not automatically pick up a platform
associated with another debugger (or no debugger at all).
I've also added several tests for this functionality -- one of the
pleasant consequences of the debugger isolation is that it is now
possible to test the platform selection and creation logic.
This is a product of the discussion at
<https://discourse.llvm.org/t/multiple-platforms-with-the-same-name/59594>.
Differential Revision: https://reviews.llvm.org/D120810
LIBCXX_ENABLE_ASSERTIONS does not have any relationship to the `assert`
macro -- it only controls assertions that are internal to the library.
Playing around with `NDEBUG` only muddies the picture further than it
already is.
Also, remove a failing assertion in the benchmarks. That assertion had
never been exercised because we defined `NDEBUG` manually, and it was
failing since we introduced the ability to generate a benchmark vector
with the Quicksort adversary ordering (which is obviously not sorted).
This was split off of https://llvm.org/D121123.
Differential Revision: https://reviews.llvm.org/D121244
This patch refactors the range checking function to make it compatible with all relocation types and supports range checking for R_RISCV_BRANCH. Moreover, it refactors the alignment check functions.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D117946
Embedded nul characters are still printed, and they don't terminate the
string. See also D111634.
Differential Revision: https://reviews.llvm.org/D120803