This patch merges STR<S,D,Q,W,X>pre-STR<S,D,Q,W,X>ui and
LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui instruction pairs into a single
STP<S,D,Q,W,X>pre and LDP<S,D,Q,W,X>pre instruction, respectively.
For each pair, there is a MIR test that verifies this optimization.
Differential Revision: https://reviews.llvm.org/D99272
Change-Id: Ie97a20c8c716c08492fe229c22e14e3c98ef08b7
When producing the runtime type information for a component of a derived type
that had a LEN type parameter, we were not allowing a KIND parameter of the
derived type. This was causing one of the NAG correctness tests to fail
(.../hibiya/d5.f90).
I added a test to our own test suite to check for this.
Also, I fixed a typo in .../module/__fortran_type_info.f90.
I allowed KIND type parameters to be used for the declarations of components
that use LEN parameters by constant folding the value of the LEN parameter. To
make the constant folding work, I had to put the semantics::DerivedTypeSpec of
the associated derived type into the folding context. To get this
semantics::DerivedTypeSpec, I changed the value of the semantics::Scope object
that was passed to DescribeComponent() to be the derived type scope rather than
the containing non-derived type scope.
This scope change, in turn, caused differences in the symbol table output that
is checked in typeinfo01.f90. Most of these differences were in the order that
the symbols appeared in the dump. But one of them changed one of the values
from "CHARACTER(2_8,1)" to "CHARACTER(1_8,1)". I'm not sure if these changes
are significant. Please verify that the results of this test are still valid.
Also, I wonder if there are other situations in this code where we should be
folding constants. For example, what if the field of a component has a
component whose type is a PDT with a LEN type parameter, and the component's
declaration depends on the KIND type parameter of the current PDT. Here's an
example:
type string(stringkind)
integer,kind :: stringkind
character(stringkind) :: value
end type string
type outer(kindparam)
integer,kind :: kindparam
type(string(kindparam)) :: field
end type outer
I don't understand the code or what it's trying to accomplish well enough to
figure out if such cases are correctly handled by my new code.
Differential Revision: https://reviews.llvm.org/D101482
The functionality in SVEIntrinsicOpts::isReinterpretToSVBool was moved in
D101302, however the original now unused function was not removed (NFC).
Differential Revision: https://reviews.llvm.org/D101642
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.
Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
Differential Revision: https://reviews.llvm.org/D101164
This patch fixes two bugs that arise when a 'defm' inherits from a multiclass
and also from a class with assertions.
Differential Revision: https://reviews.llvm.org/D101626
* Using a base address or skipping it with LLDB_INVALID_ADDRESS
* Using a data offset, which does not effect the printed addresses
* Not providing an output stream
* Formatting a double sized HexFloat
* Formatting over multiple lines
Since address printing now has its own test,
I've removed the base address from all the format
type tests.
The multi line tests still use a base address to check that
it's incremented correctly for each new line.
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D101627
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.
The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and stores
instructions (such as when the immediates that fit within 34-bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating dependency of pattern declaration in TableGen.
The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.
This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.
Differential Revision: https://reviews.llvm.org/D93370
Summary:
This patch implements the backend implementation of adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
ATM, this is implemented for 32 bit AIX.
Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
There already was a check for undeduced and incomplete types, but it
failed to trigger when outer type (SubstTemplateTypeParm in test) looked
fine, but inner type was not.
Differential Revision: https://reviews.llvm.org/D100667
Currently we have a bit of a mess related to tids:
- sanitizers re-declare kInvalidTid multiple times
- some call it kUnknownTid
- implicit assumptions that main tid is 0
- asan/memprof claim their tids need to fit into 24 bits,
but this does not seem to be true anymore
- inconsistent use of u32/int to store tids
Introduce kInvalidTid/kMainTid in sanitizer_common
and use them consistently.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D101428
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
The only effect of the optimization is to remove s_set_gpr_idx_*
instructions, and update_mir_test_checks.py always inserts CHECK: rather
than CHECK-NEXT: checks, so without this implicit negative check, the
tests would always pass even if the optimization did nothing.
Differential Revision: https://reviews.llvm.org/D101622
Removed extension begin/end pragma as it has no effect and
it is added unconditionally for all targets.
Differential Revision: https://reviews.llvm.org/D92244
Early exit from method DispatchStage::isAvailable() if the dispatch group is
already full. Not all instructions declare at least one uOP.
Fixes PR50174.
Currently Clang does not add mustprogress to inifinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).
This allows us to simplify the code dealing with adding mustprogress a
bit.
Reviewed By: aaron.ballman, lebedev.ri
Differential Revision: https://reviews.llvm.org/D96418
As these jobs only run in a couple seconds, and block starting of
other jobs, they can run on the "service" queue which doesn't get
blocked by other long-running jobs.
Differential Revision: https://reviews.llvm.org/D101437
Use of bitcast resulted in lanes being swapped for vcreateq with big
endian. Fix this by using vreinterpret. No code change for little
endian. Adds IR lit test.
Differential Revision: https://reviews.llvm.org/D101606
Due to a somewhat annoying, but necessary, shortfall in -Wunused-lambda-capture, These unused captures aren't warned about.
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D101611
as a way of not running it on Windows, where the file paths when
extracting repro2.tar can become longer than the maximum file length
limit (depending on the build dir name) and cause the test to fail.
(See https://crbug.com/1204463 for example test failure.)
Previously, the JavaScript import sorter would ignore `// clang-format
off` and `on` comments. This change fixes that. It tracks whether
formatting is enabled for a stretch of imports, and then only sorts and
merges the imports where formatting is enabled, in individual chunks.
This means that there's no meaningful total order when module references are mixed
with blocks that have formatting disabled. The alternative approach
would have been to sort all imports that have formatting enabled in one
group. However that raises the question where to insert the
formatting-off block, which can also impact symbol visibility (in
particular for exports). In practice, sorting in chunks probably isn't a
big problem.
This change also simplifies the general algorithm: instead of tracking
indices separately and sorting them, it just sorts the vector of module
references. And instead of attempting to do fine grained tracking of
whether the code changed order, it just prints out the module references
text, and compares that to the previous text. Given that source files
typically have dozens, but not even hundreds of imports, the performance
impact seems negligible.
Differential Revision: https://reviews.llvm.org/D101515
Previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). Newly discovered failing case is due to not properly handled when there is a single use. It should be processed separately from 2 uses case.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D101359
Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:
1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.
After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.
This patch tires to recover the regression by running hoisting/sinking
as part of a SimplifyCFG run after LoopRotate and before LoopVectorize.
Note that in the legacy pass-manager, we run LoopRotate just before
vectorization again and there's no SimplifyCFG run in between, so the
sinking/hoisting may impact the later run on LoopRotate. But the impact
should be limited and the benefit of hosting/sinking at this stage
should outweigh the risk of not rotating.
Compile-time impact looks slightly positive for most cases.
http://llvm-compile-time-tracker.com/compare.php?from=2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions
NewPM-O3: geomean -0.19%
NewPM-ReleaseThinLTO: geoman -0.54%
NewPM-ReleaseLTO-g: geomean -0.03%
With a few benchmarks seeing a notable increase, but also some
improvements.
Alternative to D101290.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101468
As discussed in D100107, this patch first convert index_vector to
step_vector, and convert step_vector back to index_vector after LegalizeDAG.
Differential Revision: https://reviews.llvm.org/D100816
It seems incorrect to use TTI data in some places,
and override it in others. In this case, TTI says
that `extractvalue` are free, yet we bill them.
While this doesn't address https://bugs.llvm.org/show_bug.cgi?id=50099 yet,
it reduces the cost from 55 to 50 while the threshold is 45.
Differential Revision: https://reviews.llvm.org/D101228
Currently we call el_set directly to configure the editor in the libedit
wrapper. There are some cases in which this causes extra casting, but we pass
captureless lambdas as function pointers, which should work out of the box.
Since el_set takes varargs, if the cast is incorrect or if the cast is not
present, it causes a run time failure rather than compile error. This change
makes it so a few different types of configuration is done inside a helper
function to provide type safety and eliminate that casting. I didn't do all
edit line configuration because I'm not sure how important it was in other cases
and it might require something more general keep up with libedit's signature.
I'm open to suggestions, though.
Reviewed By: teemperor, JDevlieghere
Differential Revision: https://reviews.llvm.org/D101250
Covering basic cases where you have 1 item on 1 line.
Apart from eFormatCharArray, where using multiple lines
highlights the difference between it and eFormatVectorOfChar.
Reviewed By: #lldb, teemperor
Differential Revision: https://reviews.llvm.org/D101453
DAGCombiner was recently taught how to combine STEP_VECTOR nodes,
meaning the step value is no longer guaranteed to be one by the time it
reaches the backend for lowering.
This patch supports such cases on RISC-V by lowering to other step
values to a multiply following the vid.v instruction. It includes a
small optimization for common cases where the multiply can be expressed
as a shift left.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D100856