The patch in D112349 added a previously nonexistant restriction on ifunc
resolvers that they MUST be defintions. However, the function
multiversioning depends on being able to resolve these resolvers at
link-time, so this additional restriction was breaking.
This reverts commit 5fbcf67734.
ProcessDebugger is used in ProcessWindows and NativeProcessWindows.
I thought I was simplifying things by renaming to DoGetMemoryRegionInfo
in ProcessDebugger but the Native process side expects "GetMemoryRegionInfo".
Follow the pattern that WriteMemory uses. So:
* ProcessWindows::DoGetMemoryRegioninfo calls ProcessDebugger::GetMemoryRegionInfo
* NativeProcessWindows::GetMemoryRegionInfo does the same
The recipe produces exactly one VPValue and can inherit directly from
it. This is in line with other recipes and avoids having to use
getVPSingleValue.
A reserved register:
- is not allocatable
- is considered always live
- is ignored by liveness tracking
NVPTX special registers match the criteria, and marking them as
reserved helps to avoid machine verifier error:
*** Bad machine code: Using an undefined physical register ***
- function: foo
- basic block: %bb.0 (0x557bb178b708)
- instruction: %0:int32regs = MOV_SPECIAL $envreg0
- operand 1: $envreg0
Differential Revision: https://reviews.llvm.org/D113008
Add a warning to TableGen for unused template arguments in classes and
multiclasses, for example:
multiclass Foo<int x> {
def bar;
}
$ llvm-tblgen foo.td
foo.td:1:20: warning: unused template argument: Foo::x
multiclass Foo<int x> {
^
A flag '--no-warn-on-unused-template-args' is added to disable the
warning. The warning is disabled for LLVM and sub-projects if
'LLVM_ENABLE_WARNINGS=OFF'.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D109359
TargetExternalSymbol is considered to be an immediate and not a
register, so machine verifier emits an error:
*** Bad machine code: Expected a register operand. ***
- function: static_offset
- basic block: %bb.0 bb (0x560e9b306028)
- instruction: %3:int64regs = MoveParamI64 &static_offset_param_1
- operand 1: &static_offset_param_1
The patch adds variants of this instruction with an immediate operand
for byval arguments on 64-bit and 32-bit targets.
Differential Revision: https://reviews.llvm.org/D113006
LLVM has the habit of turning adds with no common bits set into ors,
which means we need to detect them and treat them like adds again in the
MVE gather/scatter lowering pass.
Differential Revision: https://reviews.llvm.org/D112922
On AArch64 we have various things using the non address bits
of pointers. This means when you lookup their containing region
you won't find it if you don't remove them.
This changes Process GetMemoryRegionInfo to a non virtual method
that uses the current ABI plugin to remove those bits. Then it
calls DoGetMemoryRegionInfo.
That function does the actual work and is virtual to be overriden
by Process implementations.
A test case is added that runs on AArch64 Linux using the top
byte ignore feature.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D102757
This teaches the MVE gather scatter lowering pass that SHL is
essentially the same as Mul, where we are able to optimize the
induction of a gather/scatter address by pushing them out of loops.
https://alive2.llvm.org/ce/z/wG4VyT
Differential Revision: https://reviews.llvm.org/D112920
Implement two builtins to pack/unpack IBM extended long double float,
according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112055
I've added a test for a loop containing a conditional uniform load for
a target that supports masked loads. The test just ensures that we
correctly use gather instructions and have the correct mask.
Differential Revision: https://reviews.llvm.org/D112619
- Provide the operator overloads for constructing (semi-)affine expressions in
Python by combining existing expressions with constants.
- Make AffineExpr, AffineMap and IntegerSet hashable in Python.
- Expose the AffineExpr composition functionality.
Reviewed By: gysit, aoyal
Differential Revision: https://reviews.llvm.org/D113010
Before this patch, flags such as undef were dropped by TII::insertBranch
(used by BranchFolding pass), resulting in the following error from
machine verifier:
*** Bad machine code: Reading virtual register without a def ***
- function: hoge
- basic block: %bb.0 bb (0x562e9c240e68)
- instruction: CBranch %2:int1regs, %bb.3
- operand 0: %2:int1regs
Differential Revision: https://reviews.llvm.org/D113001
This patch adds the substring information to the fircg.ext_embox and
fircg.ext_rebox operations.
Substring is used for CHARACTER types.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D112807
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This patch attempts to fix a compiler crash that occurs when long
double type is used with -mno-x87 compiler option.
The option disables x87 target feature, which in turn disables x87
registers, so CG cannot select them for x86_fp80 LLVM IR type. Long
double is lowered as x86_fp80 for some targets, so it leads to a
crash.
The option seems to contradict the SystemV ABI, which requires long
double to be represented as a 80-bit floating point, and it also
requires to use x87 registers.
To avoid that, `long double` type is disabled when -mno-x87 option is
set. In addition to that, `float` and `double` also use x87 registers
for return values on 32-bit x86, so they are disabled as well.
Differential Revision: https://reviews.llvm.org/D98895
In D71220 a pattern was added to replace shuffle's insertelement operand
if inserted scalar is not demanded. The pattern was added only for
the case where the shuffle's mask size is equal to element's vector size.
However, that condition is not required because the pattern does not
change the shuffle vector size.
This patch extends the pattern to also include cases where shuffle's mask
size is not equal to element's vector size.
Differential Revision: https://reviews.llvm.org/D112318
This better decouples transfer read/write from vector-only rewrite of conv.
This form is close to ready to plop into a new vector.conv op and the vector.transfer operations to be generalized as part of generic vectorization once the properties ConvolutionOpInterface are inferred from the indexing maps.
This also results in a nice perf boost in the dw == 1 cases.
Differential revision: https://reviews.llvm.org/D112822
This refactoring prepares conv1d vectorization for a future integration into
the generic codegen path.
Once transfer_read / transfer_write vectorization also supports sliding windows,
the special pattern for conv can disappear.
This will also likely need a vector.conv operation.
Differential Revision: https://reviews.llvm.org/D112797
This reverts commit 5cbec88cbf.
Vitaly said that 2faac77f26 actually works.
Sanitizer's armv7-linux-androideabi24 configuration has other issues which haven't been identified yet, but that's unrelated to the empty symbol name issue.
If we assume SPMD-mode during the fixpoint iteration we have to execute
the kernel in SPMD-mode. If we change our mind during manifest there is
the chance of a mismatch between the simplification, e.g., of
`__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem
was introduced in D109438.
This patch is compromise to resolve the problem purely in OpenMP-opt
while trying to keep the benefits of D109438 around. This might not
always work, see `get_hardware_num_threads_in_block_fold` but it often
does. At the same time we do keep value specialization and execution
mode in sync.
Proper solutions to this problem should be considered. I believe a new
execution mode is the easiest way forward (Singleton-SPMD).
Alternatively, SPMD-mode execution can be used with a way to provide a
new thread_limit (here 1) to the runtime. This is more general and could
be useful if we see `num_threads` clauses or workshared loops with small
trip counts in the kernel. In either proposal we need to disable the
guarding for the kernel (which was the motivation for D109438).
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D112894
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112893
A lot of NVVM intrinsics can use the default intrinsic attributes (e.g.,
nosync, nofree, ...) as well as `speculatable`. The latter is important
if we want to recompute intrinsics results instead of communicating them
via memory.
I did use default attributes for almost all `readnone` attributes but
speculatable only where I had reasonable confidence they cannot
experience UB. That said, someone should double check.
TODO: There seem to be various intrinsics marked `Commutative` which
should not, e.g., fma and div.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D109987
Global symbols cannot have any name so we need to sanitize the string
first. Also remove an assertion that is not actually necessary nor
true in general.
Reviewed By: ggeorgakoudis
Differential Revision: https://reviews.llvm.org/D112892
When we pick state 0 to initialize state but thread N is going to be the
"main thread", in generic mode, we would require extra synchronization.
Instead, we should pick the main thread to initialize state in generic
mode and any thread in SPMD mode.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112874
The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR to AGPR copy but it missed checking for the SGPR
clobbering for the S_MOV_B64_IMM_PSEUDO generation.
Differential Revision: https://reviews.llvm.org/D113005
Two things in this diff:
1) Warn on the invalid range, currently three types of checking, see the detailed message in the code.
2) In some situation, llvm-profgen gives lots of warnings on the truncated stacks which is noisy. This change provides a switch to `--show-detailed-warning` to skip the warnings. Alternatively, we use a summary for those warning and show the percentage of cases with those issues.
Example of warning summary.
```
warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction.
warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function.
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D111902
Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.
Reviewed By: reames, mkazantsev, nikic
Differential Revision: https://reviews.llvm.org/D109821
The changes in D80163 defered the assignment of MachineMemOperand (MMO)
until the X86ExpandPseudo pass. This will result in crash due to prolog
insert point been sunk across the pseudo instruction VASTART_SAVE_XMM_REGS.
Moving the assignment to the creation of the node can avoid the problem.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D112859