Just short-circuit when a change was made; the erased value is invalid
after that. Found by ASan.
This pass looks like it could use rewrite patterns instead, which don't
have this issue, but let's fix the ASan build first.
RISC-V expands a register tuple spill into a series of register spills after
the register allocation phase, during pseudo instruction expansion. However,
part of the register tuple might still be undefined at the spill, so the
machine verifier complains that the spill instruction uses an undefined
physical register.
The optimal solution would be to do liveness analysis and not emit spills
and reloads for the undefined parts, but accurate liveness info is not easy
to get at that point.
So the suboptimal solution is to still spill and reload the undefined parts,
but add an implicit use of the super register to the spill instructions. The
machine verifier then only reports a use of an undefined physical register
if the whole super register is undefined. This behavior is also
documented in MachineVerifier::checkLiveness[1].
Example demonstrating what happens:
```
v10m2 = xxx
# v12m2 not defined yet
PseudoVSPILL2_M2 v10m2_v12m2
...
```
After expansion:
```
v10m2 = xxx
# v12m2 not defined yet
# Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
VS2R_V v10m2
VS2R_V v12m2 # Use undef reg!
```
What this patch did:
```
v10m2 = xxx
# v12m2 not defined yet
# Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
VS2R_V v10m2 implicit v10m2_v12m2
# Use undef reg (v12m2), but v10m2_v12m2 isn't totally undef, so
# that's OK.
VS2R_V v12m2 implicit v10m2_v12m2
```
[1] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/MachineVerifier.cpp#L2016-L2019
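The verifier rule this relies on can be illustrated with a toy model. The sketch below is plain Python, not LLVM code: the register names come from the example above, while `SUBREGS`, `is_partially_defined`, `use_is_ok`, and the set-based liveness are hypothetical simplifications of what MachineVerifier::checkLiveness does.

```python
# Toy model: a read of an undefined physical register is accepted when the
# instruction also implicitly uses a super register that is at least
# partially defined. All names here are illustrative, not LLVM APIs.

# Hypothetical register hierarchy: tuple register -> its parts.
SUBREGS = {"v10m2_v12m2": ["v10m2", "v12m2"]}


def is_partially_defined(reg, defined):
    """True if reg itself or any of its parts is defined."""
    return reg in defined or any(part in defined for part in SUBREGS.get(reg, []))


def use_is_ok(reg, implicit_uses, defined):
    """Accept a use of `reg` if it is defined, or if it is covered by an
    implicit use of a partially defined super register."""
    if reg in defined:
        return True
    return any(
        reg in SUBREGS.get(sup, []) and is_partially_defined(sup, defined)
        for sup in implicit_uses
    )


defined = {"v10m2"}  # v12m2 has not been written yet

# Before the patch: VS2R_V v12m2 with no implicit operand is rejected.
before = use_is_ok("v12m2", [], defined)
# After the patch: the implicit use of v10m2_v12m2 makes it acceptable.
after = use_is_ok("v12m2", ["v10m2_v12m2"], defined)
# If the whole tuple is undefined, the use is still reported.
all_undef = use_is_ok("v12m2", ["v10m2_v12m2"], set())
```

In this model only the middle case passes, which matches the intent: the extra operand silences the report for a partially defined tuple without hiding a fully undefined one.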
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127642
If `create-deallocs=0`, mark all bufferization.alloc_tensor ops as escaping. (Unless they already have an `escape` attribute.) In the absence of analysis information, check SSA use-def chains to see if the value may be yielded.
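A conservative use-def walk of this kind could look like the following sketch. It is a toy model in Python, not the MLIR implementation: the `Op` and `Value` classes, the `users` list, and the `YIELD_LIKE` set are hypothetical stand-ins, and treating every result of a user op as a possible alias is a deliberate over-approximation.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    results: list = field(default_factory=list)  # Values produced by this op


@dataclass
class Value:
    name: str
    users: list = field(default_factory=list)  # Ops that consume this value


# Assumption for this sketch: which ops count as "yielding" a value.
YIELD_LIKE = {"scf.yield"}


def may_be_yielded(value, seen=None):
    """Follow uses transitively; True if any chain reaches a yield-like op."""
    seen = set() if seen is None else seen
    if id(value) in seen:
        return False  # already visited, avoid cycles
    seen.add(id(value))
    for op in value.users:
        if op.name in YIELD_LIKE:
            return True
        # Conservatively assume any result of a user may alias the value.
        if any(may_be_yielded(res, seen) for res in op.results):
            return True
    return False


# alloc -> tensor.insert -> scf.yield: may escape.
alloc = Value("alloc")
inserted = Value("inserted")
alloc.users.append(Op("tensor.insert", [inserted]))
inserted.users.append(Op("scf.yield"))

# local -> tensor.extract only: never reaches a yield.
local = Value("local")
local.users.append(Op("tensor.extract"))
```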
Differential Revision: https://reviews.llvm.org/D127302
Bufferization of the func dialect must go through `OneShotModuleBufferize`. With this change, the analysis interface methods of the BufferizableOpInterface of func dialect ops can be used together with the normal `OneShotBufferize`. (In the absence of analysis information, they will return conservative results.)
Differential Revision: https://reviews.llvm.org/D127299
Per OpenMP 5.0, for the firstprivate, lastprivate, copyin, and copyprivate
clauses, if the list item is a polymorphic variable with the allocatable
attribute, the behavior is unspecified.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D127601
Rather than invoking the linker directly, let the compiler driver
handle it. This ensures that we use the correct linker in the case
of cross-compiling.
Differential Revision: https://reviews.llvm.org/D127828
scf::ForOp and scf::WhileOp must insert buffer copies not only for out-of-place bufferizations, but also to enforce additional invariants wrt. buffer aliasing behavior. This is currently happening in the respective `bufferize` methods. With this change, the tensor copy insertion pass will also enforce these invariants by inserting copies. The `bufferize` methods can then be simplified and made independent of the `AnalysisState` data structure in a subsequent change.
Differential Revision: https://reviews.llvm.org/D126822
The VSETVLIInfo right after a VLEFF/VLSEGFF is currently unknown because these
instructions modify VL. An unknown VSETVLIInfo forces a VSET(I)VLI to be
inserted before the next vector operation. However, the vector operation
following a VLEFF/VLSEGFF may not need a VSET(I)VLI if it uses the same VTYPE
and the resulting vl of the VLEFF/VLSEGFF.
Take the C code below as an example:
    vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
    vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);
vsetvli insertion adds a redundant vsetvli for it.
Assembly result:
    vsetvli a2,a2,e8,m4,ta,mu
    vle8ff.v v28,(a0)
    csrr a3,vl ; redundant
    vsetvli zero,a3,e8,m4,ta,mu ; redundant
    vmseq.vi v25,v28,0
After D126794, VLEFF/VLSEGFF have a def that holds the value of VL. This patch
considers there to be a ghost vsetvli right after VLEFF/VLSEGFF. The ghost
VSET(I)VLI uses the vl output of the VLEFF/VLSEGFF as its AVL and the same
VTYPE as the VLEFF/VLSEGFF. The ghost vsetvli must be redundant, so we can use
it to get the VSETVLIInfo right after VLEFF/VLSEGFF.
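The effect of the ghost vsetvli can be sketched with a small state model. This is a Python toy, not the pass itself: the `VSETVLIInfo` class here only borrows the name, and `state_after_vleff` and `needs_vsetvli` are hypothetical helpers that reduce the real compatibility check to plain equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VSETVLIInfo:
    avl: Optional[str]    # register holding the AVL, None if unknown
    vtype: Optional[str]  # e.g. "e8,m4,ta,mu", None if unknown

    def is_unknown(self):
        return self.avl is None or self.vtype is None


UNKNOWN = VSETVLIInfo(None, None)


def state_after_vleff(vtype, vl_out, use_ghost):
    """State right after a fault-only-first load that ran under `vtype`
    and wrote its resulting vl to `vl_out`."""
    if use_ghost:
        # Ghost vsetvli: AVL is the vl output, VTYPE is unchanged.
        return VSETVLIInfo(vl_out, vtype)
    return UNKNOWN


def needs_vsetvli(current, required):
    """A new vsetvli is needed unless the state is known and matches."""
    return current.is_unknown() or current != required


# What the following vmseq asks for: AVL = new_vl, same VTYPE as the load.
required = VSETVLIInfo("new_vl", "e8,m4,ta,mu")
old = needs_vsetvli(state_after_vleff("e8,m4,ta,mu", "new_vl", False), required)
new = needs_vsetvli(state_after_vleff("e8,m4,ta,mu", "new_vl", True), required)
```

With the unknown state a vsetvli is always requested (`old`), while the ghost state matches the requirement of the following compare (`new`), which corresponds to dropping the two instructions marked redundant above.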
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127576
Turn off RemoveBracesLLVM while analyzing InsertBraces and vice
versa to avoid potential interference with each other and to improve
performance.
Differential Revision: https://reviews.llvm.org/D127685
The sched_barrier builtin allows users to shape the scheduler's behavior
when very specific codegen is needed in order to create highly optimized code.
This patch adds more granular control over the types of instructions that are
allowed to be reordered with respect to one or multiple sched_barriers. A mask
is used to specify groups of instructions that should be allowed to be scheduled
around a sched_barrier. Details about how this mask may be used can be found in
llvm/include/llvm/IR/IntrinsicsAMDGPU.td.
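As a rough illustration of how such a mask composes, here is a Python sketch. The group names and bit values are an assumption written from memory of the comments in IntrinsicsAMDGPU.td; that file remains the authoritative list. `SchedMask` and `may_cross` are hypothetical helpers, and the model ignores that a broad group (for example ALU) also covers narrower ones.

```python
from enum import IntFlag


class SchedMask(IntFlag):
    # Assumed bit layout; verify against IntrinsicsAMDGPU.td before relying on it.
    NONE = 0x000        # nothing may be scheduled across the barrier
    ALU = 0x001         # non-memory, non-side-effect instructions
    VALU = 0x002
    SALU = 0x004
    MFMA = 0x008
    VMEM = 0x010
    VMEM_READ = 0x020
    VMEM_WRITE = 0x040
    DS = 0x080
    DS_READ = 0x100
    DS_WRITE = 0x200


def may_cross(mask, group):
    """True if instructions of `group` may be scheduled across the barrier."""
    return bool(mask & group)


# Let VALU instructions and DS reads move across the barrier, nothing else.
mask = SchedMask.VALU | SchedMask.DS_READ
```

Under these assumed values the combined mask is 0x102, which is the integer that would be passed as the builtin's argument.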
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D127123
This change adds test cases targeting the AArch64 Linux platform to
the ORC runtime integration test suite.
Reviewed By: lhames, sunho
Differential Revision: https://reviews.llvm.org/D127720
For the 'thread until' command, the thread to perform the operation on can be either the current thread or a thread specified by ID.
Reviewed By: jingham
Differential Revision: https://reviews.llvm.org/D48865
Per GLSL Pow extended instruction spec: "Result is undefined if
x < 0. Result is undefined if x = 0 and y <= 0." So we need to
handle negative `x` values specifically.
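One way to stay inside the defined domain is to raise the absolute value and fix up the sign afterwards. The Python sketch below shows that idea only; it is not necessarily what this patch emits. `pow_defined_domain` is a hypothetical stand-in for a Pow with the quoted restrictions, and returning NaN for a negative base with a fractional exponent is a choice made for this example.

```python
import math


def pow_defined_domain(x, y):
    """Stand-in for a Pow that is only defined for x > 0, or x == 0 and y > 0."""
    if x < 0 or (x == 0 and y <= 0):
        raise ValueError("undefined per the quoted spec")
    return math.pow(x, y)


def pow_any_sign(x, y):
    """Compute x**y for negative x too, calling Pow only on defined inputs.

    x == 0 with y <= 0 is not addressed here and still raises.
    """
    if x >= 0:
        return pow_defined_domain(x, y)
    if y != math.floor(y):
        return math.nan  # negative base with a fractional exponent
    magnitude = pow_defined_domain(-x, y)
    # An odd integer exponent keeps the sign of the base.
    return -magnitude if int(y) % 2 == 1 else magnitude
```

For instance, `pow_any_sign(-2.0, 3.0)` goes through `pow_defined_domain(2.0, 3.0)` and negates the result, while an even exponent returns the magnitude unchanged.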
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127816
Since almost all pseudos have the same form of BaseInstr, we
can set it as the default value to save some lines.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127632
1. Support user-specified linker (-fuse-ld)
2. Support user-specified linker script (-T)
Reviewed By: MaskRay, haowei
Differential Revision: https://reviews.llvm.org/D126192
Make the default loop tiling options explicit in the CLI options. We can also set a default value for a separate option that is declared implicitly.
Reviewed By: ayzhuang
Differential Revision: https://reviews.llvm.org/D127711
Summary:
The static linking test ensures that we can statically link offloading
programs. To create the test we used `llvm-ar`. However, this may not
exist in the user's environment. This patch changes it to use the
binutils `ar` which should exist on every system running these tests
currently. In the future we should set up the dependencies properly.
For the amdgpu target, the long double type is the same as the double type.
The width and alignment of the long double type were incorrectly
overridden when copying aux target properties, which
caused an assertion in codegen when emitting global
variables with long double type.
This patch fixes that by saving and restoring the width
and alignment of the long double type.
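The save-and-restore pattern can be sketched as follows. This is a Python toy, not the clang code: the `TargetInfo` fields, both function names, and the concrete widths are hypothetical and chosen only to make the effect visible.

```python
from dataclasses import dataclass


@dataclass
class TargetInfo:
    long_double_width: int
    long_double_align: int
    pointer_width: int


def copy_aux_properties(target, aux):
    """Hypothetical stand-in for copying every property from the aux target."""
    target.long_double_width = aux.long_double_width
    target.long_double_align = aux.long_double_align
    target.pointer_width = aux.pointer_width


def copy_aux_properties_keep_long_double(target, aux):
    """Copy aux properties, but keep the target's own long double layout."""
    saved = (target.long_double_width, target.long_double_align)
    copy_aux_properties(target, aux)
    target.long_double_width, target.long_double_align = saved


# Illustrative values: device long double is double-sized, host is wider.
device = TargetInfo(long_double_width=64, long_double_align=64, pointer_width=64)
host = TargetInfo(long_double_width=128, long_double_align=128, pointer_width=64)
copy_aux_properties_keep_long_double(device, host)
```

After the call the device still reports a 64-bit long double, while any other copied property takes the host's value.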
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D127771
Fixes: SWDEV-335515
After the frameindex is resolved, the offset can be negative. It is
materialized as an unsigned integer and can still be computed by an add instruction.
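The reason an add still works is two's complement wraparound. A minimal Python sketch, assuming a 32-bit register width (the width and the helper names `add32` and `as_unsigned32` are assumptions for this example):

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """32-bit wrapping add, as a hardware add instruction would compute it."""
    return (a + b) & MASK32


def as_unsigned32(value):
    """Reinterpret a signed 32-bit value as its unsigned bit pattern."""
    return value & MASK32


base = 0x1000
offset = -16
materialized = as_unsigned32(offset)  # bit pattern 0xFFFFFFF0
result = add32(base, materialized)    # wraps around to base - 16
```

Adding the unsigned bit pattern and discarding the carry gives the same 32-bit result as subtracting 16 from the base.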
A new TableGen backend, gen-dxil-enum, is added to generate enums for DXIL operations and operation classes.
A new file, "DXILConstants.inc", which includes the enums, will be generated when building the DirectX target.
More TableGen backends will be added to replace manually written tables in the DirectX backend.
The unused fields in dxil_inst will be used in a future PR.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D125435