Implements proper (de-)serialization logic for BranchConditionalOp when
such ops have true/false target operands.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D101602
Replace all `linalg.indexed_generic` ops by `linalg.generic` ops that access the iteration indices using the `linalg.index` op.
Differential Revision: https://reviews.llvm.org/D101612
This is a simple fix on LE. On BE, vector shuffles are categorized into
different ops. We may need more work to eliminate these in
tablegen/pre-isel.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D101605
Lorenz Bauer reported an issue in bpf mailing list ([1]) where
for FIELD_EXISTS relocation, if the object is an array subscript,
the patched immediate is the object offset from the base address,
instead of 1.
Currently in BPF AbstractMemberAccess pass, the final offset
from the base address is the patched offset except FIELD_EXISTS
which is 1 unconditionally. In this particular case, the last
data structure access is not a field (struct/union offset)
so it didn't hit the place to set patched immediate to be 1.
This patch fixed the issue by checking the relocation type.
If the type is FIELD_EXISTS, just set to 1.
Tested by modifying some bpf selftests, libbpf is okay with
such types with FIELD_EXISTS relocation.
[1] https://lore.kernel.org/bpf/CACAyw99n-cMEtVst7aK-3BfHb99GMEChmRLCvhrjsRpHhPrtvA@mail.gmail.com/
Differential Revision: https://reviews.llvm.org/D102036
Follow up on 431e3138a and complete the other possible combinations.
Besides enforcing the new behavior, it also mitigates TSAN false positives when
combining orders that used to be stronger.
The pattern to convert subtensor ops to their rank-reduced versions
(by dropping unit-dims in the result) can also convert to a zero-rank
tensor. Handle that case.
This also fixes a OOB access bug in the existing pattern for such
cases.
Differential Revision: https://reviews.llvm.org/D101949
The linker suggests using -Wl,-z,notext.
Replaced assert by exit also fixed this.
After renaming, interceptor.c would be used to test interceptors in general by D101204.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D101649
Nearly complete alignment to spec v0.22
- Adds Div op
- Concat inputs now variadic
- Removes Placeholder op
Note: TF side PR https://github.com/tensorflow/tensorflow/pull/48921 deletes Concat legalizations to avoid breaking TensorFlow CI. This must be merged only after the TF PR has merged.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D101958
[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one
D101976 would require a second barrier instance. This NFC to amdgpu makes it
simpler to add one (an extra global, one more line in init). Also renames the
current barrier to L0.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102016
This patch converts llvm.memcpy intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).
From an implementation point of view, the patch
- adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered to, during first phase of ISel)
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter,
on matching the above node.
- Adds a custom inserter function that expands the pseudo instruction into MIR suitable
to be (by later passes) into a WLSTP loop.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D99723
[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin
Drops an enum that was identical to a HSA one, localises some functions where
they were only called from one TU. Covers everything internalize + adce can
identify as dead, except for msgpack::dump which is useful when debugging.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102014
This looks like just an oversight in the AsyncThread function. It gets a result of
eStateInvalid, and then marks the process as exited, but doesn't set "done" to true,
so we go to fetch another event. That is not safe, since you don't know when that
extra packet is going to arrive. If it arrives while you are tearing down the
process, the internal-state-thread might try to handle it when the process in not
in a good state.
Rather than put more effort into checking all the shutdown paths to make sure this
extra packet doesn't cause problems, just don't fetch it. We weren't going to do
anything useful with it anyway.
The main part of the patch is setting "done = true" when we get the eStateInvalid.
I also added a check at the beginning of the while(done) loop to prevent another error
from getting us to fetch packets for an exited process.
I added a test case to ensure that if an Interrupt fails, we call the process
exited. I can't test exactly the error I'm fixing, there's no good way to know
that the stop reply for the failed interrupt wasn't fetched. But at least this
asserts that the overall behavior is correct.
Differential Revision: https://reviews.llvm.org/D101933
We were modifying precisely when intersecting the lock sets of multiple
predecessors without back edge. That's no coincidence: we can't modify
on back edges, it doesn't make sense to modify at the end of a function,
and otherwise we always want to intersect on forward edges, because we
can build a new lock set for those.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D101755
This lint check is a part of the FLOCL (FPGA Linters for OpenCL) project
out of the Synergy Lab at Virginia Tech.
FLOCL is a set of lint checks aimed at FPGA developers who write code
in OpenCL.
The altera ID dependent backward branch lint check finds ID dependent
variables and fields used within loops, and warns of their usage. Using
these variables in loops can lead to performance degradation.
We can end up with a call to `indexTopLevelDecl(D)` with `D == nullptr` in non-assert builds e.g. when indexing a module in `indexModule` and
- `ASTReader::GetDecl` returns `nullptr` if `Index >= DeclsLoaded.size()`, thus returning `nullptr`
=> `ModuleDeclIterator::operator*` returns `nullptr`
=> we call `IndexCtx.indexTopLevelDecl` with `nullptr`
Be resilient and just ignore the `nullptr` decls during indexing.
Reviewed By: akyrtzi
Differential Revision: https://reviews.llvm.org/D102001
It is currently stored in the high bits, which is disallowed on certain
platforms (e.g. android). This revision switches the representation to use
the low bits instead, fixing crashes/breakages on those platforms.
Differential Revision: https://reviews.llvm.org/D101969
The CGSCC pass manager interplay with the FunctionAnalysisManagerCGSCCProxy is 'special' in the sense that the former will rerun the latter if there are changes to a SCC structure; that being said, some of the functions in the SCC may be unchanged. In that case, the function simplification pipeline will be re-run, which impacts compile time[1].
This patch allows the function simplification pipeline be skipped if it was already run and the function was not modified since.
The behavior is currently disabled by default. This is because, currently, the rerunning of the function simplification pipeline on an unchanged function may still result in changes. The patch simplifies investigating and fixing those cases where repeated function pass runs do actually positively impact code quality, while offering an easy workaround for those impacted negatively by compile time regressions, and not impacting mainline scenarios.
[1] A [[ http://llvm-compile-time-tracker.com/compare.php?from=eb37d3546cd0c6e67798496634c45e501f7806f1&to=ac722d1190dc7bbdd17e977ef7ec95e69eefc91e&stat=instructions | compile time tracker ]] run with the option enabled.
Differential Revision: https://reviews.llvm.org/D98103
Use correct spelling of CMAKE_OSX_DEPLOYMENT_TARGET and bump the
minimum version to 10.13 which matches what we use for host tools
in Fuchsia.
Differential Revision: https://reviews.llvm.org/D102013
Summary:
This NFC patch replaces the representation of registers and the left shift operator in the PowerPC assembly code to allow it to be consumed by the GNU flavored assembler and the AIX assembler.
* Registers - change the representation of PowperPC registers from %rn, %fn, %vsn, and %vrn to the register number alone, e.g., n. The GNU flavored assembler and the AIX assembler are able to determine the register kind based on the context of the instruction in which the register is used.
* Left shift operator - use macro PPC_LEFT_SHIFT to represent the left shift operator. The left shift operator in the AIX assembly language is < instead of <<
Reviewed by: sfertile, MaskRay, compnerd
Differential Revision: https://reviews.llvm.org/D101179
Use result_type for the IMPLICIT_DEF in masked vector patterns.
This doesn't matter today because result_type and op_type are
always the same.
Use multiclass inheritance to reduce repeated code.
Add InputNamelist and OutputNamelist as I/O data transfer APIs
to be used with internal & external list-directed I/O; delete the
needless original namelist-specific Begin... calls.
Implement NAMELIST output and input; add basic tests.
Differential Revision: https://reviews.llvm.org/D101931
The code that initializes the default units 5 & 6 had
a race condition that would allow threads access to the
unit map before it had been populated.
Also add some missing calls to va_end() that will never
be called (they're in program abort situations) but might
elicit warnings if absent.
Differential Revision: https://reviews.llvm.org/D101928
What the Fortran standard calls "preconnected" external I/O units
might not be known to be connected to unformatted or formatted files
until the first I/O data transfer statement is executed.
Support this deferred determination by representing the flag as
a tri-state Boolean and adapting its points of use.
Differential Revision: https://reviews.llvm.org/D101929
This expose a lambda control instead of just a boolean to control unit
dimension folding.
This however gives more control to user to pick a good heuristic.
Folding reshapes helps fusion opportunities but may generate sub-optimal
generic ops.
Differential Revision: https://reviews.llvm.org/D101917
The builtins were updated to take signed parameters in 627a526955, but the
intrinsics that use those builtins were not updated as well. The intrinsic test
did not catch this sign mismatch because it is only reported as an error under
-fno-lax-vector-conversions.
This commit fixes the type mismatch and adds -fno-lax-vector-conversions to the
test to catch similar problems in the future.
Differential Revision: https://reviews.llvm.org/D101979
This fixes an issue where mixed TOC / NOTOC calls can call the incorrect
thunks if a previous thunk already exists. The issue appears when a TOC
funciton calls a NOTOC callee and then a different NOTOC function calls the same
NOTOC callee. In this case the linker would sometimes incorrectly call the
same thunk for both cases.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101837
As mentioned before in D78813, currently the XCOFF backend does not
support writing 64-bit object files, which the ORC JIT tests will try to
exercise if we are on AIX. This patch disables the tests on AIX for now.
This is consistent with what's been done, for example, regarding
`armv7`.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D101971
Expand 128 bit shifts instead of using a libcall.
This patch removes the 128 bit shift libcalls and thereby causes
ExpandShiftWithUnknownAmountBit() to be called.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D101993
Rename RVInstR4 as used by F/D/Zfh extensions to RVInstR4Frm.
Introduce new RVInstR4 that takes funct3 as a parameter.
Add new format classes for FSRI and FSRIW instead of trying to
bend RVInstR4 to use a shamt overlayed on rs2 and funct2.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100427
Treat them just like we do for properties - as a `property` semantic
token although ideally we could differentiate the two.
Differential Revision: https://reviews.llvm.org/D101785
AMDGPUAsmParser::isSupportedDPPCtrl() was failing to correctly
find a DPP register operand, regadless of the position it is
always src0. Moved this check into a new validateDPP() method
where we have full instruction already. In particular it was
failing to reject this case:
v_cvt_u32_f64 v5, v[0:1] quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf
Essentially it was broken for any case where size of dst and
src0 differ.
It also improves the diagnostics with a proper error message.
The check in the InstPrinter also drops verification of the dst
register as it does not have anything to do with the dpp operand.
Differential Revision: https://reviews.llvm.org/D101930