The is from discussion in https://reviews.llvm.org/D104247#inline-993387
The contract and reassoc flags shouldn't imply each other .
All the aggressive fsub fusion reassociate operations,
we should guard them with reassoc flag check.
Reviewed By: mcberg2017
Differential Revision: https://reviews.llvm.org/D104723
For example, without this patch:
```
$ cat test.c
int main() {
int x;
#pragma omp target enter data map(alloc: x)
#pragma omp target enter data map(alloc: x)
#pragma omp target enter data map(alloc: x)
#pragma omp target exit data map(delete: x)
;
return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists\|last'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented)
Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last
```
`RefCount` is reported as decremented to 2, but it ought to be reset
because of the `delete` map type, and `is not last` is incorrect.
This patch migrates the reset of reference counts from
`DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then
correctly reports the reset. Based on the `IsLast` result from
`DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is
last` for any deletion. `DeviceTy::deallocTgtPtr` is responsible only
for the final reference count decrement and mapping removal.
An obscure side effect of this patch is that a `delete` map type when
the reference count is infinite yields `DelEntry=IsLast=false` in
`targetDataEnd` and so no longer results in a
`DeviceTy::deallocTgtPtr` call. Without this patch, that call is a
no-op anyway besides some unnecessary locking and mapping table
lookups.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D104560
For example, without this patch:
```
$ cat test.c
int main() {
int x;
#pragma omp target enter data map(alloc: x)
#pragma omp target exit data map(release: x)
;
return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1
```
There are two problems in this example:
* `RefCount` is not reported when a mapping is created, but it might
be 1 or infinite. In this case, because it's created by `omp target
enter data`, it's 1. Seeing that would make later `RefCount`
messages easier to understand.
* `RefCount` is still 1 at the `omp target exit data`, but it's
reported as `updated`. The reason it's still 1 is that, upon
deletions, the reference count is generally not updated in
`DeviceTy::getTgtPtrBegin`, where the report is produced. Instead,
it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually
removed from the mapping table.
This patch makes the following changes:
* Report the reference count when creating a mapping.
* Where an existing mapping is reported, always report a reference
count action:
* `update suppressed` when `UpdateRefCount=false`
* `incremented`
* `decremented`
* `deferred final decrement`, which replaces the misleading
`updated` in the above example
* Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does
not zero the reference count. (Please advise if these comments miss
the point.)
* For unified shared memory, don't report confusing messages like
`RefCount=` or `RefCount= updated` given that reference counts are
irrelevant in this case. Instead, just report `for unified shared
memory`.
* Use `INFO` not `DP` consistently for `Mapping exists` messages.
* Fix device table dumps to print `INF` instead of `-1` for an
infinite reference count.
Reviewed By: jhuber6, grokos
Differential Revision: https://reviews.llvm.org/D104559
Since we now have modules-enabled CI, it is now redundant to have ad-hoc
tests that check arbitrary things about our modules support. Instead,
the whole test suite should pass with modules enabled, period.
This patch also removes the module cache path workaround: one would
expect that modules work properly without that workaround. If that
isn't the case and we do run into flaky test failures, we can re-enable
the workaround temporarily (but that would be very vexing and we should
fix Clang ASAP if that's the case).
Differential Revision: https://reviews.llvm.org/D104746
According to https://eel.is/c++draft/over.literal
> double operator""_Bq(long double); // OK: does not use the reserved identifier _Bq ([lex.name])
> double operator"" _Bq(long double); // ill-formed, no diagnostic required: uses the reserved identifier _Bq ([lex.name])
Obey that rule by keeping track of the operator literal name status wrt. leading whitespace.
Fix: https://bugs.llvm.org/show_bug.cgi?id=50644
Differential Revision: https://reviews.llvm.org/D104299
I added asserts to these in https://reviews.llvm.org/D104525.
They are available (directly or otherwise) via the API so we
should not assert.
Restore the previous behaviour. If the message
is empty, we return early before printing anything.
For SetError don't assert that the error is a failure.
The remaining assert is in AppendRawError which
is not part of the API.
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D104778
The default Altivec ABI was implemented but the clang error for specifying
its use still remains. Users could get around this but not specifying the
type of Altivec ABI but we need to remove the error.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D102094
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.
This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.
Other restrictions of the transform remain.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104598
This adds more poison folding optimizations to InstSimplify.
Since all binary operators propagate poison, these are fine.
Also, the precondition of `select cond, undef, x` -> `x` is relaxed to allow the case when `x` is undef.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104661
Replacing existing uses with AppendError.
SetError is also part of the SBI API. This remains
but instead of calling the underlying SetError it
will call AppendError.
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D104768
With regards to overrunning, the langref (llvm/docs/LangRef.rst)
specifies:
(llvm.experimental.vector.insert)
Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1)
must be valid ``vec`` indices. If this condition cannot be determined
statically but is false at runtime, then the result vector is
undefined.
(llvm.experimental.vector.extract)
Elements ``idx`` through (``idx`` + num_elements(result_type) - 1)
must be valid vector indices. If this condition cannot be determined
statically but is false at runtime, then the result vector is
undefined.
For the non-mixed cases (e.g. inserting/extracting a scalable into/from
another scalable, or inserting/extracting a fixed into/from another
fixed), it is possible to statically check whether or not the above
conditions are met. This was previously missing from the verifier, and
if the conditions were found to be false, the result of the
insertion/extraction would be replaced with an undef.
With regards to invalid indices, the langref (llvm/docs/LangRef.rst)
specifies:
(llvm.experimental.vector.insert)
``idx`` represents the starting element number at which ``subvec``
will be inserted. ``idx`` must be a constant multiple of
``subvec``'s known minimum vector length.
(llvm.experimental.vector.extract)
The ``idx`` specifies the starting element number within ``vec``
from which a subvector is extracted. ``idx`` must be a constant
multiple of the known-minimum vector length of the result type.
Similarly, these conditions were not previously enforced in the
verifier. In some circumstances, invalid indices were permitted
silently, and in other circumstances, an undef was spawned where a
verifier error would have been preferred.
This commit adds verifier checks to enforce the constraints above.
Differential Revision: https://reviews.llvm.org/D104468
Spin-off from D104740: I don't think this special handling is needed
anymore. Calls in textual IR are annotated with addrspace(N) (which
defaults to the program address space from data layout) and specifies
the expected pointer address space of the callee. There is no need
to special-case the program address space on top of that, as it
already is the default expected address space, and we shouldn't
allow use of the program address space if the call was explicitly
annotated with some other address space.
The IsCall parameter is retained because it will be used again soon.
Differential Revision: https://reviews.llvm.org/D104752
NFCI, although the test change shows that ConstantExpr::getAsInstruction
is better than the old implementation of createReplacementInstr because
it propagates things like the sdiv "exact" flag.
Differential Revision: https://reviews.llvm.org/D104124
This adds handling for signed predicates, similar to how unsigned
predicates are already handled.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104732
Don't use SCC iterators when we're only interested in reachability.
Use df_begin/df_end inline to find reachable nodes.
Differential Revision: https://reviews.llvm.org/D104704
This particular linker invocation is only run to check that we accept
options, but we don't inspect the generated command line. As all other
commands in the file have their output piped to FileCheck, the lit test
doesn't print any other output; therefore silence this one for consistency
as well.
This is consistent with how clang prints its internal commands with
-### and -v.
When linking with -verbose, we get log messages from the actual
linking written to stderr. By printing the command to the same stream,
we make sure they appear in a sensible chronological order.
Differential Revision: https://reviews.llvm.org/D104527
The patch changes the pretty printed FillOp operand order from output, value to value, output. The change is a follow up to https://reviews.llvm.org/D104121 that passes the fill value using a scalar input instead of the former capture semantics.
Differential Revision: https://reviews.llvm.org/D104356
During slice computation of affine loop fusion, detect one id as the mod
of another id w.r.t a constant in a more generic way. Restrictions on
co-efficients of the ids is removed. Also, information from the
previously calculated ids is used for simplification of affine
expressions, e.g.,
If `id1` = `id2`,
`id_n - divisor * id_q - id_r + id1 - id2 = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
If `c` is a non-zero integer,
`c*id_n - c*divisor * id_q - c*id_r = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D104614
Correct a documentation typo, and delete a duplicate test in
`pdl-to-pdl-interp-rewriter.mlir`.
Reviewed By: pr4tgpt, bondhugula, rriddle
Differential Revision: https://reviews.llvm.org/D104688
This reverts commit ea011ec5ed.
This still causes some miscompiles, I'll follow up in the phabricator
review with a sample of that issue (which is part of the sample of
the previous issue).
If an instruction has several operands and a PC-relative one is not the
first of them, the generator may produce the code that does not pass the
'Address' parameter to the printout method. For example, for an Arm
instruction 'LE LR, $imm', it reuses the same code as for other
instructions where the second operand is not PC-relative:
void ARMInstPrinter::printInstruction(...) {
...
case 11:
// BF16VDOTI_VDOTD, BF16VDOTI_VDOTQ, BF16VDOTS_VDOTD, ...
printOperand(MI, 1, STI, O);
O << ", ";
printOperand(MI, 2, STI, O);
break;
...
The patch fixes that by considering 'PCRel' when comparing
'AsmWriterOperand' values.
Differential Revision: https://reviews.llvm.org/D104698
Follow-up on Roman's idea expressed in D103959.
- If a Phi has undefined inputs from live blocks:
- and no other inputs, assume it is undef itself;
- and exactly one non-undef input, we can assume that all undefs are equal to this input.
Differential Revision: https://reviews.llvm.org/D104618
Reviewed By: lebedev.ri, nikic