We do not keep the actual value of the CIE ID field, because it is
predefined, and use a constant when dumping a CIE record. The issue
was that the predefined value is different for .debug_frame and
.eh_frame sections, but we always printed the one which corresponds
to .debug_frame. The patch fixes that by choosing an appropriate
constant to print.
See the following for more information about .eh_frame sections:
https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
Differential Revision: https://reviews.llvm.org/D73627
In code generators, one can automate the translation of typed ArrayAttrs
if element attribute translators are already implemented. However, the
type of the element attribute is lost at the construction of
TypedArrayAttrBase. With this change one can inspect the element type
and generate the translation logic automatically, which will reduce the
code repetition.
Differential Revision: https://reviews.llvm.org/D73579
Finalization can introduce new blocks we need to outline as well so it
makes sense to identify the blocks that need to be outlined after
finalization happened. There was also a minor unit test adjustment to
account for the fact that we have a single outlined exit block now.
Summary:
Even though this test is a check for failure, lld still attempts
to open the final output file, which fails when the default "a.out"
file is used and the current directory is read-only. Specifying an
output file works around this problem.
Reviewers: espindola
Subscribers: emaste, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74523
On KNL targets, we widen 128/256-bit strict_fsetcc nodes to
512-bits without forcing the upper bits to zero. This can cause
spurious exceptions due to garbage upper bits. This behavior was
inherited from the non-strict case where the spurious exception
isn't a problem.
Currently, BPF does not support dynamic static allocation.
For a program like below:
extern void bar(int *);
void foo(int n) {
int a[n];
bar(a);
}
The current error message looks like:
unimplemented operand
UNREACHABLE executed at /.../llvm/lib/Target/BPF/BPFISelLowering.cpp:199!
Let us make error message explicit so it will be clear to the user
what is the problem. With this patch, the error message looks like:
fatal error: error in backend: Unsupported dynamic stack allocation
...
Differential Revision: https://reviews.llvm.org/D74521
Reapply 8a56d64d76 with minor fixes.
The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.
This reverts commit 3aac953afa.
Patchable statepoint is lowered into sequence of nops, so zeroed call target
should not be on register. It is better to use getTargetConstant instead
of getConstant to select zero constant for call target.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D74465
Summary:
As discussed in https://llvm.discourse.group/t/rfc-add-affine-parallel/350, this is the first in a series of patches to bring in support for the `affine.parallel` operation.
This first patch adds the IR representation along with custom printer/parser implementations.
Reviewers: bondhugula, herhut, mehdi_amini, nicolasvasilache, rriddle, earhart, jbruestle
Reviewed By: bondhugula, nicolasvasilache, rriddle, earhart, jbruestle
Subscribers: jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74288
When SI_IF is inserted, it constrains the source register with a
register class, which was quite likely a G_ICMP. This was incorrectly
treating it as a scalar, and then applyMappingImpl would end up
producing invalid MIR since this was unexpected.
Also fix not using all VGPR sources for vcc outputs.
These tests fail when the default is switched to assume IEEE denormal
handling. I'm not sure if PPC really has a way to control the denormal
input handling.
This patch is replacements missed in my last change doing this across LLVM.
No functional change, although I think there was a missing typename
in struct conjunction that is now fixed.
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74372
This is a longstanding bug that seems to have been hidden by
a combination of (1) the normal flow being to deserialize the
interface before deserializing its parameter and (2) a precise
ordering of work that was apparently recently disturbed,
perhaps by my abstract-serialization work or Bruno's ObjC
module merging work.
Fixes rdar://59153545.
We used coarse-grained liveness before, thus we looked if the
instruction was executed, but we did not use fine-grained liveness,
hence if the instruction was needed or could be deleted even if the
surrounding ones are live. This patches introduces this level of
liveness checks together with other liveness queries, e.g., for uses.
For more control we enforce that all liveness queries go through the
Attributor.
Test have been adjusted to reflect the changes or augmented to prevent
deletion of the parts we want to check.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73313
If we have a replacement for a value, via AAValueSimplify, the original
value will lose all its uses. Thus, as long as a value is simplified we
can skip the uses in checkForAllUses, given that these uses are
transitive uses for the simplified version and will therefore affect the
simplified version as necessary.
Since this allowed us to remove calls without side-effects and a known
return value, we need to make sure not to eliminate `musttail` calls.
Those we keep around, or later remove the entire `musttail` call chain.
We relied on wouldInstructionBeTriviallyDead before but that functions
does not take assumed information, especially for calls, into account.
The replacement, AAIsDead::isAssumeSideEffectFree, does.
This change makes AAIsDeadCallSiteReturn more complex as we can have
a dead call or only dead users.
The test have been modified to include a side effect where there was
none in order to keep the coverage.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73311
Summary: The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.
Reviewers: jdoerfert, ABataev, grokos, JonChesterfield
Reviewed By: jdoerfert
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D74092
Summary:
[libomptarget][nfc] Change enum values to match those in cuda/rtl
support.h and cuda/rtl.cpp (and downsteam hsa/rtl.cpp) have enums for execution
mode. These are actually independent - the numbers that used within support, or
within the plugin, are never passed across the boundary.
Nevertheless, trying to work out why the values are different between the two
has generated a reasonable amount of confusion. This patch changes support to
match the values in plugin, on the basis that the plugin also has some comments
which I'd have to update if I changed that one instead. Credit to Ron for
working through this in our own fork. See rocm-developer-tools/aomp/issues/7
for that earlier diagnostic write up.
Also happy with generic = 0, spmd = 1 - provided it's the same in both places.
Reviewers: jdoerfert, grokos, ABataev, ronlieb
Reviewed By: grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D74503
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.
This patch improves tail duplication in 3 aspects:
A successor can be duplicated into part of its predecessors.
A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only.
If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.
Differential Revision: https://reviews.llvm.org/D73387
As discussed in https://reviews.llvm.org/D74447, this patch disables integrated-cc1 behavior if there's more than one job to be executed. This is meant to limit memory bloating, given that currently jobs don't clean up after execution (-disable-free is always active in cc1 mode).
I see this behavior as temporary until release 10.0 ships (to ease merging of this patch), then we'll reevaluate the situation, see if D74447 makes more sense on the long term.
Differential Revision: https://reviews.llvm.org/D74490
Move the logic for initialization and termination for DynamicLoaderMacOS
into DynamicLoaderMacOSXDYLD so that there's one initializer for the
DynamicLoaderMacOSXDYLD plugin.
This patch adapts the method MemRefDescriptor::fromStaticShape to
support static non-zero offsets. The updated method uses the
getStridesAndOffset method to extract strides and offset. The patch also
adapts the test cases since sizes and strides are now set in forward
instead of reverse order.
Differential Revision: https://reviews.llvm.org/D74474
This patch is a follow up to 878a24ee24. Name of bitfields
with value-dependent width should be set as type-dependent. This
patch adds the required value-dependency check and sets the
type-dependency accordingly.
Patch fixes PR44886
Differential revision: https://reviews.llvm.org/D72242
Tablegen's DAGISelMatcher emits integers in a VBR format,
so if an integer is below 128 it can fit into a single
byte, otherwise high bit is set, next byte is used etc.
MatcherTable is essentially an unsigned char table. When
SelectionDAGISel parses the table it does a reverse translation.
In a situation when numeric value of an integer to emit is
unknown it can be emitted not as OPC_EmitInteger but as
OPC_EmitStringInteger using a symbolic name of the value.
In this situation the value should not exceed 127.
One of the situations when OPC_EmitStringInteger is used is
if we need to emit a subreg into a matcher table. However,
number of subregs can exceed 127. Currently last defined subreg
for AMDGPU is 192. That results in a silent bug in the ISel
with matcher reading from an invalid offset.
Fixed this bug to emit actual VBR encoded value for a subregs
which value exceeds 127.
Differential Revision: https://reviews.llvm.org/D74368