Current DWARFLinker implementation does not support some debug sections
(mainly DWARF v5 sections). This patch adds diagnostic for such sections.
The warning would be displayed for critical(such that could not be removed)
sections and the source file would be skipped. Other unsupported sections
would be removed and warning message should be displayed. The zero exit
status would be returned for both cases.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D123623
This is used to fix a bug in SymbolTable::replaceAllSymbolUses where we replace symbols that
we shouldn't.
Differential Revision: https://reviews.llvm.org/D130693
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
Summary:
The AIX linker does not merge typeinfos when shared libraries are involved, which causes address comparison to fail although the types are the same. This patch changes to use the non-unique implementation for typeinfo comparison for AIX.
Reviewed by: hubert.reinterpretcast, philnik, libc++
Differential Revision: https://reviews.llvm.org/D130715
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsic so we technically don't have
to worry about fflags, but it doesn't cost much to support it.
To support I've extend our FCOPYSIGN_VL node to support a passthru
operand. Similar to what was done for VRGATHER*_VL nodes.
I plan to do a similar update for trunc, floor, and ceil.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D130659
Current implementation of decomposition of Linalg operations wouldnt
work if the `outs` operand values were used within the body of the
operation. Relax this restriction. This potentially sets the stage for
decomposing ops with reduction iterator types (but is not done here
since it requires more study).
Differential Revision: https://reviews.llvm.org/D130527
While The tiling interface provides a mechanism for operations to be
tiled into tiled version of the op (or another op at the same level of
abstraction), the `generateScalarImplementation` method added here is
the "exit point" after all transformations have been done. Ops that
implement this method are expected to generate IR that are directly
lowerable to backend dialects like LLVM or SPIR-V dialects.
Differential Revision: https://reviews.llvm.org/D130612
GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there.
visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....
We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible.
Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would *not* be legal to make this same change for merely "uniform" operations.
Differential Revision: https://reviews.llvm.org/D130637
This supports lowering from parse-tree to MLIR and translation from
MLIR to LLVM IR using OMPIRBuilder for OpenMP simdlen clause in SIMD
construct.
Reviewed By: shraiysh, peixin, arnamoy10
Differential Revision: https://reviews.llvm.org/D130195
Will allow plugins to migrate away from using global variables to
manage lifetime, which will fix a segfault discovered in relation to D127432
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D130712
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D130012
This is pretty straightforward, it just adds a builtin to return a
pointer to a resource handle. This maps to a dx intrinsic.
The shape of this builtin and the underlying intrinsic will likely
shift a bit as this implementation becomes more feature complete, but
this is a good basis to get started.
Depends on D128569.
Differential Revision: https://reviews.llvm.org/D130016
Most of the change here is fleshing out the HLSLExternalSemaSource with
builder implementations to build the builtin types. Eventually, I may
move some of this code into tablegen or a more managable declarative
file but I want to get the AST generation logic ready first.
This code adds two new types into the HLSL AST, `hlsl::Resource` and
`hlsl::RWBuffer`. The `Resource` type is just a wrapper around a handle
identifier, and is largely unused in source. It will morph a bit over
time as I work on getting the source compatability correct, but for now
it is a reasonable stand-in. The `RWBuffer` type is not ready for use.
I'm posting this change for review because it adds a lot of
infrastructure code and is testable.
There is one change to clang code outside the HLSL-specific logic here,
which addresses a behavior change introduced a long time ago in
967d438439. That change resulted in unintentionally breaking
situations where an incomplete template declaration was provided from
an AST source, and needed to be completed later by the external AST.
That situation doesn't happen in the normal AST importer flow, but can
happen when an AST source provides incomplete declarations of
templates. The solution is to annotate template specializations of
incomplete types with the HasExternalLexicalSource bit from the base
template.
Depends on D128012.
Differential Revision: https://reviews.llvm.org/D128569
I'm actually trying to get rid of GetDemandedBits - but while dismantling it I noticed that we were altering opaque constants. Fixing that causes a FP_TO_INT_SAT regression that should be addressed separately - I'll raise a bug.
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D130012
These vdup and vmov float16 intrinsics are being defined in both the
general section and then again in fp16 under a !aarch64 flag. The
vdup_lane intrinsics were being defined in both aarch64 and !aarch64
sections, so have been commoned. They are defined as macros, so do not
give duplicate warnings, but removing the duplicates shouldn't alter the
available intrinsics.
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.
There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.
Differential Revision: https://reviews.llvm.org/D77804
It simplifies the code overall and removes the need for manual bookkeeping.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130447
It simplifies the code overall and removes the need for manual bookkeeping.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130444
In the 2e29b0138c we introduce a specific solving algorithm
that analyzes the VGPR to SGPR copies use chains and either lowers
the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms.
Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs
in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs
are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert
copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues.
At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain
were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF
because of the weird mixing of SALU and VALU instructions.
This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact
that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs,
we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common
VGPR to SGPR lowering algorithm.
This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130367
Add host exception support check utility flag. This is needed to not run tests that require exception support in few buildbots that lacks related symbols for some reason.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D129242
Fix "JIT session error: Symbols not found: [ DW.ref.__gxx_personality_v0 ] error" which happens when trying to use exceptions on ppc linux. To do this, it expands AutoClaimSymbols option in RTDyldObjectLinkingLayer to also claim weak symbols before they are tried to be resovled. In ppc linux, DW.ref symbols is emitted as weak hidden symbols in the later stage of MC pipeline. This means when using IRLayer (i.e. LLJIT), IRLayer will not claim responsibility for such symbols and RuntimeDyld will skip defining this symbol even though it couldn't resolve corresponding external symbol.
Reviewed By: sgraenitz
Differential Revision: https://reviews.llvm.org/D129175
The patch mainly focuses on the lack of warnings for
-Wtautological-compare. It works fine for positive numbers but doesn't
for negative numbers. This is because the warning explicitly checks for
an IntegerLiteral AST node, but -1 is represented by a UnaryOperator
with an IntegerLiteral sub-Expr.
For the below code we have warnings:
if (0 == (5 | x)) {}
but not for
if (0 == (-5 | x)) {}
This patch changes the analysis to not look at the AST node directly to
see if it is an IntegerLiteral, but instead attempts to evaluate the
expression to see if it is an integer constant expression. This handles
unary negation signs, but also handles all the other possible operators
as well.
Fixes#42918
Differential Revision: https://reviews.llvm.org/D130510
This works with any logic + extend:
https://alive2.llvm.org/ce/z/vzsqQD
The motivating case is from issue #56294, but that's still not optimal
(it should simplify completely).