In various circumstances, when we clobber a register there may be
alternative locations that the value is live in. The classic example would
be a value loaded from the stack, and then clobbered: the value is still
available on the stack. InstrRefBasedLDV was coping with this at block
starts where it's forced to pick a location, however it wasn't searching
for alternative locations when values were clobbered.
This patch notifies the "Transfer Tracker" object when clobbers occur, and
it's able to find alternatives and issue DBG_VALUEs for that location. See:
the added test.
Differential Revision: https://reviews.llvm.org/D88405
The remarks will trigger on some functions that are marked cold, such as the
`__muldc3` intrinsic functions. Change the remarks to avoid these functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105196
This displays as: `Size: 4 bytes (+4 padding)`
Also stop showing (byte) offset/size for bitfields. They're not
meaningful and using them to calculate padding is dangerous!
Differential Revision: https://reviews.llvm.org/D98377
We have analogous rules in instsimplify, etc.., but were missing the same in SCEV. The fold is near trivial, but came up in the context of a larger change.
The various design docs have been moved to RST, and the linked blog post
does not apply anymore since libc++ is the default library used by Clang
on Apple platforms.
I believe this Changed flag should be initialized to false,
otherwise the if (!Changed) is always dead. This doesn't
manifest in a functional issue because the PHINode checks will
fail if nothing changed. They are identical to the earlier
checks that must have already failed to get into this else block.
While there remove an else after return to reduce indentation.
Differential Revision: https://reviews.llvm.org/D105159
This patch augments Lit with the ability to parse regular expressions
in boolean expressions. This includes REQUIRES:, XFAIL:, UNSUPPORTED:,
and all other special Lit markup that evaluates to a boolean expression.
Regular expressions can be specified by enclosing them in {{...}},
similarly to how FileCheck handles such regular expressions. The regular
expression can either be on its own, or it can be part of an identifier.
For example, a match expression like {{.+}}-apple-darwin{{.+}} would match
the following variables:
x86_64-apple-darwin20.0
arm64-apple-darwin20.0
arm64-apple-darwin22.0
etc...
In the long term, this could be used to remove the need to handle the
target triple specially when parsing boolean expressions.
Differential Revision: https://reviews.llvm.org/D104572
Based off the worse case numbers generated by D103695, the AVX1/2/512 sitofp/uitofp/fptosi/fptoui costs were higher than necessary (based off instruction counts instead of actual throughput).
The SSE costs still need further fixes, but I hit an issue with the order in which SSE costs are checked - we need to check CUSTOM costs (with non-legal types) first, and then fallback to LEGALIZED types. I'm looking at this now, and this should let us start thinning out a lot of the duplicates in the costs tables.
Then we can finally start work on vXi64 / vXi16 / vXi8 / vXi1 integers, which should let us look at sub-128-bit vectorization (D103925).
The executeregionop is used to allow multiple blocks within SCF constructs. If the container allows multiple blocks, inline the region
Differential Revision: https://reviews.llvm.org/D104960
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.
Reviewed By: aaron.ballman, kpn
Differential Revision: https://reviews.llvm.org/D100118
This patch adds additional remarks, suggesting the use of `noescape` for failed
globalization and indicating when internalization failed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105150
Deduce circumstances where an affine load could not possibly be read by an operation (such as an affine load), and if so, eliminate the load
Differential Revision: https://reviews.llvm.org/D105041
A couple of filecheck patterns had not been hooked up with
the patterns suffering from some drift. As this test is old
and llvm-objdump has improved a lot, take this opportunity to
hide the instruction encoding. I've also taken out a lot of
the explanatory comments that llvm-objdump improvements make
redundant, as these comments oftern don't get updated when addresses
change.
Differential Revision: https://reviews.llvm.org/D104907
Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes
for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the
result of the PHI lowering are inserted into the basic block predecessors - not into the block itself.
As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop
body size/cost estimation.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D105104
There are a couple of problems with the code to patch
unrelocated BLX instructions:
1. The calculation of the PC needs to take into account
the alignment of the instruction. The Thumb BLX
uses alignDown(PC, 4) for the source address.
2. The calculation of the PC bias is hard-coded to 4
which works for Thumb, but when there is a BLX the
branch will be in Arm state so it needs an 8 byte
PC bias.
No asssembler generates an unrelocated BLX instruction
so these problems do not affect real world programs.
However we should still fix them.
Differential Revision: https://reviews.llvm.org/D104905
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.
Differential Revision: https://reviews.llvm.org/D105016
Update the OpDSL documentation to reflect recent changes. In particular, the updated documentation discusses:
- Attributes used to parameterize index expressions
- Shape-only tensor support
- Scalar parameters
Differential Revision: https://reviews.llvm.org/D105123
This patch adds unbundling support of an archive file. It takes an
archive file along with a set of offload targets as input.
Output is a device specific archive for each given offload target.
Input archive contains bundled code objects bundled using
clang-offload-bundler. Each generated device specific archive contains
a set of device code object files which are named as
<Parent Bundle Name>-<CodeObject-GPUArch>.
Entries in input archive can be of any binary type which is
supported by clang-offload-bundler, like *.bc. Output archives will
contain files in same type.
Example Usuage:
clang-offload-bundler --unbundle --inputs=lib-generic.a -type=a
-targets=openmp-amdgcn-amdhsa--gfx906,openmp-amdgcn-amdhsa--gfx908
-outputs=devicelib-gfx906.a,deviceLib-gfx908.a
Reviewed By: jdoerfert, yaxunl
Differential Revision: https://reviews.llvm.org/D93525
No need to try to create the default constructor for private copy, it
will be called automatically in the initializer of the declare
reduction. Fixes balance between constructors/destructors calls.
Differential Revision: https://reviews.llvm.org/D105143
Hi,
In function TransformTemplateArgument,
would it be better to add line break at the end of "if" expressions?
I use clang-format to do the job for me.
Thanks a lot
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D104604
This is a followup patch on D103636 where
it seemed checking on amdgpu-calls and
amdgpu-stack-objects is unnecessary. Removing these
checks didn't regress any tests functionally.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D104513
Currently UREM & SREM on constant ranges produces overly pessimistic
results for single element constant ranges.
Delegate to APInt's implementation if both operands are single element
constant ranges. We already do something similar for other binary
operators, like binary AND.
Fixes PR49731.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D105115
Looking at PostDominatorTree::dominates, we can see that has the same
logic (with the addition of handling Phi nodes - which are not used as inputs in
this pass) as the helper function.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105141
My use case for this is illustrated in the test case: I want to define
the same instruction twice with different (disjoint) predicates, because
the instruction has different operands on different subtargets. It's
convenient to do this with a multiclass that also defines an alias for
the instruction.
Previously tablegen would complain if this alias was defined twice with
no predicate. One way to fix this would be to add a predicate on each
definition of the alias, matching the predicate on the instruction. But
this (a) is slightly awkward to do in the real world use case I had, and
(b) leads to an inefficient matcher that will do something like this:
if (Mnemonic == "foo_alias") {
if (Features.test(Feature_Subtarget1Bit))
Mnemonic == "foo";
else if (Features.test(Feature_Subtarget2Bit))
Mnemonic == "foo";
return;
}
It would be more efficient to skip the feature tests and return "foo"
unconditionally.
Overall it seems better to allow multiple definitions of the identical
alias with no predicate.
Differential Revision: https://reviews.llvm.org/D105033
Extend the OpDSL syntax with an optional `domain` function to specify an explicit dimension order. The extension is needed to provide more control over the dimension order instead of deducing it implicitly depending on the formulation of the tensor comprehension. Additionally, the patch also ensures the symbols are ordered according to the operand definitions of the operation.
Differential Revision: https://reviews.llvm.org/D105117
`ARMInstPrinter::printMveAddrModeQOperand()` was added in D62680, but
was never used. It looks like `printT2AddrModeImm8Operand<false>()` is
used instead.
Differential Revision: https://reviews.llvm.org/D105124
This was missing and also there was a bug in the lowering itself, which went unnoticed due to it.
Differential Revision: https://reviews.llvm.org/D105122
Compilation database might have empty string as a command line argument.
But ExpandResponseFilesDatabase::expand doesn't expect this and assumes
that string.front() can be used for any argument. It is undefined behaviour if
string is empty. With debug build mode it causes crash in clangd.
Test Plan: check-clang
Differential Revision: https://reviews.llvm.org/D105120
Fix generateCopyForMemRefRegion for a missing check: in some cases, when
the thing to generate copies for itself is empty, no fast buffer/copy
loops would have been allocated/generated. Add an extra assertion there
while at this.
Differential Revision: https://reviews.llvm.org/D105170
Update AMDGPU gfx90a memory model to make coarse grain memory allocations
consistent when fine grained system scope atomic acquire and release is
performed.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D105137
Now the option is off by default. Since we are not sure if this option
would make the compile time increase aggressively. Although we tested it
on SPEC2017, we may need to test more to make it on by default.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D104365
Now we lack a benchmark to measure the performance change for each
commit.
Since coro elide is the main optimization in coroutine module, I wonder
it may be an estimation to count the number of elided coroutine in
private code bases.
e.g., for a certain commit, if we found that the number of elided goes
down, we could find it before the commit check-in.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D105095