The following line asserts when `sh_addralign > MAX_UINT32 && (uint32_t)sh_addralign == 0`:
```
ExpectedOffset = alignTo(ExpectedOffset,
SecHdr.sh_addralign ? SecHdr.sh_addralign : 1);
```
it happens because `sh_addralign` is truncated to 32-bit value, but `alignTo`
doesn't accept `Align == 0`. We should change `1` to `1uLL`.
Differential revision: https://reviews.llvm.org/D92163
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.
Allows us to move the x86-specific lowering to the generic expansion code.
Differential Revision: https://reviews.llvm.org/D92183
The test case isn't using the AST matchers for all checks as there doesn't seem to be support for
matching NonTypeTemplateParmDecl default arguments. Otherwise this is simply importing the
default arguments.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D92106
This patch adds tests for things that happened to be fixed by previous
patches, but that should continue working if we do decide to treat
sizeless types as incomplete types.
Differential Revision: https://reviews.llvm.org/D79584
Folding a select of vector constants that include undef elements only
applies to fixed vectors, but there's no earlier check the type is not
scalable so it crashes for scalable vectors. This adds a check so this
optimization is only attempted for fixed vectors.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D92046
This was orginally committed in 2245fb8aaa.
but was immediately reverted in f3abd54958
because of a PHI handling issue.
Original commit message:
1. It doesn't make sense to enforce that the bonus instruction
is only used once in it's basic block. What matters is
whether those user instructions fit within our budget, sure,
but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
are only used within their basic block. Perhaps the branch
condition isn't using the value computed by said bonus instruction,
and said bonus instruction is simply being calculated
to be used in successors?
So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.
Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.
We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.
This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.
The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
The ops are very similar to the std variants, but support async GPU execution.
gpu.alloc does not currently support an alignment attribute, and the new ops do not have
canonicalizers/folders like their std siblings do.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91698
This is related to MIPS. Currently we might report an error and exit,
though there is no problem to report a warning and try to continue dumping
an object. The code uses `MipsGOTParser<ELFT> Parser`, which is isolated
in this method.
Differential revision: https://reviews.llvm.org/D92090
These patterns are using zexti32 which matches either assertzexti32
or (and X, 0xffffffff). But if we match (and X, 0xffffffff) it will
remove the AND and the inputs may no longer have the zero bits
needed to guarantee the result has enough zeros.
This commit changes the patterns to only match assertzexti32.
I'm not sure how to test the broken case since the DIVUW/REMUW nodes
are created during type legalization, but type legalization won't
create an (and X, 0xfffffffff) directly on the inputs.
I've also changed the zexti32 on the root of the pattern to just
checking for AND. We were previously also matching assertzexti32,
but I doubt that pattern would ever occur.
When widening an IndVar that has LCSSA Phi users outside
the loop, we can safely widen it as usual and then truncate
the result outside the loop without hurting the performance.
Differential Revision: https://reviews.llvm.org/D91593
Reviewed By: skatkov
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D80974
We need a real target now, and it was only being created if grpc was
built from source or imported from homebrew.
Differential Revision: https://reviews.llvm.org/D92107
Add a flag that disables caching when computing aliasing results
potentially based on a phi-phi NoAlias assumption. We'll still
insert cache entries temporarily to catch infinite recursion,
but will drop them afterwards, so they won't persist in BatchAA.
Differential Revision: https://reviews.llvm.org/D91936
Many bots are unhappy, at the very least missed a few codegen tests,
and possibly this has a logic hole inducing a miscompile
(will be really awesome to have ready reproducer..)
Need to investigate.
This reverts commit 2245fb8aaa.
1. It doesn't make sense to enforce that the bonus instruction
is only used once in it's basic block. What matters is
whether those user instructions fit within our budget, sure,
but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
are only used within their basic block. Perhaps the branch
condition isn't using the value computed by said bonus instruction,
and said bonus instruction is simply being calculated
to be used in successors?
So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.
Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.
We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.
This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.
The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
We don't actually update the ABI lists at every release -- it's too much
work, since we'd technically have to do it even for minor releases.
Furthermore, I don't think anybody uses those (I certainly don't rely
on them for anything).
Instead, it is better to rely on the ABI list changelog and the canonical
ABI list that we always keep up to date. If one wants to know what symbols
were shipped in a specific release, that can be discovered easily using
Git, which is a superior tool than keeping textual copies of old versions.
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).
This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.
Differential Revision: https://reviews.llvm.org/D91649
Similar to D92113. Currently `clang -fstack-size-section -fno-unique-section-names`
sets the linked-to symbol to the first `.text`, which is:
* incorrect for COMDAT sections
* inferior for non-COMDAT sections in -ffunction-sections mode (poor --gc-sections: .stack_sizes cannot be separately discarded)
Note, if the section symbol can be referenced in more places (if the
function begin symbol does not apply), we probably should consider
defining a different BeginSymbol for sections with ",unique" linkage.
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D92151
This patch enables passing non variadic vector type parameters on the caller and callee side and vector return on AIX that are passed in vector registers only.
So far, support is enabled for only the AIX extended Altivec ABI Calling convention.
Reviewed By: sfertile, DiggerLin
Differential Revision: https://reviews.llvm.org/D86476
The test case isn't using the AST matchers for all checks as there doesn't seem to be support for
matching TemplateTypeParmDecl default arguments. Otherwise this is simply importing the
default arguments.
Also updates several LLDB tests that now as intended omit the default template
arguments of several std templates.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D92103
Using sysctl requires including headers that are considered internal on
Linux, like <sys/sysctl.h> & friends. Instead, sysconf is defined by POSIX
(and we have a fallback for Windows), so all the systems we support should
be happy with just sysconf.
Differential Revision: https://reviews.llvm.org/D92135
I'm not 100% sure what the issue actually is since I can't reproduce it
locally, however what I explain in the comment is my best attempt to
explain what's going on.
Differential Revision: https://reviews.llvm.org/D92131
The rewrite logic has an optimization to drop a cast operation after
rewriting block arguments if the cast operation has no users. This is
unsafe as there might be a pending rewrite that replaced the cast operation
itself and hence would trigger a second free.
Instead, do not remove the casts and leave it up to a later canonicalization
to do so.
Differential Revision: https://reviews.llvm.org/D92184