This is a fairly common pattern:
```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```
We have combines to eliminate G_AND with a mask that does nothing.
If we combined the above to this:
```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```
We'd be able to take advantage of those combines using the trunc + zext.
For this to work (or be beneficial in the best case)
- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
value than performing it at the original width *after masking.*
Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj
At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.
Differential Revision: https://reviews.llvm.org/D107929
Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL,
so we still have the awkward unknown OS case to deal with. Previously
if the HSA ABI intrinsics appeared, we we would not add the ABI
registers to the function. We would emit an error later, but we still
need to produce some compile result. Start adding the registers to any
compute function, regardless of the OS. This keeps the internal state
more consistent, and will help avoid numerous test crashes in a future
patch which starts assuming the ABI inputs are present on functions by
default.
Previously we would allow promotion even if the byval/inalloca
attributes on the call and the callee didn't match.
It's ok if the byval/inalloca types aren't the same. For example, LTO
importing may rename types.
Fixes PR51397.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D107998
AttributeList::hasAttribute() is confusing. In an attempt to change the
name to something that suggests using other methods, fix up some
existing uses.
When cross compiling lldb-server, do not create a host build
for building lldb-tblgeb when LLDB_TABLEGEN_EXE is already
provided. This avoids an expensive and time-consuming build step
if lldb-tblgen was already built previously for host.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D108053
The approach for handling reductions in the outer most
dimension follows that for inner most dimensions, outlined
below
First, transpose to move reduction dims, if needed
Convert reduction from n-d to 2-d canonical form
Then, for outer reductions, we emit the appropriate op
(add/mul/min/max/or/and/xor) and combine the results.
Differential Revision: https://reviews.llvm.org/D107675
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
Avoid doing the detection work inside the constructor. In addition to
polymorphism being unintuitive in constructors and other design problems
such as if an exception is thrown, the ScopDetection class is usable
without detection in the sense of "no Scop found" or "function skipped".
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.
Differential Revision: https://reviews.llvm.org/D107914
This covers the SSE and AVX/AVX2 headers. AVX512 has a lot more macros
due to rounding mode.
Fixes part of PR51324.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D107843
These are lowered, matching SDAG behaviour. (See
llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll)
These fall back ~159 times on a build of clang with GISel enabled.
Differential Revision: https://reviews.llvm.org/D107777
Summary:
The assertion that both functions were not missing was incorrect and would
fail when one of the functions was missing. Fixed it and moved the
assertion earlier to check the input parameters to better capture
first-failure. Added lit test.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D107989
Since then, the SCEV pointer handling as been improved,
so the assertion should now hold.
This reverts commit b96114c1e1,
relanding the assertion from commit 141e845da5.
This patch implements the following check for TEAMS construct:
```
OpenMP Version 5.0 Teams construct restriction: A teams region can
only be strictly nested within the implicit parallel region or a target
region. If a teams construct is nested within a target construct, that
target construct must contain no statements, declarations or directives
outside of the teams construct.
```
Also add one test case for the check.
Reviewed By: kiranchandramohan, clementval
Differential Revision: https://reviews.llvm.org/D106335
We use kShadowCnt (number of shadow cells per application granule)
when computing shadow, but it's wrong. We need the ratio
between shadow and app memory (how much shadow is larger than app memory),
which is kShadowMultiplier.
Currently both are equal to 4, so it works fine.
Use the correct constant.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D108033
This patch implements the following semantic checks for cancellation constructs:
```
OpenMP Version 5.0 Section 2.18.1: CANCEL construct restriction:
If construct-type-clause is taskgroup, the cancel construct must be
closely nested inside a task or a taskloop construct and the cancel
region must be closely nested inside a taskgroup region. If
construct-type-clause is sections, the cancel construct must be closely
nested inside a sections or section construct. Otherwise, the cancel
construct must be closely nested inside an OpenMP construct that matches
the type specified in construct-type-clause of the cancel construct.
OpenMP Version 5.0 Section 2.18.2: CANCELLATION POINT restriction:
A cancellation point construct for which construct-type-clause is
taskgroup must be closely nested inside a task or taskloop construct,
and the cancellation point region must be closely nested inside a
taskgroup region. A cancellation point construct for which
construct-type-clause is sections must be closely nested inside a
sections or section construct. A cancellation point construct for which
construct-type-clause is neither sections nor taskgroup must be closely
nested inside an OpenMP construct that matches the type specified in
construct-type-clause.
```
Also add test cases for the check.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D106538
Some Clang diagnostics could only report OpenCL C version. Because
C++ for OpenCL can be used as an alternative to OpenCL C, the text
for diagnostics should reflect that.
Desrciptions modified for these diagnostics:
`err_opencl_unknown_type_specifier`
`warn_option_invalid_ocl_version`
`err_attribute_requires_opencl_version`
`warn_opencl_attr_deprecated_ignored`
`ext_opencl_ext_vector_type_rgba_selector`
Differential Revision: https://reviews.llvm.org/D107648
It might changed the condition of a branch into a constant,
so we should restart and constant-fold terminator,
instead of continuing with the tautological "conditional" branch.
This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff