Update scan-build-py to be able to trigger sarif-html output format in clang static analyzer.
NOTE: testcase `test_sarif_and_html_creates_sarif_and_html_reports` will fail if the default clang does not have change https://reviews.llvm.org/D96389 . This can be remediated by pointing the default clang in arguments.py to a locally built clang. I was unable to figure out where these particular tests for scan-build-py are being invoked (aside from manually), so any help there would be greatly appreciated.
Reviewed By: aabbaabb, xazax.hun
Differential Revision: https://reviews.llvm.org/D96570
In every catchpad except `catch (...)`, we add a call to
`_Unwind_CallPersonality`, which is a wrapper to call the personality
function. (In most other Itanium-based architectures the call is done
from libunwind, but in wasm we don't have control over the VM.)
Because the personality function is called to figure out whether the
current exception is a type we should catch, such as `int` or
`SomeClass&`, `catch (...)` does not need the personality function call.
For the same reason, all cleanuppads don't need it.
When we call `_Unwind_CallPersonality`, we store some necessary info in
a data structure called `__wasm_lpad_context` of type
`_Unwind_LandingPadContext`, which is defined in the wasm port of
libunwind in Emscripten. Also the personality wrapper function returns
some info (selector and the caught pointer) in that data structure, so
it is used as a medium for communication.
One piece of info we need to store is the address of the LSDA info for
the current function. The `wasm.lsda()` intrinsic returns that address.
(This intrinsic will be lowered to a symbol that points to the LSDA address.)
The simplest thing is to call `wasm.lsda()` every time we need to call
`_Unwind_CallPersonality` and store that info in `__wasm_lpad_context`
data structure. But we tried to be better than that (D77423 and some
more previous CLs), so if catchpad A dominates catchpad B and catchpad A
is not `catch (...)`, we didn't insert `wasm.lsda()` call in catchpad B,
thinking that the LSDA address is the same for a single function and we
already visited catchpad A and `__wasm_lpad_context.lsda` field would
already have that value.
But this can be incorrect if there is a call to another function, which
also can have the personality function and LSDA, between catchpad A and
catchpad B, because `__wasm_lpad_context` is a globally defined
structure and the callee function will overwrite its `lsda` field.
So in this CL we don't try to do any optimizations on adding
`wasm.lsda()` call; we store the result of `wasm.lsda()` every time we
call `_Unwind_CallPersonality`. We can do some complicated analysis,
like checking if there is a function call between the dominating
catchpad and the current catchpad, but at this time it seems overkill.
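The failure mode described above can be illustrated with a toy model. This is only a sketch, not the real libunwind or LLVM code: the dictionary stands in for `__wasm_lpad_context`, and `call_personality`, `callee`, `caller`, and `run` are hypothetical names made up for the illustration.

```python
# Toy model of the shared landing-pad context; not the real libunwind code.
lpad_context = {"lsda": None}
seen_lsda = []  # LSDA value observed by each personality call


def call_personality():
    """Stand-in for _Unwind_CallPersonality: reads the shared lsda field."""
    seen_lsda.append(lpad_context["lsda"])


def callee():
    # The callee has its own catchpad, so it stores its own LSDA address.
    lpad_context["lsda"] = "LSDA_callee"
    call_personality()


def caller(store_in_every_catchpad):
    # Catchpad A: always stores the caller's LSDA address.
    lpad_context["lsda"] = "LSDA_caller"
    call_personality()
    callee()  # a call between catchpad A and catchpad B
    # Catchpad B is dominated by A; the old optimization skipped this store.
    if store_in_every_catchpad:
        lpad_context["lsda"] = "LSDA_caller"
    call_personality()


def run(store_in_every_catchpad):
    """Return the LSDA values seen by the three personality calls."""
    seen_lsda.clear()
    caller(store_in_every_catchpad)
    return list(seen_lsda)
```

With `run(False)` the third personality call sees the callee's LSDA, which models the bug; with `run(True)` it sees the caller's LSDA again.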
This deletes three tests because they all tested `wasm.lsda()` call
optimization.
Fixes https://github.com/emscripten-core/emscripten/issues/13548.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D97309
llvm::parallelTransformReduce does not schedule work on the caller thread, which becomes very costly for
the inliner where a majority of SCCs are small, often ~1 element. The switch to llvm::parallelForEach solves this,
and also aligns the implementation with the PassManager (which realistically should share the same implementation).
This change dropped compile time on an internal benchmark by ~1 second (25%).
Differential Revision: https://reviews.llvm.org/D96086
A majority of operations have a very small number of interfaces, which means that the cost of using a hash map is generally larger for interface lookups than just a binary search. In the future when there are a number of operations with large amounts of interfaces, we can switch to a hybrid approach that optimizes lookups based on the number of interfaces. For now, however, a binary search is the best approach.
This dropped compile time on a largish TF MLIR module by 20% (half a second).
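As a rough illustration of the lookup strategy, here is a minimal sketch of a sorted-list map searched with binary search. It is not the MLIR implementation: `InterfaceMap`, the integer interface IDs, and the string "concepts" are all hypothetical stand-ins.

```python
from bisect import bisect_left


class InterfaceMap:
    """Sorted-list lookup keyed by an integer interface ID (hypothetical)."""

    def __init__(self, entries):
        # entries: iterable of (interface_id, concept) pairs
        self._entries = sorted(entries, key=lambda e: e[0])
        self._ids = [e[0] for e in self._entries]

    def lookup(self, interface_id):
        # Binary search over the sorted IDs; no hashing involved.
        i = bisect_left(self._ids, interface_id)
        if i < len(self._ids) and self._ids[i] == interface_id:
            return self._entries[i][1]
        return None
```

For a handful of entries the search touches only a few adjacent elements, which is the case the message argues is the common one.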
Differential Revision: https://reviews.llvm.org/D96085
https://bugs.llvm.org/show_bug.cgi?id=40858
CheckShadow is now called for each binding in the structured binding to make sure it does not shadow any other variable in scope. This does use a custom implementation of getShadowedDeclaration though, because a BindingDecl is not a VarDecl.
Added a few unit tests for this. In theory all the other shadow unit tests should be duplicated for the structured binding variables too, but it is probably not worth it as they use common code. The MyTuple and std interface code has been copied from live-bindings-test.cpp.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D96147
When targeting an MSVC triple, --dependent-lib with the name of the clang runtime library for profiling is added to the command line args. In its current implementation, clang_rt.profile-<ARCH> is chosen as the name. When building a distribution using LLVM_ENABLE_PER_TARGET_RUNTIME_DIR this fails, because the runtime file names do not have an architecture suffix.
This patch refactors getCompilerRT and getCompilerRTBasename to always consider per-target runtime directories. getCompilerRTBasename now simply returns the filename component of the path found by getCompilerRT.
Differential Revision: https://reviews.llvm.org/D96638
In DWARF v4 compile units go in .debug_info and type units go in
.debug_types. However, in v5 both kinds of units are in .debug_info.
Therefore we can't decide whether to use the CU or TU index just by
looking at which section we're reading from. We have to wait until we
have read the unit type from the header.
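The decision can be sketched as follows. This is an illustration only, not the code from the patch: `pick_index` and its string return values are hypothetical, and the section names are simplified. The `DW_UT_*` values are the unit type codes from the DWARF v5 specification.

```python
# Unit type codes from the DWARF v5 unit header.
DW_UT_compile = 0x01
DW_UT_type = 0x02
DW_UT_split_compile = 0x05
DW_UT_split_type = 0x06


def pick_index(version, section, unit_type=None):
    """Return "cu" or "tu" to say which index describes this unit."""
    if version < 5:
        # v4 and earlier: the section alone tells the two kinds apart.
        return "tu" if section == ".debug_types" else "cu"
    # v5: both kinds live in .debug_info, so the header's unit type decides.
    if unit_type is None:
        raise ValueError("DWARF v5 needs the unit type from the unit header")
    return "tu" if unit_type in (DW_UT_type, DW_UT_split_type) else "cu"
```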
Differential Revision: https://reviews.llvm.org/D96194
When computing dense address, a vectorized index must be accounted
for properly. This bug was formerly undetected because we get 0 * prev + i
in most cases, which folds away the scalar part. Now it works for all cases.
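One plausible reading of the issue, sketched under assumptions: the dense address is a linearization of the form `prev * dim_size + index`, and under vectorization `index` has several lanes, so the scalar part has to be applied to every lane. The helper name `dense_address` and the list-as-vector representation are hypothetical, and the exact formula in the patch may differ.

```python
def dense_address(prev, dim_size, index):
    """Linearize one more dimension: prev * dim_size + index.

    `index` may be a scalar or a list of lanes (a stand-in for a vector).
    """
    if isinstance(index, list):
        # Broadcast the scalar part to every lane before adding.
        return [prev * dim_size + lane for lane in index]
    return prev * dim_size + index
```

When `prev` is 0 the scalar part vanishes, so a version that ignored it would still give the right lanes; a nonzero `prev` exposes the difference.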
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97317
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducer.
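The kind of check involved could look like the sketch below. It is not the llvm-profdata implementation: `find_duplicate_values` is a hypothetical helper, and the `(value, count)` pairs are an assumed simplification of the MemOPSize value profile entries.

```python
def find_duplicate_values(entries):
    """Return the values that appear more than once in (value, count) pairs."""
    seen = set()
    duplicates = []
    for value, _count in entries:
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates
```

A writer could refuse to emit the profile whenever this list is non-empty, since each duplicate would later become a repeated switch case.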
Differential Revision: https://reviews.llvm.org/D92074
This is used to lower UDOT/SDOT instructions, as opposed to relying on
the intrinsic. Subsequent optimizations will be able to optimize them
more cleanly based on these nodes.
Following a discussion about the current state of this check on the 12.X branch, it was decided to purge the check as it wasn't in a fit-to-release state; see https://llvm.org/PR49318.
This check has since had some of those issues addressed and should be good for the next release cycle now, pending any more bug reports about it.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D97275
Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.
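For reference, the condition under which `shl nsw` yields poison can be modeled in a few lines. This is a sketch of the IR semantics, not the code from the patch; `shl_nsw_is_poison` is a hypothetical helper, and it treats the operand as a signed integer of the given width.

```python
def shl_nsw_is_poison(x, amount, bits=8):
    """True if `shl nsw` of the signed value x by amount yields poison."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= x <= hi:
        raise ValueError("x must fit in the given bit width")
    if not 0 <= amount < bits:
        return True  # an oversized shift amount is poison regardless of nsw
    # nsw holds exactly when the mathematically exact result still fits,
    # i.e. no shifted-out bit disagrees with the resulting sign bit.
    return not (lo <= x * (1 << amount) <= hi)
```

For an 8-bit value, shifting 64 left by 1 flips the sign bit and is poison, while shifting -64 left by 1 gives -128 and is fine.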
Differential Revision: https://reviews.llvm.org/D97305
This is to limit compile time. I did experiments with some
inputs and found that compile time stays reasonable for this
pass if we have less than 100000 virtual registers and then
starts to explode somewhere between 100000 and 150000.
Differential Revision: https://reviews.llvm.org/D97218
Originally, when we added the new driver, we created dedicated test
directories for `flang-new`. This way we separated the tests for the
`throwaway` and the new driver.
As we are increasing test coverage and starting to share tests between
the two drivers, it makes sense to share all directories and instead
rely on:
```
! REQUIRES: new-flang-driver
```
to mark tests as exclusively for the new driver.
Differential Revision: https://reviews.llvm.org/D97207
`ptx71` is not supported in the release version of LLVM yet. As a result,
the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned
in D97004. Since the support in D97004 is just a workaround for the release, and
we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D97195
Affine parallel ops may contain and yield results from MemRefsNormalizable ops in the loop body. Thus, both affine.parallel and affine.yield should have the MemRefsNormalizable trait.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96821
This (mostly) reverts 32c501dd88. Hit a
case where this causes a behaviour change, perhaps the same root cause
that triggered the revert of a40db5502b in
7799ef7121.
(The API changes in DirectoryEntry.h have NOT been reverted as a number
of subsequent commits depend on those.)
https://reviews.llvm.org/D90497#2582166
This code creates 3 setccs that need to be expanded. It was
creating a sign bit test as setge X, 0 which is non-canonical.
Canonical would be setgt X, -1. This misses the special case in
IntegerExpandSetCCOperands for sign bit tests that assumes
canonical form. If we don't hit this special case we end up
with a multipart setcc instead of just checking the sign of
the high part.
To fix this I've reversed the polarity of all of the setccs to
setlt X, 0 which is canonical. The rest of the logic should
still work. This seems to produce better code on RISCV which
lacks a setgt instruction.
This probably still isn't the best code sequence we could use here.
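Why the sign test only needs the high part can be checked with a small model. This is an illustration, not the legalizer code: `split_i64` and `is_negative_via_high_part` are hypothetical helpers that model a 64-bit value expanded into two 32-bit halves.

```python
def split_i64(x):
    """Split a signed 64-bit integer into (signed hi, unsigned lo) halves."""
    u = x & 0xFFFFFFFFFFFFFFFF
    lo = u & 0xFFFFFFFF
    hi = u >> 32
    if hi >= 1 << 31:
        hi -= 1 << 32  # reinterpret the high half as signed
    return hi, lo


def is_negative_via_high_part(x):
    """Model of `setlt X, 0` after expansion: only the high half is tested."""
    hi, _lo = split_i64(x)
    return hi < 0
```

The non-canonical `setge X, 0` is simply the negation of this test, and `setgt X, -1` is the same predicate in canonical form.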
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97181
The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these
instructions with immediate operands and wide encodings.
These are the T4 variants of the following sections from the Arm ARM.
F5.1.72 LDR (immediate)
F5.1.229 STR (immediate)
I wasn't able to represent these simple aliases using t2InstAlias due to
the Constraints on the non-suffixed existing instructions, which results
in some manual parsing logic needing to be added.
F1.2 Standard assembler syntax fields
describes the use of the .w (wide) vs .n (narrow) encoding suffix.
Link: https://bugs.llvm.org/show_bug.cgi?id=49118
Link: https://github.com/ClangBuiltLinux/linux/issues/1296
Reported-by: Stefan Agner <stefan@agner.ch>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D96632
Add support for the new crash reporter api if the headers are available. Falls back to the old API if they are not available. This change was based on [[ 0164d546d2/llvm/lib/Support/PrettyStackTrace.cpp (L111) | /llvm/lib/Support/PrettyStackTrace.cpp ]]
There is a lit test for this behavior here: https://reviews.llvm.org/D96737 but it is not included in this diff because it is potentially flaky.
rdar://69767688
Reviewed By: delcypher, yln
Committed by Dan Liew on behalf of Emily Shi.
Differential Revision: https://reviews.llvm.org/D96830
Added a lit test that finds its corresponding crash log and checks to make sure it has asan output under `Application Specific Information`.
This required adding two python commands:
- `get_pid_from_output`: takes the output from the asan instrumentation and parses out the process ID
- `print_crashreport_for_pid`: takes in the pid of the process and the file name of the binary that was run and prints the contents of the corresponding crash log.
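A minimal sketch of what the first helper might do, assuming the PID is taken from the `==<pid>==` prefix that sanitizer reports put at the start of their lines. This is not the implementation from the patch; only the function name comes from the description above.

```python
import re


def get_pid_from_output(output):
    """Return the PID from the first `==<pid>==` marker, or None."""
    match = re.search(r"^==(\d+)==", output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None
```

For example, passing text that contains a line starting with `==4242==ERROR: AddressSanitizer:` returns `4242`.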
This test was added in preparation for changing the integration with crash reporter from the old api to the new api, which is implemented in a subsequent commit.
rdar://69767688
Reviewed By: delcypher
Committed by Dan Liew on behalf of Emily Shi.
Differential Revision: https://reviews.llvm.org/D96737
Add `frame variable` dereference support to libc++ `std::shared_ptr`.
This change allows for commands like `v *thing_sp` and `v thing_sp->m_id`. These
commands now work the same way they do with raw pointers. This is done by adding an
unaccounted for child member named `$$dereference$$`.
Also, add API tests for `std::shared_ptr`; previously there were none.
Differential Revision: https://reviews.llvm.org/D97165
Generalize the return value of tryToCreateWidenRecipe to return either a
newly created recipe or an existing VPValue. Use this to avoid creating
unnecessary VPBlendRecipes.
Fixes PR44800.
Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when
possible. This significantly improves some game cases by eliminating
v_readfirstlane instructions when the result feeds into a scalar
operation, like the address calculation for a scalar load or store.
Since isDivergent is only an approximation of whether a value is in
SGPRs, it can potentially regress some situations where a uniform value
ends up in a VGPR. These should be rare in real code, although the test
changes do contain a number of examples.
Most of the test changes are just using s_mul instead of v_mul/mad which
is generally better for both register pressure and latency (at least on
GFX10 where sgpr pressure doesn't affect occupancy and vector ALU
instructions have significantly longer latency than scalar ALU). Some
R600 tests now use MULLO_INT instead of MUL_UINT24.
GlobalISel appears to handle more scenarios in the desirable way,
although it can also be thrown off and fails to select the 24-bit
multiplies in some cases.
An alternative solution, considered and rejected, was to allow selecting
MUL_[UI]24 to S_MUL_I32. I've rejected this because the definition of
those SD operations is don't-care on the most significant 8 bits,
and this fact is used in some combines via SimplifyDemandedBits.
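The don't-care property can be made concrete with a small model. This is only a sketch of the arithmetic, not backend code: `mul_u24` and `mul_i32` are hypothetical helpers operating on 32-bit unsigned values.

```python
MASK24 = (1 << 24) - 1
MASK32 = (1 << 32) - 1


def mul_u24(a, b):
    """Multiply using only the low 24 bits of each 32-bit operand."""
    return ((a & MASK24) * (b & MASK24)) & MASK32


def mul_i32(a, b):
    """Plain 32-bit multiply, which reads all 32 bits of each operand."""
    return (a * b) & MASK32
```

If a combine has left arbitrary bits in the top 8 bits of an operand, `mul_u24` is unaffected while `mul_i32` produces a different result, which is why the two cannot be freely substituted.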
Based on a patch by Nicolai Hähnle.
Differential Revision: https://reviews.llvm.org/D97063
The new intrinsic replaces the size in one specified AsyncFunctionPointer with
the size in another. This ability is necessary for functions which merely
forward to async functions such as those defined for partial applications.
Reviewed By: aschwaighofer
Differential Revision: https://reviews.llvm.org/D97229
This matches how libc++ itself is built. This avoids errors due to
mismatch if linking libc++ statically.
Differential Revision: https://reviews.llvm.org/D97169
This check registers an IncludeInserter, however the check itself doesn't actually emit any fixes or includes, so the inserter is redundant.
From what I can tell the fixes were removed in D26453(rL290051) but the inserter was left in, probably an oversight.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D97243
This commit prevents warnings from -Wconversion when a clang vector type
is implicitly converted to a sizeless builtin type -- for example, when
implicitly converting a fixed-predicate to a scalable predicate.
The code below:
#include <arm_sve.h>

#define N __ARM_FEATURE_SVE_BITS
#define FIXED_ATTR __attribute__((arm_sve_vector_bits (N)))
typedef svbool_t fixed_svbool_t FIXED_ATTR;

inline fixed_svbool_t foo(fixed_svbool_t p) {
  return svnot_z(svptrue_b64(), p);
}
would previously raise this warning:
warning: implicit conversion turns vector to scalar: \
'fixed_svbool_t' (vector of 8 'unsigned char' values) to 'svbool_t' \
(aka '__SVBool_t') [-Wconversion]
Note that many cases of these implicit conversions were already
permitted because many functions inside arm_sve.h are spawned via
preprocessor macros, and the call to isInSystemMacro would cover us in
this case. This commit fixes the remaining cases.
Differential Revision: https://reviews.llvm.org/D97053
An option is added to the check to select which set of functions is
defined as asynchronous-safe functions.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D90851
Change some test cases to use divergent addresses for vector loads,
which should be the common case in real world code. Using uniform
addresses causes poor instruction selection for the surrounding
code which has to be fixed up post-register-allocation, and this causes
a lot of testsuite churn for a forthcoming patch to stop selecting
24-bit vector multiply instructions for uniform multiplies.
This shows up some problems in the idot tests where we fail to select
v_dot instructions because the patterns only match MUL_[UI]24 ISD nodes,
but the DAG contains i16 mul nodes instead.
Differential Revision: https://reviews.llvm.org/D97062