This looks like not a practical pattern in our codebase (it could fail
in some sandbox environement).
Instead we print it via standard output, and it is controled by the
-attributor-print-call-graph, this follows a similiar pattern of attributor-print-dep.
Allow mangled names to include an arbitrary dot suffix, akin to vendor
specific suffix in Itanium mangling.
Primary motivation is a support for symbols renamed during ThinLTO
import / promotion (ThinLTO is the default configuration for optimized
builds in rustc).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D104358
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.
Differential Revision: https://reviews.llvm.org/D104193
Commit 4c7f820b2b changed the llvm.powi intrinsic to support
different 'int' sizes for the exponent. That happened to break
the IntrinsicToLibdeviceFunc mapping in PPCGCodeGeneration, which
obviously should have been updated as part of commit 4c7f820b2b
(https://reviews.llvm.org/D99439).
The shortcoming was found by buildbots that use
-DPOLLY_ENABLE_GPGPU_CODEGEN=ON
This patch should fixup the problem.
This reverts commit 76d0747e08.
If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.
Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.
```
group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
[Index] Name
[ 67] __llvm_prf_cnts
[ 68] __llvm_prf_vals
[ 69] __llvm_prf_data
[ 70] .rela__llvm_prf_data
```
This should fix PR50683. The wrong assumption was that we
could always know what the callee is when we replace a call site
argument with undef. We wanted to know that to remove the `noundef`
that might be attached to the argument. Since no callee means we
did the propagation on the caller site, there is no need to remove
an attribute. It is only needed if we replace all uses and therefore
pass `undef` instead of the value that was passed in otherwise.
Users might want to run initialize for a set of AAs without an
intermediate update step. Running update eagerly is not a requirement
anyway so we make it optional.
To allow outside AAs that simplify values we need to ensure all value
simplification goes through the Attributor, not AAValueSimplify (or any
of the other AAs we have already like AAPotentialValues). This patch
also introduces an interface for the outside AAs to register
simplification callbacks for an IRPosition. To make this work as
expected we have to pass IRPositions instead of Values in
AAValueSimplify, which makes sense by itself.
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.
This also introduces the AA namespace for abstract attribute related
functions and types.
Differential Revision: https://reviews.llvm.org/D103856
If the target stack is not accessible between different running
"threads" we have to make sure not to create allocas for mallocs
that might be used by multiple "threads". The "use check" is
sufficient to prevent this but if we apply the "free check" we have
to make sure the pointer is not communicated to others before
the free is reached.
Differential Revision: https://reviews.llvm.org/D98608
The initial use for AAExecutionDomain was to determine if a single
thread executes a block. While this is sometimes informative most
of the time, and for other reasons, we actually want to know if it
is the "initial thread". Thus, the thread that started execution on
the current device. The deduction needs to be adjusted in a follow
up as the methods we use right not are looking for the OpenMP thread
id which is resets whenever a thread enters a parallel region. What
we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and
equivalent functions.
We invalidated AAReachabilityImpl directly which is not helpful and
confusing as we still used it regardless. We now avoid invalidating it
(not needed anyway) and add checks for the state. This has by itself no
actual effect but prepares for later extensions.
The current naming scheme adds the `dfs$` prefix to all
DFSan-instrumented functions. This breaks mangling and prevents stack
trace printers and other tools from automatically demangling function
names.
This new naming scheme is mangling-compatible, with the `.dfsan`
suffix being a vendor-specific suffix:
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure
With this fix, demangling utils would work out-of-the-box.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104494
The patch https://reviews.llvm.org/D101469 is intended to enable loop unrolling,
not interleaved access vectorization. The method bool enableInterleavedAccessVectorization()
should not be implemented.
The instruction can be 16-bit aligned while targeting 32-bit aligned
code. To calculate the target address correctly, the address of the
instruction has to be adjusted.
Differential Revision: https://reviews.llvm.org/D104446
Support -Wno-frame-larger-than (with no =) and make it properly
interoperate with -Wframe-larger-than. Reject -Wframe-larger-than with
no argument.
We continue to support Clang's old spelling, -Wframe-larger-than=, for
compatibility with existing users of that facility.
In passing, stop the driver from accepting and ignoring
-fwarn-stack-size and make it a cc1-only flag as intended.
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
Previous attempts can be found in D104007 and D101980.
A lot of discussions can be found in those two patches.
To summarize the bug:
When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly.
However, in some cases we find that arguments are still directly used:
When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and latter uses will lead to memory use-after-free.
This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly.
For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutiune as noalias. This can be taken care of in CoroEarly pass.
For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer.
Differential Revision: https://reviews.llvm.org/D104184
When creating a PCH file the use of a temp file will be dictated by the
presence or absence of the -fno-temp-file flag. Creating a module file
will always use a temp file via the new ForceUseTemporary flag.
This fixes bug 50033.
This adds a basic SB API for creating and stopping traces.
Note: This doesn't add any APIs for inspecting individual instructions. That'd be a more complicated change and it might be better to enhande the dump functionality to output the data in binary format. I'll leave that for a later diff.
This also enhances the existing tests so that they test the same flow using both the command interface and the SB API.
I also did some cleanup of legacy code.
Differential Revision: https://reviews.llvm.org/D103500
This attribute computes the optimistic live call edges using the attributor
liveness information. This attribute will be used for deriving a
inter-procedural function reachability attribute.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D104059
For ELF, since all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisified, we can make __profd_
unconditionally private.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
This pass emits a floating point compare and a conditional branch,
but if strictfp is enabled we don't emit a constrained compare
intrinsic.
The backend also won't expand the readonly sqrt call this pass inserts
to a sqrt instruction under strictfp. So we end up with 2 libcalls as
seen here. https://godbolt.org/z/oax5zMEWd
Fix these things by disabling the pass.
Differential Revision: https://reviews.llvm.org/D104479
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.
Reviewed By: gbalats, stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104481
The old version of this code would blindly perform arithmetic without
paying attention to whether the types involved were pointers or
integers. This could lead to weird expressions like negating a pointer.
Explicitly handle simple cases involving pointers, like "x < y ? x : y".
In all other cases, coerce the operands of the comparison to integer
types. This avoids the weird cases, while handling most of the
interesting cases.
Differential Revision: https://reviews.llvm.org/D103660
The target specific expression handling was slightly regressed by
bbea64250f. This restores the proper
sub-expression evaluation to allow for constant folding within the
expression. We explicitly discard the layout and assembler when
evaluating the expression to avoid any symbolic computation and instead
using the `evaluateAsRelocatable` to canonicalise and constant fold
only.
We can also simplify the expression handling - none of the target
variants support symbolic difference. This simplifies the logic for
that and adds additional tests to ensure that we do not accidentally
regress here in the future.
Reviewed By: maskray
Differential Revision: https://reviews.llvm.org/D104473
When we removed the allocator<void> specialization, the triviality of
std::allocator<void> changed because the primary template had a
non-trivial default constructor and the specialization didn't
(so std::allocator<void> went from trivial to non-trivial).
This commit fixes that oversight by giving a trivial constructor to
the primary template when instantiated on cv-void.
This was reported in https://llvm.org/PR50299.
Differential Revision: https://reviews.llvm.org/D104398
This fixes a GISEL vs SDAG regression that showed up at -Os in 256.bzip2
In `_getAndMoveToFrontDecode`:
gisel:
```
and w9, w0, #0xff
orr w9, w9, w8, lsl #8
```
sdag:
```
bfi w0, w8, #8, #24
```
Differential revision: https://reviews.llvm.org/D103291
When the number of shared libs is massive, there could be hundreds of
thousands of short lived progress events sent to the IDE, which makes it
irresponsive while it's processing all this data. As these small jobs
take less than a second to process, the user doesn't even see them,
because the IDE only display the progress of long operations. So it's
better not to send these events.
I'm fixing that by sending only the events that are taking longer than 5
seconds to process.
In a specific run, I got the number of events from ~500k to 100, because
there was only 1 big lib to parse.
I've tried this on several small and massive targets, and it seems to
work fine.
Differential Revision: https://reviews.llvm.org/D101128
Fold all exits based on known trip count/multiple information from
SCEV. Previously only the latch exit or the single exit were folded.
This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple
entirely: They're still used to a) decide whether runtime unrolling
should be performed and b) for ORE remarks. However, the core
unrolling logic is independent of them now.
Differential Revision: https://reviews.llvm.org/D104203
This patch will allow developers to remove unwanted instruction Defs (most likely from within a target specific InstrPostProcess) by setting that Def's RegisterID to 0.
Differential Revision: https://reviews.llvm.org/D104433
This change revisits https://reviews.llvm.org/D79248 which originally
added support for the --unresolved-symbols flag.
At the time I thought it would make sense to add a third option to this
flag called `import-functions` but it turns out (as was suspects by on
the reviewers IIRC) that this option can be authoganal.
Instead I've added a new option called `--import-undefined` that only
operates on symbols that can be imported (for example, function symbols
can always be imported as opposed to data symbols we can only be
imported when compiling with PIC).
This option gives us the full expresivitiy that emscripten needs to be
able allow reporting of undefined data symbols as well as the option to
disable that.
This change does remove the `--unresolved-symbols=import-functions`
option, which is been in the codebase now for about a year but I would
be extremely surprised if anyone was using it.
Differential Revision: https://reviews.llvm.org/D103290