The functions `__register_frame`/`__deregister_frame` are not
available on z/OS, so add a guard to not use them.
Reviewed By: lhames, abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D84787
CreateFunctionDeclaration should just take a StringRef. GetDeclarationName is
(only) used by CreateFunctionDeclaration so that's why now also takes a
StringRef.
This fixes existent FIXMEs: we should not error out when unable to
find the number of relocations.
Differential revision: https://reviews.llvm.org/D85891
The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which
has the same numeric CSR value as `mucounteren` from 1.09.1. This patch
enables the use of the old `mucounteren` name.
Patch by Yuichi Sugiyama.
Reviewed By: lenary, jrtc27, pzheng
Differential Revision: https://reviews.llvm.org/D85067
This fixes the "Unable to insert indirect branch" fatal error sometimes
seen when generating position-independent code.
Patch by msizanoen1
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D84833
Splitted out from D85519.
Currently we report "PT_DYNAMIC segment offset + size exceeds the size of the file",
this changes it to
"PT_DYNAMIC segment offset (0x1234) + file size (0x5678) exceeds the size of the file (0x68ab)"
Differential revision: https://reviews.llvm.org/D85654
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.
I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.
This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.
Known A: 0_______0_______0_______0_______
Known B: 0_______0_______0_______0_______
AOut: 00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch: 00000000001111111000000000000000
Committed on behalf of: @rrika (Erika)
Differential Revision: https://reviews.llvm.org/D72423
Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.
We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004
This is the error message from the OS, so we shouldn't check against the
OS-specific part of the string.
Fixes the test on Windows which returns a different error message.
In D83876 the consensus seems that LLDB should never deleted orphaned modules
implicitly. However, SBDebugger::DeleteTarget is currently doing exactly that.
This code was added in 753406221b but I don't see
any explanation in the commit, so I think we should delete it.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D83933
The class contains an enum listing all host integer types as well as
some non-host types. This setup is a remnant of a time when this class
was actually implemented in terms of host integer types. Now that we are
using llvm::APInt, they are mostly useless and mean that each function
needs to enumerate all of these cases even though it treats most of them
identically.
I only leave e_sint and e_uint to denote the integer signedness, but I
want to remove that in a follow-up as well.
Removing these cases simplifies most of these functions, with the only
exception being PromoteToMaxType, which can no longer rely on a simple
enum comparison to determine what needs to be promoted.
This also makes the class ready to work with arbitrary integer sizes, so
it does not need to be modified when someone needs to add a larger
integer size.
Differential Revision: https://reviews.llvm.org/D85836
With -flimit-debug-info, we can run into cases when we only have a class
as a declaration, but we do have a definition of a nested class. In this
case, clang will hit an assertion when adding a member to an incomplete
type (but only if it's adding a c++ class, and not C struct).
It turns out we already had code to handle a similar situation arising
in the -gmodules scenario. This extends the code to handle
-flimit-debug-info as well, and reorganizes bits of other code handling
completion of types to move functions doing similar things closer
together.
Differential Revision: https://reviews.llvm.org/D85968
Right now the only places in the SB API where lldb:: ModuleSP instances are
destroyed are in SBDebugger::MemoryPressureDetected (where it's just attempted
but not guaranteed) and in SBDebugger::DeleteTarget (which will be removed in
D83933). Tests that directly create an lldb::ModuleSP and never create a target
therefore currently leak lldb::Module instances. This triggers the sanity checks
in lldbtest that make sure that the global module list is empty after a test.
This patch adds SBModule::GarbageCollectAllocatedModules as an explicit way to
clean orphaned lldb::ModuleSP instances. Also we now start calling this method
at the end of each test run and move the sanity check behind that call to make
this work. This way even tests that don't create targets can pass the sanity
check.
This fixes TestUnicodeSymbols.py when D83865 is applied (which makes that the
sanity checks actually fail the test).
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D83876
We didn't do anything with the llvm::Error we get from `Open`, so when we end up in the
error case we just crash due to the llvm::Error sanity check. Also add the missing newline
behind the error message so it no longer messes with the next (lldb) prompt.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D85970
`lldb-server platform --socket-file /any/path` currently always fails to create
the socket file. This stopped working after D67424 which changed the
input variables of `writeFileAtomically` slightly. We're expected to
pass in a temporary path template (`/tmp/foo-%%%%%`) and the final
path we want to write. Instead we currently pass in the never set
`temp_file_path` as the temporary path (which will make this function always
fail) and pass in the temp_file_spec's path as the final path (which is actually
the template path such as `/tmp/foo-%%%%%`) instead of the actual path
we want to write (e.g. `/tmp/foo`).
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D85890
Support f128 using VE instructions. Update regression tests.
I've noticed there is no load or store i128 test, so I add them too.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D86035
This parameter isn't used anywhere in LLDB nor the Swift downstream branch. It
also doesn't really fit into the TypeSystem APIs that usually don't return
additional related functionality via some output parameters. Also the
implementations already states that the calculated value there is wrong.
Let's remove it. If we need this functionality at some point then Swift's much
nicer `GetByteStride` function seems like the way to go.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D84299
Right now if the test suite encounters a cleanup error it just prints "CLEANUP
ERROR:" but not any additional information.
This patch just prints the exception that caused the cleanup error. This should
make debugging the failing tests for D83865 easier (and seems in general nice to
have).
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D83874
With gcc 6.3.0, I hit the following compilation bug.
../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
};
^
cc1plus: all warnings being treated as errors
The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
With gcc 6.3.0, I hit the following compilation bug:
/home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:
In function ‘bool ParseCodeGenArgs(clang::CodeGenOptions&, llvm::opt::ArgList&,
clang::InputKind, clang::DiagnosticsEngine&, const clang::TargetOptions&,
const clang::FrontendOptions&)’:
/home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:780:12:
error: unused variable ‘A’ [-Werror=unused-variable]
if (Arg *A = Args.getLastArg(OPT_fuse_ctor_homing))
^
cc1plus: all warnings being treated as errors
The bug is introduced by Commit ae6523cd62 ("[DebugInfo] Add
-fuse-ctor-homing cc1 flag so we can turn on constructor homing only
if limited debug info is already on.")
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D84630
There are three fields that the FPOptions default constructor sets to
non-zero values; those fields previously could have been zero or
non-zero depending on whether they'd been explicitly removed from the
FPOptionsOverride set. However, that doesn't seem to ever actually
happen, so this is NFC, except that it makes the AST file representation
of FPOptionsOverride make more sense.
We don't appear to use these FPOptions for anything right now, but
they shouldn't be uninitialized because that makes our AST file output
nondeterministic.
This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.
The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a `call`.
Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S
This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.
I think, this is a missing fold in InstCombine, so i've implemented it.
I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
If it was extracted from the aggregate with identical type,
from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
then we can just use said original source aggregate directly,
instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
we need be PHI-aware - we might have different source aggregate when coming
from each predecessor.
I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.
I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.
On RawSpeed, for example, this has the following effect:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 1253 | 1253 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 948 | 1355 | 407 | 42.93% | 42.93% |
| instcount.NumInsertValueInst | 4382 | 3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode | 574 | 458 | -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs | 1154 | 921 | -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst | 29017 | 26397 | -2620 | -9.03% | 9.03% |
| instcombine.NumDeadInst | 166618 | 174705 | 8087 | 4.85% | 4.85% |
| instcount.NumPHIInst | 51526 | 50678 | -848 | -1.65% | 1.65% |
| instcount.NumLandingPadInst | 20865 | 20609 | -256 | -1.23% | 1.23% |
| instcount.NumInvokeInst | 34023 | 33675 | -348 | -1.02% | 1.02% |
| simplifycfg.NumSimpl | 113634 | 114708 | 1074 | 0.95% | 0.95% |
| instcombine.NumSunkInst | 15030 | 14930 | -100 | -0.67% | 0.67% |
| instcount.TotalBlocks | 219544 | 219024 | -520 | -0.24% | 0.24% |
| instcombine.NumCombined | 644562 | 645805 | 1243 | 0.19% | 0.19% |
| instcount.TotalInsts | 2139506 | 2135377 | -4129 | -0.19% | 0.19% |
| instcount.NumBrInst | 156988 | 156821 | -167 | -0.11% | 0.11% |
| instcount.NumCallInst | 1206144 | 1207076 | 932 | 0.08% | 0.08% |
| instcount.NumResumeInst | 5193 | 5190 | -3 | -0.06% | 0.06% |
| asm-printer.EmittedInsts | 948580 | 948299 | -281 | -0.03% | 0.03% |
| instcount.TotalFuncs | 11509 | 11507 | -2 | -0.02% | 0.02% |
| inline.NumDeleted | 97595 | 97597 | 2 | 0.00% | 0.00% |
| inline.NumInlined | 210514 | 210522 | 8 | 0.00% | 0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.
On vanilla llvm-test-suite:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 744 | 744 | 0.00% | 0.00% |
| instcount.NumInsertValueInst | 2705 | 2053 | -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes | 1212 | 1424 | 212 | 17.49% | 17.49% |
| instcount.NumExtractValueInst | 21681 | 20139 | -1542 | -7.11% | 7.11% |
| simplifycfg.NumSinkCommonInstrs | 14575 | 14361 | -214 | -1.47% | 1.47% |
| simplifycfg.NumSinkCommonCode | 6815 | 6743 | -72 | -1.06% | 1.06% |
| instcount.NumLandingPadInst | 14851 | 14712 | -139 | -0.94% | 0.94% |
| instcount.NumInvokeInst | 27510 | 27332 | -178 | -0.65% | 0.65% |
| instcombine.NumDeadInst | 1438173 | 1443371 | 5198 | 0.36% | 0.36% |
| instcount.NumResumeInst | 2880 | 2872 | -8 | -0.28% | 0.28% |
| instcombine.NumSunkInst | 55187 | 55076 | -111 | -0.20% | 0.20% |
| instcount.NumPHIInst | 321366 | 320916 | -450 | -0.14% | 0.14% |
| instcount.TotalBlocks | 886816 | 886493 | -323 | -0.04% | 0.04% |
| instcount.TotalInsts | 7663845 | 7661108 | -2737 | -0.04% | 0.04% |
| simplifycfg.NumSimpl | 886791 | 887171 | 380 | 0.04% | 0.04% |
| instcount.NumCallInst | 553552 | 553733 | 181 | 0.03% | 0.03% |
| instcombine.NumCombined | 3200512 | 3201202 | 690 | 0.02% | 0.02% |
| instcount.NumBrInst | 741794 | 741656 | -138 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 14443 | 14445 | 2 | 0.01% | 0.01% |
| asm-printer.EmittedInsts | 7978085 | 7977916 | -169 | 0.00% | 0.00% |
| inline.NumDeleted | 73188 | 73189 | 1 | 0.00% | 0.00% |
| inline.NumInlined | 291959 | 291968 | 9 | 0.00% | 0.00% |
```
Roughly similar effect, less instructions and blocks total.
See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.
Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions
And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text
See https://bugs.llvm.org/show_bug.cgi?id=47060
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85787
Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D86038
When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA which later allowed
to remove some of the reduce code as accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.
Verified on the reproducer of PR46156 and QMCPack.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D86037