Commit Graph

Esme-Yi a00ff71668 [XCOFF] Improve error message context.
Summary: This patch improves the error message context of the
XCOFF interfaces by providing more details.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D110320
2021-10-11 02:52:20 +00:00
Lang Hames 771e69484a [ORC] Add dependence on pthreads library to ORC.
f341161689 introduced a dependence (for builds with LLVM_ENABLE_THREADS) on
pthreads. This commit updates the CMakeLists.txt file to include a LINK_LIBS
entry for pthreads.
2021-10-10 19:34:34 -07:00
Lang Hames f341161689 [ORC] Add TaskDispatch API and thread it through ExecutorProcessControl.
ExecutorProcessControl objects will now have a TaskDispatcher member which
should be used to dispatch work (in particular, handling incoming packets in
remote EPC implementations like SimpleRemoteEPC).

The GenericNamedTask template can be used to wrap function objects that are
callable as 'void()' (along with an optional name to describe the task).
The makeGenericNamedTask functions can be used to create GenericNamedTask
instances without having to name the function object type.

In a future patch ExecutionSession will be updated to use the
ExecutorProcessControl's dispatcher, instead of its DispatchTaskFunction.
2021-10-10 18:39:55 -07:00
Amara Emerson f1e9ecea44 [AArch64][GlobalISel] Legalize G_VECREDUCE_XOR. Treated same as other bitwise reductions. 2021-10-10 17:01:21 -07:00
Lang Hames da7f993a8d [ORC] Reorder callWrapperAsync and callSPSWrapperAsync parameters.
The callee address is now the first parameter and the 'SendResult' function
the second. This change improves consistency with the non-async functions,
where the callee address comes first and the return value second.
2021-10-10 13:10:43 -07:00
Dawid Jurczak 9e65929a8e [DSE] Re-enable calloc transformation with extra care (PR25892)
The transformation from malloc+memset to calloc is always correct and in many situations
brings significant observable benefits in terms of execution speed and memory consumption [1][2].
Unfortunately there are cases where producing calloc causes performance drops [3].
As discussed in https://reviews.llvm.org/D103009, it is possible to differentiate between those two scenarios.
If the optimizer can prove that after the malloc call it is _very_ likely to reach the memset branch, then
we shouldn't observe any performance hit after emitting calloc. Therefore, finding a "null pointer check" pattern
before the memset basic block sounds like a good justification for performing the transformation.
That method was also already suggested by GCC folks [4]. The main reason for the change is that, to be safe,
we currently check for a post-dominance relation, which is far too conservative an approach and leaves the
transformation "almost" disabled in practice. This patch aims to enable the transformation again, but with extra care.

[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022

Differential Revision: https://reviews.llvm.org/D110021
2021-10-10 21:47:14 +02:00
Sanjay Patel 05281d95f2 [InstCombine] move fold for "(X-Y) == 0"; NFC
This consolidates related folds that all have a
similar use restriction that may not be necessary.
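
For reference, the fold being moved, written as an illustrative alive2-style
pair (not taken from the patch):

  define i1 @src(i8 %x, i8 %y) {
    %sub = sub i8 %x, %y
    %r = icmp eq i8 %sub, 0
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %y) {
    %r = icmp eq i8 %x, %y
    ret i1 %r
  }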
2021-10-10 11:26:03 -04:00
Sanjay Patel da210f5d34 [InstCombine] canonicalize "(C2 - Y) > C" as (Y + ~C2) < ~C
The test diffs show that we have better analysis/folds for 'add'
(although we should at least have the simplifications
independently, so we don't have the one-use restriction).

This is related to solving regressions that would appear in
transforms related to D111410, and that is part of a series
of enhancements that may eventually help solve PR34047.

https://alive2.llvm.org/ce/z/3tB9KG

  define i1 @src(i8 %x, i8 %C, i8 %C2) {
    %sub = sub nuw i8 %C2, %x
    %r = icmp slt i8 %sub, %C
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %C, i8 %C2) {
    %Cnot = xor i8 %C, -1
    %C2not = xor i8 %C2, -1
    %add = add nuw i8 %x, %C2not
    %r = icmp sgt i8 %add, %Cnot
    ret i1 %r
  }
2021-10-10 11:06:49 -04:00
william woodruff e7fc254875 [BitcodeAnalyzer] allow a motivated user to dump BLOCKINFO
This adds the `--dump-blockinfo` flag to `llvm-bcanalyzer`, allowing a sufficiently motivated user to dump (parts of) the `BLOCKINFO_BLOCK` block. The default behavior is unchanged, and `--dump-blockinfo` only takes effect in the same context as other flags that control dump behavior (i.e., requires that `--dump` is also passed).

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D107536
2021-10-10 10:15:14 +05:30
Amara Emerson f95d9c95bb [GlobalISel] Fix the stores of truncates -> wide store combine for non-evenly dividing type sizes.
If the wide store we'd generate is not a multiple of the memory type of the
narrow stores (e.g. s48 and s32), we'd assert. Fix that.
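
For context, a minimal sketch of the kind of pattern this combine merges (an
illustrative example, not the exact s48/s32 case that asserted):

  define void @store_halves(i32 %v, i16* %p) {
    %lo = trunc i32 %v to i16
    %hi.shift = lshr i32 %v, 16
    %hi = trunc i32 %hi.shift to i16
    %p.hi = getelementptr i16, i16* %p, i64 1
    store i16 %lo, i16* %p
    store i16 %hi, i16* %p.hi
    ret void
  }

On a little-endian target the two i16 stores can be merged into a single wide
i32 store of %v.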
2021-10-09 21:18:20 -07:00
Sanjay Patel acafde09a3 [InstCombine] enhance icmp with sub folds
There were 2 related but over-specified folds for:
C1 - X == C

One allowed multi-use but was limited to equal constants.
The other allowed different constants but disallowed multi-use.

This combines the 2 folds into a more general match.
The test diffs show the multi-use cases that were falling
through the cracks.

https://alive2.llvm.org/ce/z/4_hEt2

  define i1 @src(i8 %x, i8 %subC, i8 %C) {
    %s = sub i8 %subC, %x
    %r = icmp eq i8 %s, %C
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %subC, i8 %C) {
    %newC = sub i8 %subC, %C
    %isneg = icmp eq i8 %x, %newC
    ret i1 %isneg
  }
2021-10-09 11:39:49 -04:00
Dávid Bolvanský 943b304848 Fixed some errors detected by PVS Studio 2021-10-09 17:27:41 +02:00
Dávid Bolvanský 3649fb14d1 Fixed some errors detected by PVS Studio 2021-10-09 17:20:04 +02:00
Nikita Popov ea12adc169 [CanonicalizeFreeze] Drop IVUsers.h include (NFC)
Looking for users of IVUsers, this was a false positive. Only LSR
uses IVUsers.
2021-10-09 17:01:26 +02:00
David Green adec922361 [AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of
features that improves performance for some CPUs, without hurting any
others. A blend of the performance options hopefully beneficial to all
CPUs. The largest part of that is enabling in-order scheduling using the
Cortex-A55 schedule model. This is similar to the Arm backend change
from eecb353d0e which made -mcpu=generic perform in-order scheduling
using the cortex-a8 schedule model.

The idea is that in-order CPUs require the most help in instruction
scheduling, whereas out-of-order CPUs can for the most part schedule around
different codegen on their own. Our benchmarking suggests that
hypothesis holds. When running on an in-order core this improved
performance by 3.8% geomean on a set of DSP workloads, 2% geomean on
some other embedded benchmark and between 1% and 1.8% on a set of
singlecore and multicore workloads, all running on a Cortex-A55 cluster.

On an out-of-order CPU the results are a lot noisier but show flat
performance or an improvement. On the set of DSP and embedded
benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
improvement. Using the most detailed results I could find, SPEC2006 runs
on a Neoverse N1 show a small increase in instruction count (+0.127%),
but a decrease in cycle counts (-0.155%, on average). The instruction
count is very low noise, the cycle count is more noisy with a 0.15%
decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
in instruction count leading to a -0.296% decrease in cycle count. These
results are within noise margins but tend to show a small improvement in
general.

When specifying an Apple target, clang will set "-target-cpu apple-a7"
on the command line, so Apple targets should not be affected by this change when
running from clang. This also doesn't enable more runtime unrolling like
-mcpu=cortex-a55 does, only changing the schedule used.

A lot of existing tests have been updated. This is a summary of the important
differences:
 - Most changes are the same instructions in a different order.
 - Sometimes this leads to very minor inefficiencies, such as requiring
   an extra mov to move variables into r0/v0 for the return value of a test
   function.
 - misched-fusion.ll was no longer fusing the pairs of instructions it
   should, as per D110561. I've changed the schedule used in the test
   for now.
 - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
   the different latencies. This seems fine to me.
 - Some SVE tests do not always remove movprfx where they did before due
   to different register allocation giving different destructive forms.
 - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
   produce two LDR where they previously produced an LDP due to
   store-pair-suppress kicking in.
 - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LDP.
 - Some tests such as arm64-neon-mul-div.ll and
   ragreedy-local-interval-cost.ll have more, less or just different
   spilling.
 - In aarch64_generated_funcs.ll.generated.expected one part of the
   function is no longer outlined. Interestingly if I switch this to use
   any other schedule, even less is outlined.

Some of these are expected to happen, such as differences in outlining
or register spilling. There will be places where these result in worse
codegen and places where they are better, with the SPEC instruction counts
suggesting it is not a decrease overall, on average.

Differential Revision: https://reviews.llvm.org/D110830
2021-10-09 15:58:31 +01:00
Nikita Popov a94002cd64 [Type] Avoid APFloat.h include (NFC)
This is only used by a handful of methods working on fltSemantics,
and having these defined inline in the header does not look
particularly important.
2021-10-09 11:29:26 +02:00
Nikita Popov 55b9146848 [MCPseudoProbe] Clean up includes (NFC)
This was including various things that don't appear to be used in
the header at all.
2021-10-09 10:31:15 +02:00
luxufan 02ac5e5cf1 [Orc] Fix global variable destructor function support when --jit-kind=orc-lazy
The bug was reported here https://bugs.llvm.org/show_bug.cgi?id=52030

This patch follows the idea @lhames suggested in a comment on the above page.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D110990
2021-10-09 15:58:21 +08:00
Max Kazantsev 4c0da23663 [LoopDeletion] Support selects when symbolically evaluating 1st iteration
Adds support for selects whose value we know on the 1st iteration.
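
A sketch of the kind of case this enables (illustrative only, not a test from
the patch): on the 1st iteration the phi is known to be 0, so the select's
value is known even though its condition is not, and the exit branch can be
evaluated symbolically.

  define i32 @exits_on_first_iteration(i1 %c) {
  entry:
    br label %loop

  loop:
    %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
    %sel = select i1 %c, i32 %iv, i32 0   ; both arms are 0 on iteration 1
    %cmp = icmp slt i32 %sel, 1           ; known true on iteration 1
    %iv.next = add i32 %iv, 1
    br i1 %cmp, label %exit, label %loop

  exit:
    ret i32 0
  }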

Differential Revision: https://reviews.llvm.org/D104111
Reviewed By: nikic
2021-10-09 14:47:44 +07:00
luxufan 590326382d [Orc] Support atexit in Orc(JITLink)
There is a bug reported at https://bugs.llvm.org/show_bug.cgi?id=48938

After looking through glibc, I found that `atexit(f)` is the same as `__cxa_atexit(f, NULL, NULL)`. In the ORC runtime we identify different JITDylibs by their dso_handle values, so a NULL dso_handle is invalid. In this patch, I added a `PlatformJDDSOHandle` to ELFNixRuntimeState, and functions registered via atexit are registered at PlatformJD.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D111413
2021-10-09 12:25:47 +08:00
william woodruff 778bf73d7b [BitcodeReader] fix a logic error in vector type element validation
The current code checks whether the vector's element type is a valid structure element type, rather than a valid vector element type. The two have separate implementations but accept only very slightly different sets of types, which is probably why this wasn't caught before.

Differential Revision: https://reviews.llvm.org/D109655
2021-10-09 09:42:02 +05:30
Brad Smith 65df10f3cd [OpenBSD] Use cortex-a8 as default CPU for ARMv7 2021-10-08 23:57:40 -04:00
Qiu Chaofan f45d5e71d3 [APFloat] Set size of PPCDoubleDouble to 128
566690b0 uses size information in float semantics, but PPCDoubleDouble
left them empty.

As a follow-up, we can consider removing PPCDoubleDoubleLegacy and filling
in the other fields in the future.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D111398
2021-10-09 10:12:10 +08:00
Arthur Eubanks 20a0c482e0 [LICM] Use Align instead of int 2021-10-08 18:26:15 -07:00
Arthur Eubanks a0a4935182 Make more places that use alignment use uint64_t
Followup to D110451.
2021-10-08 16:35:19 -07:00
Nick Desaulniers 9697f93587 [InlineCost] model calls to llvm.is.constant* more carefully
llvm.is.constant* intrinsics are evaluated to 0 or 1 integral values.

A common use case for llvm.is.constant comes from the higher level
__builtin_constant_p. A common usage pattern of __builtin_constant_p in
the Linux kernel is:

    void foo (int bar) {
      if (__builtin_constant_p(bar)) {
        // lots of code that will fold away to a constant.
      } else {
        // a little bit of code, usually a libcall.
      }
    }
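
At the IR level the check above corresponds roughly to the following (an
illustrative sketch, not taken from the patch):

    define void @foo(i32 %bar) {
    entry:
      %is.const = call i1 @llvm.is.constant.i32(i32 %bar)
      br i1 %is.const, label %if.then, label %if.else

    if.then:          ; lots of code that folds away when %bar is constant
      br label %if.end

    if.else:          ; a little bit of code, usually a libcall
      br label %if.end

    if.end:
      ret void
    }

    declare i1 @llvm.is.constant.i32(i32)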

A minor issue in the InlineCost calculation is that when `bar` is _not_
Constant, and still will not be after inlining, we don't discount the true
branch; the inline cost of `foo` ends up being the cost of both branches
together, rather than just the false branch.

This leads to code like the above where inlining will not help prove bar
Constant, but it still would be beneficial to inline foo, because the
"true" branch is irrelevant from a cost perspective.

For example, IPSCCP can sink a passed constant argument to foo:

    const int x = 42;
    void bar (void) { foo(x); }

This improves our inlining decisions, and fixes a few head-scratching
cases where the disassembly shows a relatively small `foo` not inlined
into a lone caller.

We could further improve this modeling by tracking whether the argument
to llvm.is.constant* is a parameter of the function, and if inlining
would allow that parameter to become Constant. This idea is noted in a
FIXME comment.

Link: https://github.com/ClangBuiltLinux/linux/issues/1302

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D111272
2021-10-08 15:27:30 -07:00
Reid Kleckner b3a6d096d7 Fix shlib builds for all lib/Target/*/TargetInfo libs
They all must depend on MC now that the target registry is in MC.
Also fix llvm-cxxdump
2021-10-08 15:21:13 -07:00
Reid Kleckner 2827b1b89d Fix shared library build after TargetRegistry move 2021-10-08 15:06:03 -07:00
Reid Kleckner 89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Leonard Chan 4dc462b589 [AArch64] Emit CFI instruction for updating x18 when using ShadowCallStack with exception unwinding
PR45875 notes an instance where exception handling crashes on aarch64-fuchsia,
where SCS is enabled by default. The underlying issue seems to be that within
libunwind's various _Unwind_* functions, the x18 register is not updated if a
function is marked with nounwind. This removes the check for nounwind and emits
the CFI instruction that updates x18.

Differential Revision: https://reviews.llvm.org/D79822
2021-10-08 14:20:26 -07:00
Nikita Popov e3129fb792 [LoopFlatten] Mark inner loop as deleted
If a loop is flattened, the inner loop is removed and the LPM
should be informed of this fact, so it can invalidate associated
analyses. To support this, we relax an assertion in LPMUpdater to
allow invalidating non-top-level loops when running in LoopNestMode,
as the pass does not know how exactly it will get scheduled.

Differential Revision: https://reviews.llvm.org/D111350
2021-10-08 23:12:15 +02:00
Lang Hames 8fe3d9df0e Revert "[ORC] Move SimpleRemoteEPCServer::Dispatcher into OrcShared."
This reverts commit dfd74db981.

SimpleRemoteEPC should share dispatch with the ExecutionSession, rather than
having two different dispatch systems on the controller side.
SimpleRemoteEPCServer::Dispatch doesn't need to be shared.
2021-10-08 13:43:42 -07:00
Lang Hames a129305b28 [ORC] Remove a stale comment.
SimpleRemoteEPCServer Service shutdown (c965fde7c2) takes care of this.
2021-10-08 13:43:42 -07:00
Andrew Browne 007d98f520 [DFSan] Fix warning: getArgsFunctionType defined but not used
Warning introduced in 61ec2148c5
2021-10-08 11:58:36 -07:00
Arthur Eubanks a3358fcff1 More followup type changes after 05392466 2021-10-08 11:51:36 -07:00
Lang Hames dfd74db981 [ORC] Move SimpleRemoteEPCServer::Dispatcher into OrcShared.
Renames SimpleRemoteEPCServer::Dispatcher to SimpleRemoteEPCDispatcher and
moves it into OrcShared. SimpleRemoteEPCServer::ThreadDispatcher is similarly
moved and renamed to DynamicThreadPoolSimpleRemoteEPCDispatcher.

This will allow these classes to be reused by SimpleRemoteEPC on the controller
side of the connection.
2021-10-08 11:29:57 -07:00
Amara Emerson 17b89f9daa [GlobalISel] Improve G_UMULH -> LSHR combine to accept non-uniform constant vectors. 2021-10-08 11:25:26 -07:00
Andrew Browne 61ec2148c5 [DFSan] Remove -dfsan-args-abi support in favor of TLS.
ArgsABI was originally added in https://reviews.llvm.org/D965

Current benchmarking does not show a significant difference.
There is no need to maintain both ABIs.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D111097
2021-10-08 11:18:36 -07:00
Craig Topper a9700653ab [RegisterScavenging] Use a Twine in a call to report_fatal_error instead of going from std::string to c_str. NFC
The std::string was built on the line above. Might as well just
build it as a Twine in the call.
2021-10-08 11:04:08 -07:00
Philip Reames edf31b4db1 [IPT] Add a statistic to track instructions scanned to answer queries
I'm planning some changes to the invalidation mechanism here, and having a concrete mechanism to track progress is key.
2021-10-08 10:59:35 -07:00
Philip Reames de5477ed42 Add a statistic to track number of times we rebuild instruction ordering
The goal here is to assist some future tuning work both on instruction ordering invalidation, and on some client code which uses it.
2021-10-08 10:59:34 -07:00
Arthur Eubanks 9405217999 Revert "Recommit "[LoopPeel] Peel loops with deoptimizing exits""
This reverts commit d68b59f3eb.

This is causing crashes, see D110922 for details.
2021-10-08 10:53:23 -07:00
Philip Reames b4498e6b8d [IPT] Narrow scope of removeInstruction invalidation [NFC]
We only need to invalidate if the instruction being removed is the cached "first special instruction".  If the instruction is before that one, it can't (by assumption) be special.  If it is after that one, it wasn't the first.
2021-10-08 10:35:03 -07:00
Andreas Schwab a706a5ef22 [Support] Define sys::getHostCPUName for RISC-V
The RISCV target doesn't define a "generic" cpu, only "generic-rv32" and
"generic-rv64".  Define sys::getHostCPUName for RISC-V that returns the
correct cpu for the host.

Reviewed By: craig.topper, MaskRay

Differential Revision: https://reviews.llvm.org/D105274
2021-10-08 10:08:39 -07:00
Philip Reames d694dd0f0d Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc]
This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places.  The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.
2021-10-08 09:50:10 -07:00
Bradley Smith 7c68d4b8ff Revert "[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR."
This reverts commit 3e8d2008f7.

The code removed in this commit is actually required for extracting
fixed types from illegal scalable types, hence this commit causes
assertion failures in such extracts.
2021-10-08 14:53:26 +00:00
David Stuttard 69f7d81d0a [AMDGPU] Set number vgprs used in PS shaders based on input registers actually used
For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR
registers.

Calculate the number of VGPRs used as input VGPRs based on these registers
rather than on the arguments passed in (which conservatively always allocates
the maximum).

Differential Revision: https://reviews.llvm.org/D101633

Change-Id: Idf7c060cbbd5f7e3300102c55ecee3c07f209de6
2021-10-08 14:24:35 +01:00
Mirko Brkusanin d20840c937 [GlobalISel] Combine for eliminating redundant operand negations
Differential Revision: https://reviews.llvm.org/D111319
2021-10-08 14:29:22 +02:00
Max Kazantsev d68b59f3eb Recommit "[LoopPeel] Peel loops with deoptimizing exits"
Removed obsolete DT verification that should not be there because the
strategy of DT updates has changed.

Differential Revision: https://reviews.llvm.org/D110922
2021-10-08 17:54:27 +07:00
Jingu Kang 30caca39f4 Third Recommit "[AArch64] Split bitmask immediate of bitwise AND operation"
This reverts the revert commit fc36fb4d23 with
bug fixes.

Differential Revision: https://reviews.llvm.org/D109963
2021-10-08 11:28:49 +01:00