Summary:
This makes it much easier to verify that the implementation matches the
documentation. It uncovered a bug in the unit tests where we were
accidentally setting fast math flags on a load instruction.
Reviewers: spatel, wristow, arsenm, hfinkel, aemerson, efriedma, cameron.mcinally, mcberg2017, jmolloy
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69176
llvm-svn: 375252
D68992 / rL375086 refactored the packetizer and removed a bunch of logic. Unfortunately it creates an Automaton object whenever a DFAPacketizer is required. These objects have no longevity, and in particular on a debug build the population of the Automaton's transition map from the underlying table is very slow (because it is called ~10 times per MachineFunction, in the testcase I'm looking at).
This patch changes Automaton to wrap its underlying constant data in std::shared_ptr, which allows trivial copy construction. The DFAPacketizer creation function now creates a static archetypical Automaton and copies that whenever a new DFAPacketizer is required.
This takes a testcase down from ~20s to ~0.5s in debug mode.
llvm-svn: 375240
Summary:
This will allow updating MinidumpYAML and LLDB to use this common
definition.
Reviewers: labath, jhenderson, clayborg
Reviewed By: labath
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68656
llvm-svn: 375239
Summary:
It looks like this is the only missing statistic in the CVP pass.
Since we prove NSW and NUW separately i'd think we should count them separately too.
Reviewers: nikic, spatel, reames
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68740
llvm-svn: 375230
Summary:
The PMMIR_EL1 register is present in Armv8.4 with PMU extension.
This patch adds support for it.
Reviewers: t.p.northover, dnsampaio
Reviewed By: dnsampaio
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68940
llvm-svn: 375228
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy
Reviewed By: jmolloy
Differential Revision: https://reviews.llvm.org/D47775
llvm-svn: 375222
For arm64, D18619 introduced the ability to combine bumping the stack pointer
upfront in case it needs to be bumped for both the callee-save area as well as
the local stack area.
That diff already remarks that "This change can cause an increase in
instructions", but argues that even when that happens, it should be still be a
performance benefit because the number of micro-ops is reduced.
We have observed that this code-size increase can be significant in practice.
This diff disables combining stack bumping for methods that are marked as
optimize-for-size.
Example of a prologue with the behavior before this diff (combining stack bumping when possible):
sub sp, sp, #0x40
stp d9, d8, [sp, #0x10]
stp x20, x19, [sp, #0x20]
stp x29, x30, [sp, #0x30]
add x29, sp, #0x30
[... compute x8 somehow ...]
stp x0, x8, [sp]
And after this diff, if the method is marked as optimize-for-size:
stp d9, d8, [sp, #-0x30]!
stp x20, x19, [sp, #0x10]
stp x29, x30, [sp, #0x20]
add x29, sp, #0x20
[... compute x8 somehow ...]
stp x0, x8, [sp, #-0x10]!
Note that without combining the stack bump there are two auto-decrements,
nicely folded into the stp instructions, whereas otherwise there is a single
sub sp, ... instruction, but not folded.
Patch by Nikolai Tillmann!
Differential Revision: https://reviews.llvm.org/D68530
llvm-svn: 375217
The default promotion for the add_sat/sub_sat nodes currently does:
ANY_EXTEND iN to iM
SHL by M-N
[US][ADD|SUB]SAT
L/ASHR by M-N
If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
Differential Revision: https://reviews.llvm.org/D68926
llvm-svn: 375211
Summary:
Implements the following intrinsics:
- int_aarch64_sve_sunpkhi
- int_aarch64_sve_sunpklo
- int_aarch64_sve_uunpkhi
- int_aarch64_sve_uunpklo
This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.
This patch includes tests for the Subdivide2Argument type added by D67549
Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka
Reviewed By: greened
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D67550
llvm-svn: 375210
Adding the reproducer from https://bugs.llvm.org/show_bug.cgi?id=43689,
showing that instcombine is doing a bad transform. It transforms
%0 = insertelement <2 x i16> undef, i16 %a, i32 0
%1 = srem <2 x i16> %0, <i16 2, i16 1>
%2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0>
into
%1 = insertelement <2 x i16> undef, i16 %a, i32 1
%2 = srem <2 x i16> %1, <i16 undef, i16 2>
The undef denominator makes the whole srem undefined.
llvm-svn: 375207
This will allow us to serialize just the result object instead of the
whole lit.Test object back from the worker to the main lit process.
llvm-svn: 375195
There's no need to have more than one of these (there can be two
DwarfFiles - one for the .o, one for the .dwo - but only one loc/loclist
section (either in the .o or the .dwo) & certainly one per
DebugLocStream, which is currently singular in DwarfDebug)
llvm-svn: 375183
This relands r374931 (reverted in r375088). It fixes 32-bit builds by using the right format string specifier for uint64_t (PRIu64) instead of `%d`.
Original description:
When listing the index in `llvm-objdump -h`, use a zero-based counter instead of the actual section index (e.g. shdr->sh_index for ELF).
While this is effectively a noop for now (except one unit test for XCOFF), the index values will change in a future patch that filters certain sections out (e.g. symbol tables). See D68669 for more context. Note: the test case in `test/tools/llvm-objdump/X86/section-index.s` already covers the case of incrementing the section index counter when sections are skipped.
Reviewers: grimar, jhenderson, espindola
Reviewed By: grimar
Subscribers: emaste, sbc100, arichardson, aheejin, arphaman, seiya, llvm-commits, MaskRay
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68848
llvm-svn: 375178
Summary:
The current implementation eats the current errors and just outputs
the message parameter passed to llvm::cantFail. This change appends
the original error message(s), so the user can see exactly why
cantFail failed. New logic is conditional on NDEBUG.
Reviewed By: lhames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69057
llvm-svn: 375176
We already have isFloatType helper, and they are out of sync.
Drop one and merge the type list.
Differential Revision: https://reviews.llvm.org/D69138
llvm-svn: 375175
Summary: GNU objcopy accepts the --wildcard flag to allow wildcard matching on symbol-related flags. (Note: it's implicitly true for section flags).
The basic syntax is to allow *, ?, \, and [] which work similarly to how they work in a shell. Additionally, starting a wildcard with ! causes that wildcard to prevent it from matching a flag.
Use an updated GlobPattern in libSupport to handle these patterns. It does not fully match the `fnmatch` used by GNU objcopy since named character classes (e.g. `[[:digit:]]`) are not supported, but this should support most existing use cases (mostly just `*` is what's used anyway).
Reviewers: jhenderson, MaskRay, evgeny777, espindola, alexshap
Reviewed By: MaskRay
Subscribers: nickdesaulniers, emaste, arichardson, hiraditya, jakehehrlich, abrachet, seiya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66613
llvm-svn: 375169
We always want to use a deadline when calling `result.await`. Let's
synthesize an artificial deadline (now plus one year) to simplify code
and do less busy waiting.
Thanks to Reid Kleckner for diagnosing that a deadline for of "positive
infinity" does not work with Python 3 anymore. See commit:
4ff1e34b60
I tested this patch with Python 2 and Python 3.
llvm-svn: 375165
We're passing LLVM_EXTERNAL_PROJECTS to cross-compilation configures, so
we also need to pass the source directories of those projects, otherwise
configuration can fail from not finding them.
Differential Revision: https://reviews.llvm.org/D69076
llvm-svn: 375157
Header64.offset/Header64.size are uint64_t, thus we should not
truncate them to unit32_t. Moreover, there are a number of places
where we sum the offset and the size (e.g. in various checks in MachOUniversal.cpp),
the truncation causes issues since the offset/size can perfectly fit into uint32_t,
while the sum overflows.
Differential revision: https://reviews.llvm.org/D69126
Test plan: make check-all
llvm-svn: 375154
Quite a while ago, we implemented a pass that will reduce the number of
CR-logical operations we emit. It does so by converting a CR-logical operation
into a branch. We have kept this off by default because it seemed to cause a
significant regression with one benchmark.
However, that regression turned out to be due to a completely unrelated
reason - AADB introducing a self-copy that is a priority-setting nop and it was
just exacerbated by this pass.
Now that we understand the reason for the only degradation, we can turn this
pass on by default. We have long since fixed the cause for the degradation.
Differential revision: https://reviews.llvm.org/D52431
llvm-svn: 375152
Reland r375051 (reverted in r375052) after fixing lld tests on Windows in r375126 and r375131.
Original description: Update GlobPattern in libSupport to handle a few more cases. It does not fully match the `fnmatch` used by GNU objcopy since named character classes (e.g. `[[:digit:]]`) are not supported, but this should support most existing use cases (mostly just `*` is what's used anyway).
This will be used to implement the `--wildcard` flag in llvm-objcopy to be more compatible with GNU objcopy.
This is split off of D66613 to land the libSupport changes separately. The llvm-objcopy part will land soon.
Reviewers: jhenderson, MaskRay, evgeny777, espindola, alexshap
Reviewed By: MaskRay
Subscribers: nickdesaulniers, emaste, arichardson, hiraditya, jakehehrlich, abrachet, seiya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66613
llvm-svn: 375149
Python on Windows raises this OverflowError:
gotit = waiter.acquire(True, timeout)
OverflowError: timestamp too large to convert to C _PyTime_t
So it seems this API behave the same way on every OS.
Also reverts the dependent commit a660dc590a.
llvm-svn: 375143
In the process of writing D69009, I realized we have two distinct sets of invariants within this single function, and basically no shared logic. The optimize loop exit transforms (including the new one in D69009) only care about *analyzeable* exits. Loop predication, on the other hand, has to reason about *all* exits. At the moment, we have the property (due to the requirement for an exact btc) that all exits are analyzeable, but that will likely change in the future as we add widenable condition support.
llvm-svn: 375138
Summary:
In the long run we should come up with another mechanism for marking
call instructions as heap allocation sites, and remove this workaround.
For now, we've had two bug reports about this, so let's apply this
workaround. SLH (the other client of instruction labels) probably has
the same bug, but the solution there is more likely to be to mark the
call instruction as not duplicatable, which doesn't work for debug info.
Reviewers: akhuang
Subscribers: aprantl, hiraditya, aganea, chandlerc, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69068
llvm-svn: 375137