Commit Graph

14 Commits

Author SHA1 Message Date
Eli Friedman 4532a50899 Infer alignment of unmarked loads in IR/bitcode parsing.
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.

The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.

Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.

This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.

Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.

Differential Revision: https://reviews.llvm.org/D78403
2020-05-14 13:03:50 -07:00
David Green 3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00
David Green 882f23caea [ARM] MVE interleaving load and stores.
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.

We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.

Differential Revision: https://reviews.llvm.org/D69392
2019-11-19 18:37:30 +00:00
David Green 411bfe476b [ARM] Add and update a lot of VLDn tests. NFC 2019-11-19 18:37:30 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Eli Friedman 96f295e23b [InterleavedAccessPass] Don't increase the number of bytes loaded.
Even if the interleaving transform would otherwise be legal, we shouldn't
introduce an interleaved load that is wider than the original load: it might
have undefined behavior.

It might be possible to perform some sort of mask-narrowing transform in
some cases (using a narrower interleaved load, then extending the
results using shufflevectors).  But I haven't tried to implement that,
at least for now.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 .

Differential Revision: https://reviews.llvm.org/D59954

llvm-svn: 357212
2019-03-28 20:44:50 +00:00
Matthew Simpson 12eaef75ce [ARM] Implement interleaved access bug fix from r306334
r306334 fixed a bug in AArch64 dealing with wide interleaved accesses having
pointer types. The bug also exists in ARM, so this patch copies over the fix.

llvm-svn: 307409
2017-07-07 16:15:05 +00:00
Matthew Simpson 1468d3e04e [ARM/AArch64] Ensure valid vector element types for interleaved accesses
This patch refactors and strengthens the type checks performed for interleaved
accesses. The primary functional change is to ensure that the interleaved
accesses have valid element types. The added test cases previously failed
because the element type is f128.

Differential Revision: https://reviews.llvm.org/D31817

llvm-svn: 299864
2017-04-10 18:34:37 +00:00
Matthew Simpson 1bfa159db9 [ARM/AArch64] Support wide interleaved accesses
This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types
to interleaved access intrinsics as long as the types are multiples of the
vector register width. A "wide" access will now be mapped to multiple
interleave intrinsics similar to the way in which non-interleaved accesses with
illegal types are legalized into multiple accesses. I'll update the associated
TTI costs (in getInterleavedMemoryOpCost) as a follow-on.

Differential Revision: https://reviews.llvm.org/D29466

llvm-svn: 296750
2017-03-02 15:11:20 +00:00
Ahmed Bougacha fc979dc9dd [ARM] Don't lower f16 interleaved accesses.
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.

Reject f16 interleaved accesses.  If we try to emit the f16 intrinsics,
we'll just end up with a selection failure.

llvm-svn: 294818
2017-02-11 01:53:00 +00:00
Ahmed Bougacha f37fb89edc [ARM] Unique some redundant CHECK lines. NFC.
llvm-svn: 294817
2017-02-11 01:52:57 +00:00
Matthias Braun 01fa962226 InterleaveAccessPass: Avoid constructing invalid shuffle masks
Fix a bug where we would construct shufflevector instructions addressing
invalid elements.

Differential Revision: https://reviews.llvm.org/D29313

llvm-svn: 293673
2017-01-31 18:37:53 +00:00
Matthew Simpson 3650df13be [ARM/AArch64] Relocate and update InterleavedAccessPass tests (NFC)
The interleaved access pass is an IR-to-IR transformation that runs before code
generation. It matches interleaved memory operations to target-specific
intrinsics (that are later lowered to load and store multiple instructions on
ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under
test/Transforms. This patch moves the InterleavedAccessPass tests out of
test/CodeGen and into target-specific directories under
test/Transforms/InterleavedAccess.

Although the pass is an IR pass, many of the existing tests were llc tests
rather opt tests. For example, the tests would check for ldN/stN instructions
generated by llc rather than the intrinsic calls the pass actually inserts.
Thus, this patch updates all tests to be opt tests that check for the inserted
intrinsics. We already have separate CodeGen tests that ensure we lower the
interleaved access intrinsics to their corresponding ldN/stN instructions. In
addition to migrating the tests to opt, this patch also performs some minor
clean-up (to ensure consistent naming, etc.).

Differential Revision: https://reviews.llvm.org/D29184

llvm-svn: 293309
2017-01-27 17:33:16 +00:00