Commit Graph

320111 Commits

Author SHA1 Message Date
Diego Novillo 688afeb884 Update phis in AMDGPUUnifyDivergentExitNodes
Original patch https://reviews.llvm.org/D63659 from
Steven Perron <stevenperron@google.com>

The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in
the successors of blocks that is splits. This is fixed by calling
BasicBlock::splitBasicBlock to split the block instead of doing it
manually. This does extra work because a new conditional branch is
created in BB which is immediately replaced, but I think the simplicity
is worth it. It also helps make the code more future proof in case other
things need to be updated.

llvm-svn: 364342
2019-06-25 18:55:16 +00:00
Sanjay Patel fcfa056ceb [InstCombine] reduce checks for power-of-2-or-zero using ctpop
This follows up the transform from rL363956 to use the ctpop intrinsic when checking for power-of-2-or-zero.

This is matching the isPowerOf2() patterns used in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

But there's at least 1 instcombine follow-up needed to match the alternate form:

(v & (v - 1)) == 0;

We should have all of the backend expansions handled with:
rL364319
(x86-specific changes still needed for optimal code based on subtarget)

And the larger patterns to exclude zero as a power-of-2 are joining with this change after:
rL364153 ( D63660 )
rL364246

Differential Revision: https://reviews.llvm.org/D63777

llvm-svn: 364341
2019-06-25 18:51:44 +00:00
Richard Smith 30519a68d5 Add regression test for PR41576 (which is already fixed in trunk,
perhaps by r361300).

llvm-svn: 364340
2019-06-25 18:42:53 +00:00
Stanislav Mekhanoshin 4be636ebb3 [AMDGPU] Removed dead SIMachineFunctionInfo::getWorkItemIDVGPR()
Differential Revision: https://reviews.llvm.org/D63780

llvm-svn: 364339
2019-06-25 18:33:53 +00:00
Sam Clegg 61d70e4a93 [WebAssembly] Error on archives without a symbol index
This is fairly common with wasm since GNU ar (most likely the system ar)
doesn't support the wasm object format so user who don't override AR
will end up with archives without an index.  We don't want to silently
ignore this issue.

In the future we could choose to instead behave like the ELF backend and
read the symbols from each object file in the archive if they are all of
the same type.  However, error'ing out seem like a conservative approach
for now.

Fixes: PR42376

Differential Revision: https://reviews.llvm.org/D63739

llvm-svn: 364338
2019-06-25 17:49:35 +00:00
Craig Topper 4577b8c17c [X86] Remove isel patterns that look for (vzext_movl (scalar_to_vector (load)))
I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207.

Differential Revision: https://reviews.llvm.org/D63701

llvm-svn: 364337
2019-06-25 17:31:52 +00:00
Philip Reames be0dedb2e1 [Peephole] Allow folding loads into instructions w/multiple uses (such as test64rr)
Peephole opt has a one use limitation which appears to be accidental. The function being used was incorrectly documented as returning whether the def had one *user*, but instead returned true only when there was one *use*. Add a corresponding hasOneNonDbgUser helper, and adjust peephole-opt to use the appropriate one.

All of the actual folding code handles multiple uses within a single instruction. That codepath is well exercised through instruction selection.

Differential Revision: https://reviews.llvm.org/D63656

llvm-svn: 364336
2019-06-25 17:29:18 +00:00
Jonas Devlieghere 99a4491527 [Python] Flush prompt before reading input
Make sure the prompt has been flushed before reading commands. Buffering
is different in Python 3, which led to the prompt not being displayed in
the Xcode console.

llvm-svn: 364335
2019-06-25 17:27:38 +00:00
Davide Italiano 97017a8ef9 [CMake] Check that a certificate for lldb is present at build time.
Reviewers: JDevlieghere, sgraenitz, aprantl, friss

Subscribers: mgorny, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D63745

llvm-svn: 364334
2019-06-25 17:13:24 +00:00
Craig Topper 14ea14ae85 [X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load
We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile.

Rather than adding isel checks for volatile. This patch removes the patterns and teachs DAG combine to merge them into vzload when its legal to do so.

Differential Revision: https://reviews.llvm.org/D63665

llvm-svn: 364333
2019-06-25 17:08:26 +00:00
Kostya Kortchinsky 37340e3cd6 [scudo][standalone] Introduce the C & C++ wrappers
Summary:
This CL adds C & C++ wrappers and associated tests. Those use default
configurations for a Scudo combined allocator that will likely be
tweaked in the future.

This is the final CL required to have a functional C & C++ allocator
based on Scudo.

The structure I have chosen is to define the core C allocation
primitives in an `.inc` file that can be customized through defines.
This allows to easily have 2 (or more) sets of wrappers backed by
different combined allocators, as demonstrated by the `Bionic`
wrappers: one set for the "default" allocator, one set for the "svelte"
allocator.

Currently all the tests added have been gtests, but I am planning to
add some more lit tests as well.

Reviewers: morehouse, eugenis, vitalybuka, hctim, rengolin

Reviewed By: morehouse

Subscribers: srhines, mgorny, delcypher, jfb, #sanitizers, llvm-commits

Tags: #llvm, #sanitizers

Differential Revision: https://reviews.llvm.org/D63612

llvm-svn: 364332
2019-06-25 16:51:27 +00:00
Simon Tatham e8de8ba6a6 [ARM] Support inline assembler constraints for MVE.
"To" selects an odd-numbered GPR, and "Te" an even one. There are some
8.1-M instructions that have one too few bits in their register fields
and require registers of particular parity, without necessarily using
a consecutive even/odd pair.

Also, the constraint letter "t" should select an MVE q-register, when
MVE is present. This didn't need any source changes, but some extra
tests have been added.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60709

llvm-svn: 364331
2019-06-25 16:49:32 +00:00
Ayke van Laethem 88139c143c [AVR] Adjust to Register class change
A refactor in r364191 changed register types from an unsigned int to the
llvm:Register class. Adjust the AVR backend to this change.

This fixes build errors when building with the experimental AVR backend
enabled.

Differential Revision: https://reviews.llvm.org/D63776

llvm-svn: 364330
2019-06-25 16:49:22 +00:00
Simon Tatham a4b415a683 [ARM] Code-generation infrastructure for MVE.
This provides the low-level support to start using MVE vector types in
LLVM IR, loading and storing them, passing them to __asm__ statements
containing hand-written MVE vector instructions, and *if* you have the
hard-float ABI turned on, using them as function parameters.

(In the soft-float ABI, vector types are passed in integer registers,
and combining all those 32-bit integers into a q-reg requires support
for selection DAG nodes like insert_vector_elt and build_vector which
aren't implemented yet for MVE. In fact I've also had to add
`arm_aapcs_vfpcc` to a couple of existing tests to avoid that
problem.)

Specifically, this commit adds support for:

 * spills, reloads and register moves for MVE vector registers

 * ditto for the VPT predication mask that lives in VPR.P0

 * make all the MVE vector types legal in ISel, and provide selection
   DAG patterns for BITCAST, LOAD and STORE

 * make loads and stores of scalar FP types conditional on
   `hasFPRegs()` rather than `hasVFP2Base()`. As a result a few
   existing tests needed their llc command lines updating to use
   `-mattr=-fpregs` as their method of turning off all hardware FP
   support.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60708

llvm-svn: 364329
2019-06-25 16:48:46 +00:00
Kevin P. Neal d0f96be2c7 [FPEnv] A missing crucial step was undocumented.
llvm-svn: 364328
2019-06-25 16:09:39 +00:00
Alexey Bataev a90fc6617f [OPENMP]Fix PR41966: type mismatch in runtime functions.
Target-based runtime functions use int64_t type for sizes, while the
compiler uses size_t type. It leads to miscompilation in 32 bit mode.

llvm-svn: 364327
2019-06-25 16:00:43 +00:00
Simon Pilgrim 9762b26032 [DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal
Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ......

llvm-svn: 364326
2019-06-25 16:00:16 +00:00
Jonas Devlieghere 635eb80662 [Python 3] Decode check_ouput result as UTF-8
llvm-svn: 364325
2019-06-25 15:58:32 +00:00
Fangrui Song 96a192ea53 [PPC32] Support PLT calls for -msecure-plt -fpic
Summary:
In Secure PLT ABI, -fpic is similar to -fPIC. The differences are that:

* -fpic stores the address of _GLOBAL_OFFSET_TABLE_ in r30, while -fPIC stores .got2+0x8000.
* -fpic uses an addend of 0 for R_PPC_PLTREL24, while -fPIC uses 0x8000.

Reviewers: hfinkel, jhibbits, joerg, nemanjai, spetrovic

Reviewed By: jhibbits

Subscribers: adalava, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63563

llvm-svn: 364324
2019-06-25 15:56:32 +00:00
Sam Parker bcf0eb7a64 [ARM] Fix for DLS/LE CodeGen
The expensive buildbots highlighted the mir tests were broken, which
I've now updated and added --verify-machineinstrs to them. This also
uncovered a couple of bugs in the backend pass, so these have also
been fixed.

llvm-svn: 364323
2019-06-25 15:11:17 +00:00
Xing Xue ece53d0ae5 Improve zero-size allocation with safe_malloc, etc.
Summary:
The current implementations of the memory allocation functions mistake a nullptr returned from std::malloc, std::calloc, or std::realloc as a failure. The behaviour for each of std::malloc, std::calloc, and std::realloc when the size is 0 is implementation defined (ISO/IEC 9899:2018 7.22.3), and may return a nullptr.

This patch checks if space requested is zero when a nullptr is returned, retry requesting non-zero if it is.

Authored By: andusy

Reviewers: hubert.reinterpretcast, xingxue, jasonliu

Reviewed By: hubert.reinterpretcast, xingxue, abrachet

Subscribers: abrachet, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63668

llvm-svn: 364322
2019-06-25 15:08:28 +00:00
Hans Wennborg 36c23cad15 Revert r362743 "Revert "Revert "Reland D61583 [ELF] Error on relocations to STT_SECTION symbols if the sections were discarded"""
(In effect, reverting "[ELF] Error on relocations to STT_SECTION symbols if the sections were discarded".)

It caused debug info problems in LibreOffice [1] and Chromium/V8 [2].
Reverting until those can be fixed.

It also reverts r362497 "STT_SECTION symbol should be defined" on .eh_frame, .debug*, .zdebug* and .gcc_except_table"
which was landed as a follow-up to the above.

> With -r or --emit-relocs, we warn `STT_SECTION symbol should be defined`
> on relocations to discarded section symbol. This was added as an error
> in rLLD319404, but was not so effective before D61583 (it turned the
> error to a warning).
>
> Relocations from .eh_frame .debug* .zdebug* .gcc_except_table to
> discarded .text are very common and somewhat expected. Don't warn/error
> on them. As a reference, ld.bfd has a similar logic in
> _bfd_elf_default_action_discarded() to allow these cases.
>
> Delete invalid-undef-section-symbol.test because what it intended to
> check is now covered by the updated comdat-discarded-reloc.s
>
> Delete relocatable-eh-frame.s because we allow relocations from
> .eh_frame as a special case now.

And finally it reverts r362218 "[ELF] Replace a dead test in getSymVA() with assert()"
as that also depended on the main change reverted here.

> Symbols relative to discarded comdat sections are Undefined instead of
> Defined now (after D59649 and D61583). The `== &InputSection::Discarded`
> test becomes dead. I cannot find a test related to this behavior.

 [1] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190603/659848.html
 [2] https://bugs.chromium.org/p/chromium/issues/detail?id=978067

llvm-svn: 364321
2019-06-25 14:58:46 +00:00
Simon Pilgrim e98f8cf78f [SLPVectorizer] Precommit of supernode.ll test for D63661
This is a pre-commit of the tests introduced by the SuperNode SLP patch D63661.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D63664

llvm-svn: 364320
2019-06-25 14:58:20 +00:00
Sanjay Patel 685c5cbc65 [SDAG] expand ctpop != 1
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

This is the inverted predicate sibling pattern that was added with:
D63004

This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to modified.

llvm-svn: 364319
2019-06-25 14:46:52 +00:00
Michael Liao f0a665afca [AMDGPU] Null checking on TS to avoid crashing in clang tests.
- `test/Misc/backend-resource-limit-diagnostics.cl` crashes as null
  streamer is used.

llvm-svn: 364318
2019-06-25 14:06:34 +00:00
Pavel Labath 34cac0955d Options: Correctly check for missing arguments
Relying on the value of optind for detecting missing arguments is
unreliable because its value after a failed parse is an implementation
detail. A more correct way to achieve this is to pass ':' at the
beginning of option string, which tells getopt to return ':' for missing
arguments.

For this to work, I also had to add a nullptr at the end of the argv
vector, as some getopt implementations did not work without that. This
is also an implementation detail, as getopt should normally be called
with argc+argc "as to main function" (i.e. null-terminated).

Thanks to Michał Górny for testing this patch out on NetBSD.

llvm-svn: 364317
2019-06-25 14:02:39 +00:00
Matt Arsenault f4e51dd2cd AMDGPU/GlobalISel: Fix broken test
llvm-svn: 364316
2019-06-25 13:57:53 +00:00
Nikolai Kosjar 181f252d53 [clang-tidy] Update documentation for Qt Creator integration.
Subscribers: xazax.hun, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D63763

llvm-svn: 364315
2019-06-25 13:50:09 +00:00
Sanjay Patel 0baacea2c7 [AArch64][x86] add tests for ctpop != 1; NFC
This is the inverted predicate pattern for D63004.

llvm-svn: 364314
2019-06-25 13:37:16 +00:00
Simon Pilgrim aae4b68703 [X86] lowerShuffleAsSpecificZeroOrAnyExtend - add ANY_EXTEND TODO.
lowerShuffleAsSpecificZeroOrAnyExtend should be able to lower to ANY_EXTEND_VECTOR_INREG as well as ZER_EXTEND_VECTOR_INREG.

llvm-svn: 364313
2019-06-25 13:36:53 +00:00
Fangrui Song 807d2f442a [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692
llvm-svn: 364312
2019-06-25 13:28:44 +00:00
Simon Pilgrim 1a18bb6f25 [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases.

Reapplies rL363856.

llvm-svn: 364311
2019-06-25 13:25:57 +00:00
Whitney Tsang 7c1deeff4a Expand cloneLoopWithPreheader() to support cloning loop nest
Summary: cloneLoopWithPreheader() currently only support innermost loop,
and assert otherwise.
Reviewers: Meinersbur, fhahn, kbarton
Reviewed By: Meinersbur
Subscribers: hiraditya, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63446

llvm-svn: 364310
2019-06-25 13:23:13 +00:00
Matt Arsenault dcd8b72e1a AMDGPU/GlobalISel: Fix duplicated test
Somehow ended up with copies of the same tests in AMDGPU and
AMDGPU/GlobalISel

llvm-svn: 364309
2019-06-25 13:23:08 +00:00
Matt Arsenault d7ffa2a948 AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT
llvm-svn: 364308
2019-06-25 13:18:11 +00:00
James Henderson 083d949036 [llvm-objcopy][llvm-strip] Fix help text typo for --allow-broken-links
llvm-svn: 364307
2019-06-25 13:14:18 +00:00
James Henderson b96d9d8bda [docs][llvm-readobj] Improve llvm-readobj documentation
There were a number of issues with the llvm-readobj documentation. The
following points were raised in https://bugs.llvm.org/show_bug.cgi?id=42255,
and have been fixed in this patch:

 1. The description section claimed "The tool and its output is
    primarily designed for use in FileCheck-based tests" which is not
    really the case any more.
 2. The documentation used single-dash long options for option names,
    but references in the help text to other options exclusively used
    double-dashes. Fixed by standardising on double-dashes for all
    long-form options.
 3. The majority of options available and in the help text were not
    present in the documentation. This patch adds them.
 4. Several aliases, both long and short, were missing, e.g. --relocs.

Additionally, this patch improves the documentation by:

 1. Splitting the options into categories based on the file format they
    are specific to.
 2. Updating the Exit Status section to correctly mention that errors
    lead to a non-zero exit code.
 3. Adding a See Also section referencing other similar LLVM tools.
 4. Improving/correcting some of the descriptions of options that did
    not quite match up with what llvm-readobj does.

Reviewed by: peter.smith, MaskRay, mtrent

Differential Revision: https://reviews.llvm.org/D63719

llvm-svn: 364306
2019-06-25 13:12:38 +00:00
Simon Tatham ec18f0f64c [ARM] Re-enable misspelled RUN: lines in fullfp16.s.
rL364293 committed a couple of lines that just said "// RUN llvm-mc ..."
without the all-important ':' after RUN, so those test lines weren't
actually running anything.

llvm-svn: 364305
2019-06-25 13:10:29 +00:00
Matt Arsenault d1dc1f4901 AMDGPU: Make amdgcn.s.get.waveid.in.workgroup inaccessiblememonly
This should probably be readnone, even though the instruction looks
like a load.

llvm-svn: 364304
2019-06-25 13:03:06 +00:00
Simon Pilgrim 36953ce769 [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

Reapplies rL363850 but now with legality checks added at rL364290

llvm-svn: 364303
2019-06-25 12:57:43 +00:00
Sanjay Patel e4ef62291b [SDAG] improve expansion of ctpop+setcc
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.

llvm-svn: 364302
2019-06-25 12:49:35 +00:00
Simon Pilgrim 42f44b387e Fix frame.s test dir-separator checks
Handle / and \ separators

llvm-svn: 364301
2019-06-25 12:35:38 +00:00
Simon Tatham 287f0403e3 [ARM] Fix buildbot failure due to -Werror.
Including both 'case ARM_AM::uxtw' and 'default' in the getShiftOp
switch caused a buildbot to fail with

error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
llvm-svn: 364300
2019-06-25 12:23:46 +00:00
Simon Pilgrim 69fc111184 [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

Reapplies rL363802 but now with legality checks added at rL364290

llvm-svn: 364299
2019-06-25 12:19:12 +00:00
Sjoerd Meijer 74ec25a197 [ARM] MVE VPT Blocks
A minor iteration on the MVE VPT Block pass to enable more efficient VPT Block
code generation: consecutive VPT predicated statements, predicated on the same
condition, will be placed within the same VPT Block. This essentially is also
an exercise to write some more tests for the next step, which should be more
generic also merging instructions when they are not consecutive.

Differential Revision: https://reviews.llvm.org/D63711

llvm-svn: 364298
2019-06-25 12:04:31 +00:00
Nicolai Haehnle 2710171a15 AMDGPU: Write LDS objects out as global symbols in code generation
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
  to a constant at compile times, which means some tests can no longer
  be applied.

  The current "solution" is a terrible hack, but the intrinsic isn't
  used by Mesa, so we can keep it for now.

- We no longer know the full LDS size per kernel at compile time, which
  means that we can no longer generate a relevant error message at
  compile time. It would be possible to add a check for the size of
  individual variables, but ultimately the linker will have to perform
  the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61494

llvm-svn: 364297
2019-06-25 11:52:30 +00:00
Nicolai Haehnle 08e8cb5760 AMDGPU/MC: Add .amdgpu_lds directive
Summary:
The directive defines a symbol as an group/local memory (LDS) symbol.
LDS symbols behave similar to common symbols for the purposes of ELF,
using the processor-specific SHN_AMDGPU_LDS as section index.

It is the linker and/or runtime loader's job to "instantiate" LDS symbols
and resolve relocations that reference them.

It is not possible to initialize LDS memory (not even zero-initialize
as for .bss).

We want to be able to link together objects -- starting with relocatable
objects, but possible expanding to shared objects in the future -- that
access LDS memory in a flexible way.

LDS memory is in an address space that is entirely separate from the
address space that contains the program image (code and normal data),
so having program segments for it doesn't really make sense.

Furthermore, we want to be able to compile multiple kernels in a
compilation unit which have disjoint use of LDS memory. In that case,
we may want to place LDS symbols differently for different kernels
to save memory (LDS memory is very limited and physically private to
each kernel invocation), so we can't simply place LDS symbols in a
.lds section.

Hence this solution where LDS symbols always stay undefined.

Change-Id: I08cbc37a7c0c32f53f7b6123aa0afc91dbc1748f

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61493

llvm-svn: 364296
2019-06-25 11:51:35 +00:00
Simon Pilgrim b23c942ce4 [VectorLegalizer] ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector
The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.

This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.

Part of the yak shaving to solve the crashes from rL364264 and rL364272

llvm-svn: 364295
2019-06-25 11:31:37 +00:00
Simon Tatham 4cf18c2849 [ARM] Explicit lowering of half <-> double conversions.
If an FP_EXTEND or FP_ROUND isel dag node converts directly between
f16 and f32 when the target CPU has no instruction to do it in one go,
it has to be done in two steps instead, going via f32.

Previously, this was done implicitly, because all such CPUs had the
storage-only implementation of f16 (i.e. the only thing you can do
with one at all is to convert it to/from f32). So isel would legalize
the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp
node (or vice versa), and then the fp_extend would already be f32->f64
rather than f16->f64.

But that technique can't support a target CPU which has full f16
support but _not_ f64, such as some variants of Arm v8.1-M. So now we
provide custom lowering for FP_EXTEND and FP_ROUND, which checks
support for f16 and f64 and decides on the best thing to do given the
combination of flags it gets back.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60692

llvm-svn: 364294
2019-06-25 11:24:50 +00:00
Simon Tatham d9654723ad [ARM] Extra MVE-related testing.
This adds some extra RUN lines to existing test files, to check that
things that worked in previous architecture versions haven't
accidentally stopped working in 8.1-M. Also we add some new tests: a
test of scalar floating point instructions that could be easily
confused with the similar-looking vector ones at assembly time, a test
of basic load/store/move access to the FP registers (which has to work
even in integer-only MVE); and one final check of the really obvious
case where turning off MVE should make sure MVE instructions really
are rejected.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62682

llvm-svn: 364293
2019-06-25 11:24:42 +00:00