Commit Graph

124161 Commits

Author SHA1 Message Date
Fedor Sergeev 1a3dc76186 [InlineCost] cleanup calculations of Cost and Threshold
Summary:
Doing better separation of Cost and Threshold.
Cost counts the abstract complexity of live instructions, while Threshold is an upper bound of complexity that inlining is comfortable to pay.
There are two parts:
     - huge 15K last-call-to-static bonus is no longer subtracted from Cost
       but rather is now added to Threshold.

       That makes much more sense, as the cost of inlining (Cost) is not changed by the fact
       that internal function is called once. It only changes the likelyhood of this inlining
       being profitable (Threshold).

     - bonus for calls proved-to-be-inlinable into callee is no longer subtracted from Cost
       but added to Threshold instead.

While calculations are somewhat different,  overall InlineResult should stay the same since Cost >= Threshold compares the same.

Reviewers: eraman, greened, chandlerc, yrouban, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60740

llvm-svn: 364422
2019-06-26 13:24:24 +00:00
Roman Lebedev 13889145f0 [X86][Codegen] X86DAGToDAGISel::matchBitExtract(): consistently capture lambdas by value
llvm-svn: 364420
2019-06-26 12:19:52 +00:00
Roman Lebedev fbb2e40d5c [X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness
Summary:
The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic.
Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality
we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones".
And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself.

Last pattern D does not seem to exhibit any of these truncation issues.
Although it has the opposite problem - if we extract low bits (no shift) from i64,
and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62806

llvm-svn: 364419
2019-06-26 12:19:47 +00:00
Roman Lebedev b0ecc1cc6b [X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how do deal with the last pattern c.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62793

llvm-svn: 364418
2019-06-26 12:19:39 +00:00
Roman Lebedev 8b9a03973a [X86] X86DAGToDAGISel::matchBitExtract(): pattern a: truncation awareness
Summary:
Finally tying up loose ends here.

The problem is quite simple:
If we have pattern `(x >> start) &  (1 << nbits) - 1`,
and then truncate the result, that truncation will be propagated upwards,
into the `and`. And that isn't currently handled.

I'm only fixing pattern `a` here,
the same fix will be needed for patterns `b`/`c` too.

I *think* this isn't missing any extra legality checks,
since we only look past truncations. Similary, i don't think
we can get any other truncation there other than i64->i32.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62786

llvm-svn: 364417
2019-06-26 12:19:11 +00:00
Clement Courbet 2851248fa1 Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
Breaks sanitizers:
    libFuzzer :: cxxstring.test
    libFuzzer :: memcmp.test
    libFuzzer :: recommended-dictionary.test
    libFuzzer :: strcmp.test
    libFuzzer :: value-profile-mem.test
    libFuzzer :: value-profile-strcmp.test

llvm-svn: 364416
2019-06-26 12:13:13 +00:00
Chen Zheng aa99952896 [HardwareLoops] NFC - move loop with irreducible control flow checking logic to HarewareLoopInfo.
llvm-svn: 364415
2019-06-26 12:02:43 +00:00
Hans Wennborg 6876de90e8 Fix the build after r364401
It was failing with:

/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66:
error: call of overloaded 'makeArrayRef(<brace-enclosed initializer list>)' is ambiguous
     scaleShuffleMask<int>(Scale, makeArrayRef<int>({ 0, 2, 1, 3 }), Mask);
                                                                  ^
/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: note: candidates are:
In file included from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/MachineFunction.h:20:0,
                 from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/CallingConvLower.h:19,
                 from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.h:17,
                 from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:14:
/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:480:15:
note: llvm::ArrayRef<T> llvm::makeArrayRef(const std::vector<_RealType>&) [with T = int]
   ArrayRef<T> makeArrayRef(const std::vector<T> &Vec) {
               ^
/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:485:37:
note: llvm::ArrayRef<T> llvm::makeArrayRef(const llvm::ArrayRef<T>&) [with T = int]
   template <typename T> ArrayRef<T> makeArrayRef(const ArrayRef<T> &Vec) {
                                     ^

llvm-svn: 364414
2019-06-26 11:56:38 +00:00
Clement Courbet 7b3a5f0e6d [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.
This allows later passes (in particular InstCombine) to optimize more
cases.

One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.

llvm-svn: 364412
2019-06-26 11:50:18 +00:00
Simon Pilgrim c0711af7f9 [X86][AVX] combineExtractSubvector - 'little to big' extract_subvector(bitcast()) support
Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there's some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all.....

llvm-svn: 364407
2019-06-26 11:21:09 +00:00
Simon Pilgrim a6319e5f83 [DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for extract_subvector(bitcast()) support
We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases  (e.g. extract_subvector(v2i64 bitcast(v16i8)))

llvm-svn: 364405
2019-06-26 11:17:38 +00:00
Mikhail Maltsev 6dcbb3161e [ARM] Handle fixup_arm_pcrel_9 correctly on big-endian targets
Summary:
The getFixupKindContainerSizeBytes function returns the size of the
instruction containing a given fixup. Currently fixup_arm_pcrel_9 is
not handled in this function, this causes an assertion failure in
the debug build and incorrect codegen in the release build.

This patch fixes the problem.

Reviewers: ostannard, simon_tatham

Reviewed By: ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63778

llvm-svn: 364404
2019-06-26 10:48:40 +00:00
Lewis Revill cf74881329 [RISCV] Add pseudo instruction for calls with explicit register
This patch adds the PseudoCALLReg instruction which allows using an
explicit register operand as the destination for the return address.

GCC can successfully parse this form of the call instruction, which
would be used for calls to functions which do not use ra as the return
address register, such as the __riscv_save libcalls. This patch forms
the first part of an implementation of -msave-restore for RISC-V.

Differential Revision: https://reviews.llvm.org/D62685

llvm-svn: 364403
2019-06-26 10:35:58 +00:00
Simon Pilgrim 3845a4f849 [X86][AVX] truncateVectorWithPACK - avoid bitcasted shuffles
truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts.

This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type.

llvm-svn: 364401
2019-06-26 09:50:11 +00:00
Florian Hahn 4c11b5268c [LoopUnroll] Add support for loops with exiting headers and uncond latches.
This patch generalizes the UnrollLoop utility to support loops that exit
from the header instead of the latch. Usually, LoopRotate would take care
of must of those cases, but in some cases (e.g. -Oz), LoopRotate does
not kick in.

Codesize impact looks relatively neutral on ARM64 with -Oz + LTO.

Program                                         master     patch     diff
 External/S.../CFP2006/447.dealII/447.dealII   629060.00  627676.00  -0.2%
 External/SPEC/CINT2000/176.gcc/176.gcc        1245916.00 1244932.00 -0.1%
 MultiSourc...Prolangs-C/simulator/simulator   86100.00   86156.00    0.1%
 MultiSourc...arks/Rodinia/backprop/backprop   66212.00   66252.00    0.1%
 MultiSourc...chmarks/Prolangs-C++/life/life   67276.00   67312.00    0.1%
 MultiSourc...s/Prolangs-C/compiler/compiler   69824.00   69788.00   -0.1%
 MultiSourc...Prolangs-C/assembler/assembler   86672.00   86696.00    0.0%

Reviewers: efriedma, vsk, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D61962

llvm-svn: 364398
2019-06-26 09:16:57 +00:00
Chen Zheng 46ce9e4fff [HardwareLoops] NFC - move loop with irreducible control flow checking logic to isHardwareLoopProfitable()
llvm-svn: 364397
2019-06-26 09:12:52 +00:00
Clement Courbet be98e0ab78 [ExpandMemCmp] Honor prefer-vector-width.
Reviewers: gchatelet, echristo, spatel, atdt

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63769

llvm-svn: 364384
2019-06-26 07:06:49 +00:00
Kai Luo d6a8bc7a12 [PowerPC] Fixed missing change flag of emitRLDICWhenLoweringJumpTables
PPCMIPeephole::emitRLDICWhenLoweringJumpTables should return a bool
value to indicate optimization is conducted or not.

Differential Revision: https://reviews.llvm.org/D63801

llvm-svn: 364383
2019-06-26 05:25:16 +00:00
QingShan Zhang e0e7d4c366 Teach the DAGCombine to fold this pattern(c1 and c2 is constant).
// fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
// fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2)
// fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
Sign extend the operands if it is any_extend, to keep the signess of the operands that, the other combine rule would apply. The any_extend is handled as zero extend for constants. i.e.

t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0>
t2: i64 = any_extend t1
 -->
t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0>
 -->
t4: i64 = sign_extend_inreg t3

Differential Revision: https://reviews.llvm.org/D63318

llvm-svn: 364382
2019-06-26 05:12:53 +00:00
Fangrui Song 6a4c68e187 [ARM] Fix -Wimplicit-fallthrough after D60709/r364331
llvm-svn: 364376
2019-06-26 02:34:10 +00:00
Nemanja Ivanovic 8265e8ff36 [PowerPC] Mark FCOPYSIGN legal for FP vectors
This was just an omission in the back end. We have had the instructions for both
single and double precision for a few HW generations, but never got around to
legalizing these.

Differential revision: https://reviews.llvm.org/D63634

llvm-svn: 364373
2019-06-26 01:48:57 +00:00
Kai Luo 174b4ff781 [PowerPC][NFC] Move peephole optimization of RLDICR into a method.
llvm-svn: 364372
2019-06-26 01:34:37 +00:00
Saleem Abdulrasool 06036dbc6e MC: correct the emission of weak aliases in COFF
The weak alias should have the characteristics set to
`IMAGE_EXTERN_WEAK_SEARCH_ALIAS` to indicate that the weak external here
is a symbol alias and that the symbol is aliased to a locally defined
symbol.  We were previously setting the characteristics to
`IMAGE_EXTERN_WEAK_SEARCH_LIBRARY` which indicates that the symbol
should be looked for in the libraries.

llvm-svn: 364370
2019-06-26 01:09:52 +00:00
Keno Fischer cadcb9eb61 [WebAssembly] Fix list of relocations with addends in lld
Summary:
The list of relocations with addend in lld was missing `R_WASM_MEMORY_ADDR_REL_SLEB`,
causing `wasm-ld` to generate corrupted output. This fixes that problem and while
we're at it pulls the list of such relocations into the Wasm.h header, to avoid
duplicating it in multiple places.

Reviewers: sbc100
Differential Revision: https://reviews.llvm.org/D63696

llvm-svn: 364367
2019-06-26 00:52:42 +00:00
Heejin Ahn 65d8d6357b [WebAssembly] Remove catch_all from AsmParser
Summary:
`catch_all` is from the first version of EH proposal and now has been
removed. There were no tests covering this, and thus no tests to remove
or fix.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63737

llvm-svn: 364360
2019-06-25 23:04:12 +00:00
Reid Kleckner a3eeca333b Dump what value failed byval attribute verification
This verifier check is failing for us while doing ThinLTO on Chrome for
x86, see https://crbug.com/978218, and this helps to debug the problem.

llvm-svn: 364357
2019-06-25 22:33:32 +00:00
Jinsong Ji fee855b5bc [MachinePipeliner] Fix risky iterator usage R++, --R
When we calculate MII, we use two loops, one with iterator R++ to
check whether we can reserve the resource, then --R to move back
the iterator to do reservation.

This is risky, as R++, --R may not point to the same element at all.
The can cause wrong MII.

Differential Revision: https://reviews.llvm.org/D63536

llvm-svn: 364353
2019-06-25 21:50:56 +00:00
Matt Arsenault 8fcc70f141 Don't look for the TargetFrameLowering in the implementation
The same oddity was apparently copy-pasted between multiple targets.

llvm-svn: 364349
2019-06-25 20:53:35 +00:00
Huihui Zhang b90cb57b63 [InstCombine] Simplify icmp ult/uge (shl %x, C2), C1 iff C1 is power of two -> icmp eq/ne (and %x, (lshr -C1, C2)), 0.
Simplify 'shl' inequality test into 'and' equality test.

This pattern happens in the middle-end while simplifying bitfield access,
Exposed in https://reviews.llvm.org/D63505

https://rise4fun.com/Alive/6uz

Reviewers: lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: spatel, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63675

llvm-svn: 364348
2019-06-25 20:44:52 +00:00
Philip Reames c42a357178 [LFTR] Adjust debug output to include extensions (if any)
llvm-svn: 364346
2019-06-25 20:14:08 +00:00
Diego Novillo 688afeb884 Update phis in AMDGPUUnifyDivergentExitNodes
Original patch https://reviews.llvm.org/D63659 from
Steven Perron <stevenperron@google.com>

The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in
the successors of blocks that is splits. This is fixed by calling
BasicBlock::splitBasicBlock to split the block instead of doing it
manually. This does extra work because a new conditional branch is
created in BB which is immediately replaced, but I think the simplicity
is worth it. It also helps make the code more future proof in case other
things need to be updated.

llvm-svn: 364342
2019-06-25 18:55:16 +00:00
Sanjay Patel fcfa056ceb [InstCombine] reduce checks for power-of-2-or-zero using ctpop
This follows up the transform from rL363956 to use the ctpop intrinsic when checking for power-of-2-or-zero.

This is matching the isPowerOf2() patterns used in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

But there's at least 1 instcombine follow-up needed to match the alternate form:

(v & (v - 1)) == 0;

We should have all of the backend expansions handled with:
rL364319
(x86-specific changes still needed for optimal code based on subtarget)

And the larger patterns to exclude zero as a power-of-2 are joining with this change after:
rL364153 ( D63660 )
rL364246

Differential Revision: https://reviews.llvm.org/D63777

llvm-svn: 364341
2019-06-25 18:51:44 +00:00
Stanislav Mekhanoshin 4be636ebb3 [AMDGPU] Removed dead SIMachineFunctionInfo::getWorkItemIDVGPR()
Differential Revision: https://reviews.llvm.org/D63780

llvm-svn: 364339
2019-06-25 18:33:53 +00:00
Craig Topper 4577b8c17c [X86] Remove isel patterns that look for (vzext_movl (scalar_to_vector (load)))
I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207.

Differential Revision: https://reviews.llvm.org/D63701

llvm-svn: 364337
2019-06-25 17:31:52 +00:00
Philip Reames be0dedb2e1 [Peephole] Allow folding loads into instructions w/multiple uses (such as test64rr)
Peephole opt has a one use limitation which appears to be accidental. The function being used was incorrectly documented as returning whether the def had one *user*, but instead returned true only when there was one *use*. Add a corresponding hasOneNonDbgUser helper, and adjust peephole-opt to use the appropriate one.

All of the actual folding code handles multiple uses within a single instruction. That codepath is well exercised through instruction selection.

Differential Revision: https://reviews.llvm.org/D63656

llvm-svn: 364336
2019-06-25 17:29:18 +00:00
Craig Topper 14ea14ae85 [X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load
We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile.

Rather than adding isel checks for volatile. This patch removes the patterns and teachs DAG combine to merge them into vzload when its legal to do so.

Differential Revision: https://reviews.llvm.org/D63665

llvm-svn: 364333
2019-06-25 17:08:26 +00:00
Simon Tatham e8de8ba6a6 [ARM] Support inline assembler constraints for MVE.
"To" selects an odd-numbered GPR, and "Te" an even one. There are some
8.1-M instructions that have one too few bits in their register fields
and require registers of particular parity, without necessarily using
a consecutive even/odd pair.

Also, the constraint letter "t" should select an MVE q-register, when
MVE is present. This didn't need any source changes, but some extra
tests have been added.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60709

llvm-svn: 364331
2019-06-25 16:49:32 +00:00
Ayke van Laethem 88139c143c [AVR] Adjust to Register class change
A refactor in r364191 changed register types from an unsigned int to the
llvm:Register class. Adjust the AVR backend to this change.

This fixes build errors when building with the experimental AVR backend
enabled.

Differential Revision: https://reviews.llvm.org/D63776

llvm-svn: 364330
2019-06-25 16:49:22 +00:00
Simon Tatham a4b415a683 [ARM] Code-generation infrastructure for MVE.
This provides the low-level support to start using MVE vector types in
LLVM IR, loading and storing them, passing them to __asm__ statements
containing hand-written MVE vector instructions, and *if* you have the
hard-float ABI turned on, using them as function parameters.

(In the soft-float ABI, vector types are passed in integer registers,
and combining all those 32-bit integers into a q-reg requires support
for selection DAG nodes like insert_vector_elt and build_vector which
aren't implemented yet for MVE. In fact I've also had to add
`arm_aapcs_vfpcc` to a couple of existing tests to avoid that
problem.)

Specifically, this commit adds support for:

 * spills, reloads and register moves for MVE vector registers

 * ditto for the VPT predication mask that lives in VPR.P0

 * make all the MVE vector types legal in ISel, and provide selection
   DAG patterns for BITCAST, LOAD and STORE

 * make loads and stores of scalar FP types conditional on
   `hasFPRegs()` rather than `hasVFP2Base()`. As a result a few
   existing tests needed their llc command lines updating to use
   `-mattr=-fpregs` as their method of turning off all hardware FP
   support.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60708

llvm-svn: 364329
2019-06-25 16:48:46 +00:00
Simon Pilgrim 9762b26032 [DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal
Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ......

llvm-svn: 364326
2019-06-25 16:00:16 +00:00
Fangrui Song 96a192ea53 [PPC32] Support PLT calls for -msecure-plt -fpic
Summary:
In Secure PLT ABI, -fpic is similar to -fPIC. The differences are that:

* -fpic stores the address of _GLOBAL_OFFSET_TABLE_ in r30, while -fPIC stores .got2+0x8000.
* -fpic uses an addend of 0 for R_PPC_PLTREL24, while -fPIC uses 0x8000.

Reviewers: hfinkel, jhibbits, joerg, nemanjai, spetrovic

Reviewed By: jhibbits

Subscribers: adalava, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63563

llvm-svn: 364324
2019-06-25 15:56:32 +00:00
Sam Parker bcf0eb7a64 [ARM] Fix for DLS/LE CodeGen
The expensive buildbots highlighted the mir tests were broken, which
I've now updated and added --verify-machineinstrs to them. This also
uncovered a couple of bugs in the backend pass, so these have also
been fixed.

llvm-svn: 364323
2019-06-25 15:11:17 +00:00
Sanjay Patel 685c5cbc65 [SDAG] expand ctpop != 1
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

This is the inverted predicate sibling pattern that was added with:
D63004

This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to modified.

llvm-svn: 364319
2019-06-25 14:46:52 +00:00
Michael Liao f0a665afca [AMDGPU] Null checking on TS to avoid crashing in clang tests.
- `test/Misc/backend-resource-limit-diagnostics.cl` crashes as null
  streamer is used.

llvm-svn: 364318
2019-06-25 14:06:34 +00:00
Simon Pilgrim aae4b68703 [X86] lowerShuffleAsSpecificZeroOrAnyExtend - add ANY_EXTEND TODO.
lowerShuffleAsSpecificZeroOrAnyExtend should be able to lower to ANY_EXTEND_VECTOR_INREG as well as ZER_EXTEND_VECTOR_INREG.

llvm-svn: 364313
2019-06-25 13:36:53 +00:00
Fangrui Song 807d2f442a [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692
llvm-svn: 364312
2019-06-25 13:28:44 +00:00
Simon Pilgrim 1a18bb6f25 [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases.

Reapplies rL363856.

llvm-svn: 364311
2019-06-25 13:25:57 +00:00
Whitney Tsang 7c1deeff4a Expand cloneLoopWithPreheader() to support cloning loop nest
Summary: cloneLoopWithPreheader() currently only support innermost loop,
and assert otherwise.
Reviewers: Meinersbur, fhahn, kbarton
Reviewed By: Meinersbur
Subscribers: hiraditya, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63446

llvm-svn: 364310
2019-06-25 13:23:13 +00:00
Matt Arsenault d7ffa2a948 AMDGPU: Select G_SEXT/G_ZEXT/G_ANYEXT
llvm-svn: 364308
2019-06-25 13:18:11 +00:00
Simon Pilgrim 36953ce769 [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

Reapplies rL363850 but now with legality checks added at rL364290

llvm-svn: 364303
2019-06-25 12:57:43 +00:00
Sanjay Patel e4ef62291b [SDAG] improve expansion of ctpop+setcc
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.

llvm-svn: 364302
2019-06-25 12:49:35 +00:00
Simon Tatham 287f0403e3 [ARM] Fix buildbot failure due to -Werror.
Including both 'case ARM_AM::uxtw' and 'default' in the getShiftOp
switch caused a buildbot to fail with

error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
llvm-svn: 364300
2019-06-25 12:23:46 +00:00
Simon Pilgrim 69fc111184 [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

Reapplies rL363802 but now with legality checks added at rL364290

llvm-svn: 364299
2019-06-25 12:19:12 +00:00
Sjoerd Meijer 74ec25a197 [ARM] MVE VPT Blocks
A minor iteration on the MVE VPT Block pass to enable more efficient VPT Block
code generation: consecutive VPT predicated statements, predicated on the same
condition, will be placed within the same VPT Block. This essentially is also
an exercise to write some more tests for the next step, which should be more
generic also merging instructions when they are not consecutive.

Differential Revision: https://reviews.llvm.org/D63711

llvm-svn: 364298
2019-06-25 12:04:31 +00:00
Nicolai Haehnle 2710171a15 AMDGPU: Write LDS objects out as global symbols in code generation
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
  to a constant at compile times, which means some tests can no longer
  be applied.

  The current "solution" is a terrible hack, but the intrinsic isn't
  used by Mesa, so we can keep it for now.

- We no longer know the full LDS size per kernel at compile time, which
  means that we can no longer generate a relevant error message at
  compile time. It would be possible to add a check for the size of
  individual variables, but ultimately the linker will have to perform
  the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61494

llvm-svn: 364297
2019-06-25 11:52:30 +00:00
Nicolai Haehnle 08e8cb5760 AMDGPU/MC: Add .amdgpu_lds directive
Summary:
The directive defines a symbol as an group/local memory (LDS) symbol.
LDS symbols behave similar to common symbols for the purposes of ELF,
using the processor-specific SHN_AMDGPU_LDS as section index.

It is the linker and/or runtime loader's job to "instantiate" LDS symbols
and resolve relocations that reference them.

It is not possible to initialize LDS memory (not even zero-initialize
as for .bss).

We want to be able to link together objects -- starting with relocatable
objects, but possible expanding to shared objects in the future -- that
access LDS memory in a flexible way.

LDS memory is in an address space that is entirely separate from the
address space that contains the program image (code and normal data),
so having program segments for it doesn't really make sense.

Furthermore, we want to be able to compile multiple kernels in a
compilation unit which have disjoint use of LDS memory. In that case,
we may want to place LDS symbols differently for different kernels
to save memory (LDS memory is very limited and physically private to
each kernel invocation), so we can't simply place LDS symbols in a
.lds section.

Hence this solution where LDS symbols always stay undefined.

Change-Id: I08cbc37a7c0c32f53f7b6123aa0afc91dbc1748f

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61493

llvm-svn: 364296
2019-06-25 11:51:35 +00:00
Simon Pilgrim b23c942ce4 [VectorLegalizer] ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector
The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.

This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.

Part of the yak shaving to solve the crashes from rL364264 and rL364272

llvm-svn: 364295
2019-06-25 11:31:37 +00:00
Simon Tatham 4cf18c2849 [ARM] Explicit lowering of half <-> double conversions.
If an FP_EXTEND or FP_ROUND isel dag node converts directly between
f16 and f32 when the target CPU has no instruction to do it in one go,
it has to be done in two steps instead, going via f32.

Previously, this was done implicitly, because all such CPUs had the
storage-only implementation of f16 (i.e. the only thing you can do
with one at all is to convert it to/from f32). So isel would legalize
the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp
node (or vice versa), and then the fp_extend would already be f32->f64
rather than f16->f64.

But that technique can't support a target CPU which has full f16
support but _not_ f64, such as some variants of Arm v8.1-M. So now we
provide custom lowering for FP_EXTEND and FP_ROUND, which checks
support for f16 and f64 and decides on the best thing to do given the
combination of flags it gets back.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60692

llvm-svn: 364294
2019-06-25 11:24:50 +00:00
Simon Tatham 86b7a1e660 [ARM] Add remaining miscellaneous MVE instructions.
This final batch includes the tail-predicated versions of the
low-overhead loop instructions (LETP); the VPSEL instruction to select
between two vector registers based on the predicate mask without
having to open a VPT block; and VPNOT which complements the predicate
mask in place.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62681

llvm-svn: 364292
2019-06-25 11:24:33 +00:00
Simon Tatham e6824160dd [ARM] Add MVE vector load/store instructions.
This adds the rest of the vector memory access instructions. It
includes contiguous loads/stores, with an ordinary addressing mode
such as [r0,#offset] (plus writeback variants); gather loads and
scatter stores with a scalar base address register and a vector of
offsets from it (written [r0,q1] or similar); and gather/scatters with
a vector of base addresses (written [q0,#offset], again with
writeback). Additionally, some of the loads can widen each loaded
value into a larger vector lane, and the corresponding stores narrow
them again.

To implement these, we also have to add the addressing modes they
need. Also, in AsmParser, the `isMem` query function now has
subqueries `isGPRMem` and `isMVEMem`, according to which kind of base
register is used by a given memory access operand.

I've also had to add an extra check in `checkTargetMatchPredicate` in
the AsmParser, without which our last-minute check of `rGPR` register
operands against SP and PC was failing an assertion because Tablegen
had inserted an immediate 0 in place of one of a pair of tied register
operands. (This matches the way the corresponding check for `MCK_rGPR`
in `validateTargetOperandClass` is guarded.) Apparently the MVE load
instructions were the first to have ever triggered this assertion, but
I think only because they were the first to have a combination of the
usual Arm pre/post writeback system and the `rGPR` class in particular.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62680

llvm-svn: 364291
2019-06-25 11:24:18 +00:00
Simon Pilgrim 49b3778e32 [TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.

We'll improve X86 legality in future commits.

llvm-svn: 364290
2019-06-25 10:51:15 +00:00
Nemanja Ivanovic 47b7d13459 [PowerPC] Emit XXSEL for vec_sel and code that has the same pattern
As pointed out in https://bugs.llvm.org/show_bug.cgi?id=41777
we do not emit a vector select even when the pretty much asks for one.
This patch changes that.

Differential revision: https://reviews.llvm.org/D61658

llvm-svn: 364289
2019-06-25 10:46:13 +00:00
Sam Parker a6fd919cb3 [ARM] DLS/LE low-overhead loop code generation
Introduce three pseudo instructions to be used during DAG ISel to
represent v8.1-m low-overhead loops. One maps to set_loop_iterations
while loop_decrement_reg is lowered to two, so that we can separate
the decrement and branching operations. The pseudo instructions are
expanded pre-emission, where we can still decide whether we actually
want to generate a low-overhead loop, in a new pass:
ARMLowOverheadLoops. The pass currently bails, reverting to an sub,
icmp and br, in the cases where a call or stack spill/restore happens
between the decrement and branching instructions, or if the loop is
too large.

Differential Revision: https://reviews.llvm.org/D63476

llvm-svn: 364288
2019-06-25 10:45:51 +00:00
Roman Lebedev cdd43eac4f [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
Summary:
This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two,
and the `urem` is of something that may at most have a single bit set (or no bits set at all),
the `urem` is not needed.

Reviewers: RKSimon, craig.topper, xbolva00, spatel

Reviewed By: xbolva00, spatel

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63390

llvm-svn: 364286
2019-06-25 10:01:42 +00:00
Clement Courbet 3bc5ad551a [ExpandMemCmp] Move all options to TargetTransformInfo.
Split off from D60318.

llvm-svn: 364281
2019-06-25 08:04:13 +00:00
Craig Topper 079924b0b7 Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..."
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"

We can end up with an any_extend_vector_inreg with a 256 bit result type
and a 128 bit result type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.

The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.

llvm-svn: 364264
2019-06-25 01:32:42 +00:00
Matt Arsenault 25bc27965a AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class
llvm-svn: 364262
2019-06-25 01:07:22 +00:00
Huihui Zhang 4626613ffe [InstCombine] Fold icmp eq/ne (and %x, C), 0 iff (-C) is power of two -> %x u</u>= (-C) earlier.
Summary:
To generate simplified IR, make sure fold
  (X & ~C) ==/!= 0 --> X u</u>= C+1

is scheduled before fold
  ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0.

https://rise4fun.com/Alive/7ZN

Reviewers: lebedev.ri, efriedma, spatel, craig.topper

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63505

llvm-svn: 364255
2019-06-25 00:09:10 +00:00
David Blaikie f895e1bded DataExtractor: use decodeSLEB128 to implement getSLEB128
Should've been NFC, but turns out DataExtractor had better test coverage
for decoding SLEB128 than the decodeSLEB128 did - revealing a couple of
bugs (one in the error handling, another in sign extension). So fixed
those to get the DataExtractor tests passing again.

llvm-svn: 364253
2019-06-24 23:45:18 +00:00
Sanjay Patel 2675b0c8ab [InstCombine] squash is-not-power-of-2 using ctpop
This is the Demorgan'd 'not' of the pattern handled in:
D63660 / rL364153

This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

We can test if a value is not a power-of-2 using ctpop(X) > 1,
so combining that with an is-zero check of the input is the
same as testing if not exactly 1 bit is set:

(X == 0) || (ctpop(X) u> 1) --> ctpop(X) != 1

llvm-svn: 364246
2019-06-24 22:35:26 +00:00
Vasileios Porpodas 3081f78776 [SLP] NFC: Fixed typo in comment
llvm-svn: 364237
2019-06-24 21:40:48 +00:00
Matt Arsenault 8025842599 InstCombine: Preserve nuw when reassociating nuw ops [3/3]
Alive says this is OK.

llvm-svn: 364235
2019-06-24 21:37:03 +00:00
Matt Arsenault 5d82ecd5d9 InstCombine: Preserve nuw when reassociating nuw ops [2/3]
Alive says this is OK.

llvm-svn: 364234
2019-06-24 21:37:02 +00:00
Matt Arsenault 5a89ba7343 InstCombine: Preserve nuw when reassociating nuw ops [1/3]
Alive says this is OK.

llvm-svn: 364233
2019-06-24 21:36:59 +00:00
David Blaikie 8242f35d50 NFC: DataExtractor: use decodeULEB128 to implement getULEB128
llvm-svn: 364230
2019-06-24 20:43:36 +00:00
Nikita Popov f1ffc4305d [CVP] Reenable nowrap flag inference
Inference of nowrap flags in CVP has been disabled, because it
triggered a bug in LFTR (https://bugs.llvm.org/show_bug.cgi?id=31181).
This issue has been fixed in D60935, so we should be able to reenable
nowrap flag inference now.

Differential Revision: https://reviews.llvm.org/D62776

llvm-svn: 364228
2019-06-24 20:13:13 +00:00
Peter Collingbourne 9c8282a9b3 llvm-symbolizer: Add a FRAME command.
This command prints a description of the referenced function's stack frame.
For each formal parameter and local variable, the tool prints:

- function name
- variable name
- file/line of declaration
- FP-relative variable location (if available)
- size in bytes
- HWASAN tag offset

This information will be used by the HWASAN runtime to identify local
variables in UAR reports.

Differential Revision: https://reviews.llvm.org/D63468

llvm-svn: 364225
2019-06-24 20:03:23 +00:00
Roland Froese ea08248b2b [CodeGen] Add missing vector type legalization for ctlz_zero_undef
Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as
ctlz and cttz.

Differential Revision: https://reviews.llvm.org/D63463

llvm-svn: 364221
2019-06-24 19:27:07 +00:00
Cameron McInally fe3f15cf90 [SLP] Support unary FNeg vectorization
Differential Revision: https://reviews.llvm.org/D63609

llvm-svn: 364219
2019-06-24 19:24:23 +00:00
Matt Arsenault dbb6c03175 AMDGPU/GlobalISel: Select G_TRUNC
llvm-svn: 364215
2019-06-24 18:02:18 +00:00
Matt Arsenault 14d0b646b7 AMDGPU/GlobalISel: RegBankSelect for amdgcn.class
llvm-svn: 364214
2019-06-24 18:00:47 +00:00
Matt Arsenault 8fcd5ade3e AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect
Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need
to extend to the 32-bit half, and then to 64.

I'm not sure what the line should be between what RegBankSelect
handles, and what instruction select does, but for now I'm erring on
the side of RegBankSelect for future post-RBS combines.

llvm-svn: 364212
2019-06-24 17:54:12 +00:00
Tim Renouf d2fdb956e0 [AMDGPU] Allow any value in unused src0 field in v_nop
Summary:
The LLVM disassembler assumes that the unused src0 operand of v_nop is
zero. Other tools can put another value in that field, which is still
valid. This commit fixes the LLVM disassembler to recognize such an
encoding as v_nop, in the same way as we already do for s_getpc.

Differential Revision: https://reviews.llvm.org/D63724

Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f
llvm-svn: 364208
2019-06-24 17:35:20 +00:00
Craig Topper 7fccb2ac5e [X86] Don't a vzext_movl in LowerBuildVectorv16i8/LowerBuildVectorv8i16 if there are no zeroes in the vector we're building.
In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of 127:16 undefined.

In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary.

Found while investigating whether (vzext_movl (scalar_to_vector (loadi32)) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case.

Differential Revision: https://reviews.llvm.org/D63700

llvm-svn: 364207
2019-06-24 17:28:41 +00:00
Craig Topper 033774e144 [X86] Cleanups and safety checks around the isFNEG
This patch does a few things to start cleaning up the isFNEG function.

-Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value.
-Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64.
-Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type.
-Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So its not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now.

Differential Revision: https://reviews.llvm.org/D63683

llvm-svn: 364206
2019-06-24 17:28:26 +00:00
Matt Arsenault f8a841b88e AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1
Try to fail for scc, since I don't think that should ever be produced.

llvm-svn: 364199
2019-06-24 16:24:03 +00:00
Matt Arsenault ae171f1e9f Hexagon: Rename another copy of Register class
For some reason clang is happy with the conflict, but MSVC is not.

llvm-svn: 364196
2019-06-24 16:16:19 +00:00
Matt Arsenault f8f1ace5bb ARC: Fix -Wimplicit-fallthrough
llvm-svn: 364195
2019-06-24 16:16:16 +00:00
Matt Arsenault faeaedf8e9 GlobalISel: Remove unsigned variant of SrcOp
Force using Register.

One downside is the generated register enums require explicit
conversion.

llvm-svn: 364194
2019-06-24 16:16:12 +00:00
Matt Arsenault e3a676e9ad CodeGen: Introduce a class for registers
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().

llvm-svn: 364191
2019-06-24 15:50:29 +00:00
Bjorn Pettersson 3260ef16bb [AMDGPU] Remove unused variable AllSGPRSpilledToVGPRs. NFC
Summary:
Removing the unused variable AllSGPRSpilledToVGPRs in
SIFrameLowering::processFunctionBeforeFrameFinalized
to avoid
  error: variable 'AllSGPRSpilledToVGPRs' set but not used
  [-Werror=unused-but-set-variable]

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63721

llvm-svn: 364190
2019-06-24 15:50:18 +00:00
Matt Arsenault 2bc35b7938 Hexagon: Rename Register class
This avoids a naming conflict in a future patch.

llvm-svn: 364188
2019-06-24 15:27:29 +00:00
Sanjay Patel 89efefb170 [InstCombine] reduce funnel-shift i16 X, X, 8 to bswap X
Prefer the more exact intrinsic to remove a use of the input value
and possibly make further transforms easier (we will still need
to match patterns with funnel-shift of wider types as pieces of
bswap, especially if we want to canonicalize to funnel-shift with
constant shift amount). Discussed in D46760.

llvm-svn: 364187
2019-06-24 15:20:49 +00:00
Matt Arsenault 5dbd9228c4 AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext
This needs different handling if the source is known to be a valid
condition or not. Handle turning it into shifts or a select during
regbankselect.

llvm-svn: 364186
2019-06-24 14:53:58 +00:00
Matt Arsenault 60957cb74c AMDGPU: Fold frame index into MUBUF
This matters for byval uses outside of the entry block, which appear
as copies.

Previously, the only folding done was during selection, which could
not see the underlying frame index. For any uses outside the entry
block, the frame index was materialized in the entry block relative to
the global scratch wave offset.

This may produce worse code in cases where the offset ends up not
fitting in the MUBUF offset field. A better heuristic would be helpfu
for extreme frames.

llvm-svn: 364185
2019-06-24 14:53:56 +00:00
Matt Arsenault 942404d01b AMDGPU: Cleanup checking when spills need emergency slots
Address fixme, which should no longer be a problem since r363757.

llvm-svn: 364182
2019-06-24 14:34:40 +00:00
Simon Pilgrim b617b0808d [InstCombine] SliceUpIllegalIntegerPHI - bail on out of range shifts
trunc(lshr) handling - if the shift is out of range (undefined) then bail like we do for non-constant shifts.

Fixes OSS Fuzz #15217

llvm-svn: 364181
2019-06-24 13:13:36 +00:00
Simon Pilgrim 69144a925e [DAGCombine] visitMUL - allow shift by zero in MulByConstant.
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). Its better to let the shift by zero to occur and perform any cleanup afterward.

Fixes OSS Fuzz #15429

llvm-svn: 364179
2019-06-24 12:47:17 +00:00
Bjorn Pettersson 485a421876 [ConstantFolding] Use hasVectorInstrinsicScalarOpd. NFC
Summary:
Use the hasVectorInstrinsicScalarOpd helper function
in ConstantFoldVectorCall.

Reviewers: rengolin, RKSimon, dblaikie

Reviewed By: rengolin, RKSimon

Subscribers: tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63705

llvm-svn: 364178
2019-06-24 12:07:17 +00:00
Bjorn Pettersson 512b118779 [Scalarizer] Add scalarizer support for smul.fix.sat
Summary:
Handle smul.fix.sat in the scalarizer. This is done by
adding smul.fix.sat to the set of "isTriviallyVectorizable"
intrinsics.

The addition of smul.fix.sat in isTriviallyVectorizable and
hasVectorInstrinsicScalarOpd can also be seen as a preparation
to be able to use hasVectorInstrinsicScalarOpd in ConstantFolding.

Reviewers: rengolin, RKSimon, dblaikie

Reviewed By: rengolin

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63704

llvm-svn: 364177
2019-06-24 12:07:11 +00:00
Simon Tatham fe8017621e [ARM] Add MVE interleaving load/store family.
This adds the family of loads and stores with names like VLD20.8 and
VST42.32, which load and store parts of multiple q-registers in such a
way that executing both VLD20 and VLD21, or all four of VLD40..VLD43,
will distribute 2 or 4 vectors' worth of memory data across the lanes
of the same number of registers but in a transposed order.

In addition to the Tablegen descriptions of the instructions
themselves, this patch also adds encode and decode support for the
QQPR and QQQQPR register classes (representing the range of loaded or
stored vector registers), and tweaks to the parsing system for lists
of vector registers to make it return the right format in this case
(since, unlike NEON, MVE regards q-registers as primitive, and not
just an alias for two d-registers).

llvm-svn: 364172
2019-06-24 10:00:39 +00:00
Pavel Labath bb6d0b8e7b [Support] Fix error handling in DataExtractor::get[US]LEB128
Summary:
These functions are documented as not modifying the offset argument if
the extraction fails (just like other DataExtractor functions). However,
while reviewing D63591 we discovered that this is not the case -- if the
function reaches the end of the data buffer, it will just return the
value parsed until that point and set offset to point to the end of the
buffer.

This fixes the functions to act as advertised, and adds a regression
test.

Reviewers: dblaikie, probinson, bkramer

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63645

llvm-svn: 364169
2019-06-24 09:11:24 +00:00
Craig Topper e8da65c698 [X86] Turn v16i16->v16i8 truncate+store into a any_extend+truncstore if we avx512f, but not avx512bw.
Ideally we'd be able to represent this truncate as a any_extend to
v16i32 and a truncate, but SelectionDAG doens't know how to not
fold those together.

We have isel patterns to use a vpmovzxwd+vpdmovdb for the truncate,
but we aren't able to simultaneously fold the load and the store
from the isel pattern. By pulling the truncate into the store we
can successfully hide it from the DAG combiner. Then we can isel
pattern match the truncstore and load+any_extend separately.

llvm-svn: 364163
2019-06-23 23:51:21 +00:00
Sanjoy Das e2291f5af9 Fix typo in comment; NFC
llvm-svn: 364159
2019-06-23 19:22:13 +00:00
Craig Topper c8d94e7889 [X86] Fix isel pattern that was looking for a bitcasted load. Remove what appears to be a copy/paste mistake.
DAG combine should ensure bitcasts of loads don't exist.

Also remove 3 patterns that are identical to the block above them.

llvm-svn: 364158
2019-06-23 19:17:50 +00:00
Philip Reames d22a2a9a72 [IndVars] Remove dead instructions after folding trivial loop exit
In rL364135, I taught IndVars to fold exiting branches in loops with a zero backedge taken count (i.e. loops that only run one iteration).  This extends that to eliminate the dead comparison left around.  

llvm-svn: 364155
2019-06-23 17:06:57 +00:00
Fangrui Song f955d5f623 SlotIndexes: delete unused functions
llvm-svn: 364154
2019-06-23 16:05:29 +00:00
Sanjay Patel 13a5ae58fc [InstCombine] squash is-power-of-2 that uses ctpop
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

We can test if a value is power-of-2-or-0 using ctpop(X) < 2,
so combining that with a non-zero check of the input is the
same as testing if exactly 1 bit is set:

(X != 0) && (ctpop(X) u< 2) --> ctpop(X) == 1

Differential Revision: https://reviews.llvm.org/D63660

llvm-svn: 364153
2019-06-23 14:22:37 +00:00
Fangrui Song 6620e3b2f6 SlotIndexes: simplify IdxMBBPair operators
llvm-svn: 364152
2019-06-23 13:16:03 +00:00
Craig Topper 6ddc7912b0 [SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store.
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.

The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.

llvm-svn: 364151
2019-06-23 07:00:46 +00:00
Craig Topper cadd826d0a [X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store.
Rename masked_load/masked_store to masked_ld/masked_st to discourage
their direct use. We need to check truncating/extending and
compressing/expanding before using them. This revealed that
our scalar masked load/store patterns were misusing these.

With those out of the way, renamed masked_load_unaligned and
masked_store_unaligned to remove the "_unaligned". We didn't
check the alignment anyway so the name was somewhat misleading.

Make the aligned versions inherit from masked_load/store instead
from a separate identical version. Merge the 3 different alignments
PatFrags into a single version that uses the VT from the SDNode to
determine the size that the alignment needs to match.

llvm-svn: 364150
2019-06-23 06:06:04 +00:00
Keno Fischer 5f4ae7c457 [Support] Fix build under Emscripten
Summary:
Emscripten's libc doesn't define MNT_LOCAL, thus causing a build
failure in the fallback path. However, to the best of my knowledge,
it also doesn't support remote file system mounts, so we may simply
return `true` here (as we do for e.g. Fuchsia). With this fix, the
core LLVM libraries build correctly under emscripten (though some
of the tools and utils do not).

Reviewers: kripken
Differential Revision: https://reviews.llvm.org/D63688

llvm-svn: 364143
2019-06-23 00:29:59 +00:00
Don Hinton 64b0924531 Revert [CommandLine] Remove OptionCategory and SubCommand caches from the Option class.
This reverts r364134 (git commit a5b83bc9e3)

Caused errors in the asan bot, so the GeneralCategory global needs to
be changed to ManagedStatic.

Differential Revision: https://reviews.llvm.org/D62105

llvm-svn: 364141
2019-06-22 23:32:36 +00:00
Simon Pilgrim a962c1bc0f [X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0))
llvm-svn: 364136
2019-06-22 17:57:01 +00:00
Philip Reames 8deb84c8ef Exploit a zero LoopExit count to eliminate loop exits
This turned out to be surprisingly effective. I was originally doing this just for completeness sake, but it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.

Once this is in, I'm thinking about trying to build on the same infrastructure to eliminate provably untaken checks. There may be something generally interesting here.

Differential Revision: https://reviews.llvm.org/D63618

llvm-svn: 364135
2019-06-22 17:54:25 +00:00
Don Hinton a5b83bc9e3 [CommandLine] Remove OptionCategory and SubCommand caches from the Option class.
Summary:
This change processes `OptionCategory`s and `SubCommand`s as they
 are seen instead of caching them in the Option class and processing
them later.  Doing so simplifies the work needed to be done by the Global
parser and significantly reduces the size of the Option class to a mere 64
bytes.

Removing  the `OptionCategory` cache saved 24 bytes, and removing
the `SubCommand` cache saved an additional 48 bytes, for a total of a
72 byte reduction.

Reviewers: beanz, zturner, MaskRay, serge-sans-paille

Reviewed By: serge-sans-paille

Subscribers: serge-sans-paille, tstellar, zturner, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62105

llvm-svn: 364134
2019-06-22 17:22:50 +00:00
Hubert Tong 6f3222ed94 [NFC] Fix indentation in PPCAsmPrinter.cpp
After r248261, the indentation switches, inside a namespace definition,
between indenting and not indenting one level in for that namespace; the
abomination occurs in the middle of a class definition. Fix that.

llvm-svn: 364133
2019-06-22 16:03:29 +00:00
Hubert Tong d801cb1f54 [PowerPC][NFC] Move comment to the relevant function
A comment that applies to a virtual destructor was placed on a class
constructor. Move the comment to where it belongs.

llvm-svn: 364132
2019-06-22 16:02:02 +00:00
Nikita Popov 8c8e40f763 [NewGVN] Fix copy/paste mistake in cast
llvm-svn: 364130
2019-06-22 10:20:13 +00:00
Nikita Popov e96fda726e [NewGVN] Remove dead SwitchEdges variable; NFC
llvm-svn: 364129
2019-06-22 10:20:07 +00:00
Peter Collingbourne 8cd780b432 AArch64: Add support for reading pc using llvm.read_register.
This is useful for allowing code to efficiently take an address
that can be later mapped onto debug info. Currently the hwasan
pass achieves this by taking the address of the current function:
http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921

but this costs two instructions (plus a GOT entry in PIC code) per function
with stack variables. This will allow the cost to be reduced to a single
instruction.

Differential Revision: https://reviews.llvm.org/D63471

llvm-svn: 364126
2019-06-22 03:03:25 +00:00
Fangrui Song 01d649c249 [CMake] Delete redundant DEPENDS/LINK_LIBS from LineEditor/XRay
The link dependencies are already specified in LLVMBuild.txt

llvm-svn: 364125
2019-06-22 01:50:21 +00:00
Fangrui Song 43e14390b0 Make GlobalISel depend on SelectionDAG after D63169
GlobalISel/IRTranslator.cpp now references SelectionDAG/FunctionLoweringInfo.cpp.
This fixes a link error in -DBUILD_SHARED_LIBS=on builds:

    ld.lld: error: undefined symbol: llvm::FunctionLoweringInfo::clear()
    >>> referenced by IRTranslator.cpp:2198 (../lib/CodeGen/GlobalISel/IRTranslator.cpp:2198)
    >>>               lib/CodeGen/GlobalISel/CMakeFiles/LLVMGlobalISel.dir/IRTranslator.cpp.o:(llvm::IRTranslator::finalizeFunction())

llvm-svn: 364124
2019-06-22 01:30:17 +00:00
Peter Collingbourne 4608868d2f AArch64: Prefer FP-relative debug locations in HWASANified functions.
To help produce better diagnostics for stack use-after-return, we'd like
to be able to determine the addresses of each HWASANified function's local
variables given a small amount of information recorded on entry to the
function. Currently we require all HWASANified functions to use frame pointers
and record (PC, FP) on function entry. This works better than recording SP
because FP cannot change during the function, unlike SP which can change
e.g. due to dynamic alloca.

However, most variables currently end up using SP-relative locations in their
debug info. This prevents us from recomputing the address of most variables
because the distance between SP and FP isn't recorded in the debug info. To
address this, make the AArch64 backend prefer FP-relative debug locations
when producing debug info for HWASANified functions.

Differential Revision: https://reviews.llvm.org/D63300

llvm-svn: 364117
2019-06-22 00:06:51 +00:00
Tom Tan 7ecb5145ba [COFF, ARM64] Fix encoding of debugtrap for Windows
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.

Differential Revision: https://reviews.llvm.org/D63635

llvm-svn: 364115
2019-06-21 23:38:05 +00:00
Reid Kleckner 592a193285 Revert [SLP] Look-ahead operand reordering heuristic.
This reverts r364084 (git commit 5698921be2)

It caused crashes while compiling a file in Chrome. Reduction
forthcoming.

llvm-svn: 364111
2019-06-21 23:10:25 +00:00
Julian Lettner 19c4d660f4 [ASan] Use dynamic shadow on 32-bit iOS and simulators
The VM layout on iOS is not stable between releases. On 64-bit iOS and
its derivatives we use a dynamic shadow offset that enables ASan to
search for a valid location for the shadow heap on process launch rather
than hardcode it.

This commit extends that approach for 32-bit iOS plus derivatives and
their simulators.

rdar://50645192
rdar://51200372
rdar://51767702

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D63586

llvm-svn: 364105
2019-06-21 21:01:39 +00:00
Matt Arsenault 22e3dc60a0 AMDGPU: Fix not using s33 for scratch wave offset in kernels
Fixes missing piece from r363990.

llvm-svn: 364099
2019-06-21 20:04:02 +00:00
Craig Topper 4649a051bf [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.

This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.

Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining.

I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.

Differential Revision: https://reviews.llvm.org/D63512

llvm-svn: 364095
2019-06-21 19:10:21 +00:00
Craig Topper 4569cdbcf5 [X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types.
We don't have any Custom handling during type legalization. Only
operation legalization.

Fixes PR42355

llvm-svn: 364093
2019-06-21 18:50:00 +00:00
Craig Topper ce6c06dfdd [X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults.
This should be unreachable, but bugs can make it reachable. This
adds a debug print so we can see the bad node in the output when
the llvm_unreachable triggers.

llvm-svn: 364091
2019-06-21 18:49:21 +00:00
Simon Pilgrim 5dba4ed208 [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle
Subvector shuffling often ends up as insert/extract subvector.

llvm-svn: 364090
2019-06-21 18:35:04 +00:00
Amara Emerson 6e71b34fe6 [AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops.
With this we can now fully code generate jump tables, which is important for code size.

Differential Revision: https://reviews.llvm.org/D63223

llvm-svn: 364086
2019-06-21 18:10:41 +00:00
Amara Emerson fe4625fb24 [GlobalISel][IRTranslator] Change switch table translation to generate jump tables and range checks.
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.

Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.

For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.

There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.

Actual arm64 support for selection of jump tables is coming in a later patch.

Differential Revision: https://reviews.llvm.org/D63169

llvm-svn: 364085
2019-06-21 18:10:38 +00:00
Simon Pilgrim 5698921be2 [SLP] Look-ahead operand reordering heuristic.
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364084
2019-06-21 17:57:01 +00:00
Craig Topper 6af1be9664 [X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl.
We already use vmovq for v2i64/v2f64 vzmovl. But we were using a
blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or
movsd+xorpd under optsize.

I think the blend with 0 or movss/d is only needed for
vXi32 where we don't have an instruction that can move 32
bits from one xmm to another while zeroing upper bits.

movq is no worse than blendpd on any known CPUs.

llvm-svn: 364079
2019-06-21 17:24:21 +00:00
Simon Pilgrim 0da13ed1f6 [DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI.
llvm-svn: 364076
2019-06-21 16:44:51 +00:00
Amara Emerson 8f25a021dd [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal.
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).

On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.

There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.

Differential Revision: https://reviews.llvm.org/D63587

llvm-svn: 364075
2019-06-21 16:43:50 +00:00
Stanislav Mekhanoshin bdf7f81b89 [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode
This requires 3 wait states unless there is a wait or VALU in
between.

Differential Revision: https://reviews.llvm.org/D63619

llvm-svn: 364074
2019-06-21 16:30:14 +00:00
David Bolvansky dbcdad51ff [InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1
Summary:
```
%a = sub i32 31, %x
%r = shl i32 1, %a
  =>
%d = shl i32 1, 31
%r = lshr i32 %d, %x

Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/btZm

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63652

llvm-svn: 364073
2019-06-21 16:25:32 +00:00
Simon Pilgrim 96e77ce626 [X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI.
TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate.

llvm-svn: 364072
2019-06-21 16:23:28 +00:00
Simon Pilgrim bdea88325f Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 364068
2019-06-21 16:11:18 +00:00
David Bolvansky 4b28478389 [InstCombine] cttz(abs(x)) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D63546

llvm-svn: 364064
2019-06-21 15:26:22 +00:00
Sanjay Patel ddb9093684 [GVNSink] prevent crashing on mismatched instructions (PR42346)
Patch based on suggestion by James Molloy (@jmolloy) in:
https://bugs.llvm.org/show_bug.cgi?id=42346

llvm-svn: 364062
2019-06-21 15:17:24 +00:00
Simon Pilgrim ca9933c22d [DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code.
Move the "extract from insert detection code" into a lambda helper function.

llvm-svn: 364059
2019-06-21 14:46:21 +00:00
Jay Foad d9d3c91b48 [Scalarizer] Propagate IR flags
Summary:
The motivation for this was to propagate fast-math flags like nnan and
ninf on vector floating point operations to the corresponding scalar
operations to take advantage of follow-on optimizations. But I think
the same argument applies to all of our IR flags: if they apply to the
vector operation then they also apply to all the individual scalar
operations, and they might enable follow-on optimizations.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63593

llvm-svn: 364051
2019-06-21 14:10:18 +00:00
Sam Elliott 96c8bc7956 [RISCV] Add RISCV-specific TargetTransformInfo
Summary:
LLVM Allows Targets to provide information that guides optimisations
made to LLVM IR. This is done with callbacks on a TargetTransformInfo object.

This patch adds a TargetTransformInfo class for RISC-V. This will allow us to
implement RISC-V specific callbacks as they become necessary.

This commit also adds the getIntImmCost callbacks, and tests them with a simple
constant hoisting test. Our immediate costs are on the conservative side, for
the moment, but we prevent hoisting in most circumstances anyway.

Previous review was on D63007

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63433

llvm-svn: 364046
2019-06-21 13:36:09 +00:00
Simon Tatham 0c7af66450 [ARM] Add MVE 64-bit GPR <-> vector move instructions.
These instructions let you load half a vector register at once from
two general-purpose registers, or vice versa.

The assembly syntax for these instructions mentions the vector
register name twice. For the move _into_ a vector register, the MC
operand list also has to mention the register name twice (once as the
output, and once as an input to represent where the unchanged half of
the output register comes from). So we can conveniently assign one of
the two asm operands to be the output $Qd, and the other $QdSrc, which
avoids confusing the auto-generated AsmMatcher too much. For the move
_from_ a vector register, there's no way to get round the fact that
both instances of that register name have to be inputs, so we need a
custom AsmMatchConverter to avoid generating two separate output MC
operands. (And even that wouldn't have worked if it hadn't been for
D60695.)

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62679

llvm-svn: 364041
2019-06-21 13:17:23 +00:00
Simon Tatham bafb105e96 [ARM] Add MVE vector instructions that take a scalar input.
This adds the `MVE_qDest_rSrc` superclass and all its instances, plus
a few other instructions that also take a scalar input register or two.

I've also belatedly added custom diagnostic messages to the operand
classes for odd- and even-numbered GPRs, which required matching
changes in two of the existing MVE assembly test files.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62678

llvm-svn: 364040
2019-06-21 13:17:08 +00:00
Paul Robinson 26cc5bcb1a Fix a crash with assembler source and -g.
llvm-mc or clang with -g normally produces debug info describing the
assembler source itself; however, if that source already contains some
.file/.loc directives, we should instead emit the debug info described
by those directives.  For certain assembler sources seen in the wild
(particularly in the Chrome build) this was causing a crash due to
incorrect assumptions about legal sequences of assembler source text.

Fixes PR38994.

Differential Revision: https://reviews.llvm.org/D63573

llvm-svn: 364039
2019-06-21 13:10:19 +00:00