Commit Graph

9910 Commits

Author SHA1 Message Date
Saleem Abdulrasool 4005f9a860 ARM: handle checking aliases with out-of-bounds GEPs
A global alias may use indices which are not considered in bounds.  In
such a case, accessing the base object will fail as it only peers
through inbounds accesses.  This pattern is used by the swift compiler
to create references to preceeding members in the type metadata.  This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format.  Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.

llvm-svn: 345107
2018-10-24 00:00:52 +00:00
Simon Pilgrim 816e57be35 [TTI] Add generic cost handling of SK_Reverse shuffles
These can be treated as a general permute.

This required a fix for missing reverse patterns on ARM

llvm-svn: 345015
2018-10-23 09:42:10 +00:00
Eli Friedman b09c778715 Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")
Still causing failures on the polly-aosp buildbot; I'll follow up
with a reduced testcase.

llvm-svn: 344752
2018-10-18 19:34:30 +00:00
Sam Parker 2ef3c0dad6 [ARM] bottom-top mul support in ARMParallelDSP
Previously reverted in rL343082.

Original commit message:

On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 344693
2018-10-17 13:02:48 +00:00
Sjoerd Meijer 64cfb74a61 [ARM] Do not fuse VADD and VMUL, continued (2/2)
This is patch 2/2, following up on D53314, and is the functional change
to prevent fusing mul + add sequences into VFMAs.

Differential revision: https://reviews.llvm.org/D53315

llvm-svn: 344683
2018-10-17 10:05:44 +00:00
Sjoerd Meijer 1a213a42d6 [ARM] Follow up of rL344671, attempt to pacify a buildbot
It was rightfully complaining about an unpretty logical expression.

llvm-svn: 344677
2018-10-17 07:51:24 +00:00
Sjoerd Meijer ff3ab33ec8 [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33.  This is a serie of 2
patches, that is trying to achieve the same for VFMA.  The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:

 --------------------------------------------------------
 | Opt   |  < rL342874   |  >= rL342874   |             |
 |------------------------------------------------------|
 |-O3    |     vmla      |      vmul      |     vmul    |
 |       |               |      vadd      |     vadd    |
 |------------------------------------------------------|
 |-Ofast |     vfma      |      vfma      |     vmul    |
 |       |               |                |     vadd    |
 |------------------------------------------------------|
 |-Oz    |     vmla      |      vmla      |     vmla    |
 --------------------------------------------------------

This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2.  This also fixes a typo in the regression test added in rL342874.

Differential revision: https://reviews.llvm.org/D53314

llvm-svn: 344671
2018-10-17 07:26:35 +00:00
Evandro Menezes 46eadcff9c [NFC][ARM] Refactor macro fusion
Simplify code for wildcards.

llvm-svn: 344625
2018-10-16 17:19:51 +00:00
Simon Pilgrim 5abb607ebe [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......

Differential Revision: https://reviews.llvm.org/D53257

llvm-svn: 344512
2018-10-15 13:20:41 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
George Burgess IV 6ef8002c2c Replace most users of UnknownSize with LocationSize::unknown(); NFC
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).

This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.

llvm-svn: 344186
2018-10-10 21:28:44 +00:00
Peter Smith 6f36cd4d76 [ARM] Account for implicit IT when calculating inline asm size
When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.

Fixes pr31805

Differential Revision: https://reviews.llvm.org/D52834

llvm-svn: 343960
2018-10-08 09:38:28 +00:00
Matthias Braun 81578e9f77 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895
2018-10-05 22:00:13 +00:00
Jonas Paulsson faad1b3056 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Matt Morehouse 4b1ec17fb0 Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions"
This reverts r343520 due to breakage of HWASan tests on Android.

llvm-svn: 343616
2018-10-02 18:35:44 +00:00
Diogo N. Sampaio eb9ca5ab18 [ARM] Emmit data symbol for constant pool data
The ARM elf emitter would omit printing data
symbol when constant data. This patch
overrides the emitFill method as to enforce that
the symbol is correctly printed.

Differential revision: https://reviews.llvm.org/D52737

llvm-svn: 343594
2018-10-02 14:55:48 +00:00
Matthias Braun 3e081703c3 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343520
2018-10-01 18:56:39 +00:00
Eli Friedman 5ab09a684f [ARM] Fix correctness checks in promoteToConstantPool.
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.

This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51472

llvm-svn: 343361
2018-09-28 20:27:31 +00:00
Eli Friedman bb993be56b [ARM] Use preferred alignment for constants in promoteToConstantPool.
This mostly affects IR generated by non-clang frontends because clang
generally sets the alignment of globals explicitly.

Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51469

llvm-svn: 343359
2018-09-28 20:21:51 +00:00
David Spickett ea605913be [ARM] Allow execute only code on Cortex-m23
The NoMovt feature prevents the use of MOVW/MOVT
instructions on Cortex-M23 for performance reasons.
These instructions are required for execute only code
so NoMovt should be disabled when that option is enabled.

Differential Revision: https://reviews.llvm.org/D52551

llvm-svn: 343302
2018-09-28 08:55:19 +00:00
Oliver Stannard 5f34e9e265 [ARM][v8.5A] Add speculation barriers SSBB and PSSBB
This adds two new barrier instructions which can be used to restrict
speculative execution of load instructions.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52484

llvm-svn: 343300
2018-09-28 08:27:56 +00:00
Oliver Stannard 382c935c42 [ARM][v8.5A] Add speculation barrier to ARM & Thumb instruction sets
This is a new barrier which limits speculative execution of the
instructions following it.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52477

llvm-svn: 343213
2018-09-27 13:41:14 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Oliver Stannard 7c3c4baa3f [ARM/AArch64][v8.5A] Add Armv8.5-A target
This patch allows targeting Armv8.5-A, adding the architecture to
tablegen and setting the options to be identical to Armv8.4-A for the
time being. Subsequent patches will add support for the different
features included in the Armv8.5-A Reference Manual.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52470

llvm-svn: 343102
2018-09-26 12:48:21 +00:00
Sam Parker 75aca94093 [ARM] Fix for PR39060
When calculating whether a value can safely overflow for use by an
icmp, we weren't checking that the value couldn't wrap around. To do
this we need the icmp to be using a constant, as well as the incoming
add or sub.

bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060

Differential Revision: https://reviews.llvm.org/D52463

llvm-svn: 343092
2018-09-26 10:56:00 +00:00
Hans Wennborg 4b2e7daa7e Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP"
This broke Chromium's Android build (https://crbug.com/889390) and the
polly-aosp buildbot
(http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).

> Originally committed in rL342210 but was reverted in rL342260 because
> it was causing issues in vectorized code, because I had forgotten to
> ensure that we're operating on scalar values.
>
> Original commit message:
>
> On failing to find sequences that can be converted into dual macs,
> try to find sequential 16-bit loads that are used by muls which we
> can then use smultb, smulbt, smultt with a wide load.
>
> Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 343082
2018-09-26 08:41:50 +00:00
Nirav Dave 0a0c2e6dd9 [ARM] Share predecessor bookkeeping in CombineBaseUpdate. NFCI.
llvm-svn: 342987
2018-09-25 15:30:47 +00:00
Evandro Menezes 0600c365a8 [ARM] Adjust the cost model for Exynos
Tune `MaxInterleaveFactor` and `LdStMultipleTiming`and remove
`PartialUpdateClearance` for the Exynos processors.

llvm-svn: 342900
2018-09-24 16:35:14 +00:00
Evandro Menezes 814c68729d [ARM] Adjust the feature set for Exynos
Enable crypto and literals fusion for the Exynos processors.

llvm-svn: 342899
2018-09-24 16:35:09 +00:00
Zhaoshi Zheng 05b46dc300 [Thumb1] Any imm8 should have cost of 1
A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or
[0, 255] in unsigned i8 type on Thumb1.

Differential Revision: https://reviews.llvm.org/D52257

llvm-svn: 342898
2018-09-24 16:15:23 +00:00
Luke Cheeseman ab7f9b170d [Arm][AsmParser] Restrict register list size for VSTM/VLDM
- The assembler accepts VSTM/VLDM with register lists (specifically double registers lists) with more than 16 registers specified
- The Arm architecture reference manual says this instruction must not contain more than 16 registers when the registers are doubleword registers
- This addresses one of the concerns in https://bugs.llvm.org/show_bug.cgi?id=38389

Differential Revision: https://reviews.llvm.org/D52082

llvm-svn: 342891
2018-09-24 15:13:48 +00:00
Sjoerd Meijer d986ede313 [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33
A sequence of VMUL and VADD instructions always give the same or better
performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33.
Executing the VMUL and VADD back-to-back requires the same cycles, but
having separate instructions allows scheduling to avoid the hazard between
these 2 instructions.

Differential Revision: https://reviews.llvm.org/D52289

llvm-svn: 342874
2018-09-24 12:02:50 +00:00
Hans Wennborg 5555c00902 Revert r341932 "[ARM] Enable ARMCodeGenPrepare by default"
This caused miscompilation of WebRTC for Android: PR39060.

> We've had the pass enabled downstream for a couple of weeks and it
> seems to be okay, so enable it by default.
>
> Differential Revision: https://reviews.llvm.org/D51920

llvm-svn: 342873
2018-09-24 11:40:07 +00:00
Luke Cheeseman bda54bca39 [ARM][ARMLoadStoreOptimizer]
- The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers
- This is an UNPREDICTABLE instruction and shouldn't be done
- It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch
- This fixes https://bugs.llvm.org/show_bug.cgi?id=38389

Differential Revision: https://reviews.llvm.org/D52085

llvm-svn: 342872
2018-09-24 10:42:22 +00:00
Sam Parker a7b2405b06 [ARM] bottom-top mul support ARMParallelDSP
Originally committed in rL342210 but was reverted in rL342260 because
it was causing issues in vectorized code, because I had forgotten to
ensure that we're operating on scalar values.

Original commit message:

On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 342870
2018-09-24 09:34:06 +00:00
Maya Madhavan ec1efe4ee3 Fix for bug 34002 - label generated before it block is finalized. Differential Revision: https://reviews.llvm.org/D52258
llvm-svn: 342615
2018-09-20 05:11:42 +00:00
Evandro Menezes 8a6973d6ff [ARM] Adjust the feature set for Exynos
Fine tune the cost model for all Exynos processors.

llvm-svn: 342585
2018-09-19 19:51:29 +00:00
Evandro Menezes c62ab61173 [ARM] Refactor Exynos feature set (NFC)
Since all Exynos processors share the same feature set, fold them in the
implied fatures list for the subtarget.

llvm-svn: 342583
2018-09-19 19:43:23 +00:00
Alex Bradbury 79518b02cd [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have 
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they 
will see no functional change. Previously this hook returned bool, but it now 
returns AtomicExpansionKind.

This hook allows targets to select how a given cmpxchg is to be expanded. 
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

See my associated RFC for more info on the motivation for this change 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D48130

llvm-svn: 342550
2018-09-19 14:51:42 +00:00
Oliver Stannard 0b835be7bb [ARM] Fix unwind information for floating point registers
Fixes the unwind information generated for floating-point registers.
Previously, all padding registers were assumed to be four bytes wide. Now, the
width of the register is used to specify the amount of padding.

Patch by Jackson Woodruff!

Differential revision: https://reviews.llvm.org/D51494

llvm-svn: 342545
2018-09-19 13:25:31 +00:00
Volodymyr Sapsai 703ab84cf5 Revert "[ARM] Cleanup ARM CGP isSupportedValue"
This reverts r342395 as it caused error

> Argument value type does not match pointer operand type!
>   %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
>  i8in function atomic_flag_test_and_set
> fatal error: error in backend: Broken function found, compilation aborted!

on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/

More details are available at https://reviews.llvm.org/D52080

llvm-svn: 342431
2018-09-18 00:11:55 +00:00
Sam Parker 481cdab919 [ARM] Cleanup ARM CGP isSupportedValue
isSupportedValue explicitly checked and accepted many types of value,
primarily for debugging reasons. Remove most of these checks and do a
bit of refactoring now that the pass is more stable. This also enables
ZExts to be sources, but this has very little practical benefit at the
moment extend instructions will still be introduced.

Differential Revision: https://reviews.llvm.org/D52080

llvm-svn: 342395
2018-09-17 13:57:39 +00:00
Sam Parker 76d25d7f55 [ARM] Disallow icmp with negative imm and overflow
We allow overflowing instructions if they're decreasing and only used
by an unsigned compare. Add the extra condition that the icmp cannot
be using a negative immediate.

Differential Revision: https://reviews.llvm.org/D52102

llvm-svn: 342392
2018-09-17 13:48:25 +00:00
Reid Kleckner 00f0ee718f Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP"
It causes assertion failures while building Skia for Android in
Chromium:
https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550

Reduction forthcoming.

llvm-svn: 342260
2018-09-14 18:44:37 +00:00
Sam Parker 7b84fd7847 [ARM] bottom-top mul support in ARMParallelDSP
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 342210
2018-09-14 08:09:09 +00:00
Sam Parker aaec3c6260 [ARM] Allow truncs as sources in ARM CGP
We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.

Differential Revision: https://reviews.llvm.org/D51978

llvm-svn: 342141
2018-09-13 15:14:12 +00:00
Sam Parker 96f77f142b [ARM] Fix FixConst for ARMCodeGenPrepare
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.

Differential Revision: https://reviews.llvm.org/D52032

llvm-svn: 342140
2018-09-13 14:48:10 +00:00
Tim Northover c15d47bb01 ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.

This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.

llvm-svn: 342127
2018-09-13 10:28:05 +00:00
Saleem Abdulrasool aaa72c547b ARM: correct the relocation type for `bl` on WoA
The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction.  A
thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`.
Correct the relocation that we emit in such a case.

Resolves PR38620!  Based on the patch by Jordan Rhee!

llvm-svn: 342109
2018-09-13 04:55:08 +00:00
Diogo N. Sampaio 01b916e188 [ARM] Tighten f64<->f16 conversion requirements
Fix missing Requires fields.

Patch by Bernard Ogden (bogden)

Reviewers: SjoerdMeijer, javed.absar, t.p.northover	

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D51631

llvm-svn: 342061
2018-09-12 16:24:43 +00:00
Sam Parker 1187911b0b [ARM] Follow-up to rL342033
Fixed typo which can cause segfault.

llvm-svn: 342040
2018-09-12 09:58:56 +00:00
Sam Parker a023c7a9cb [ARM] Exchange MAC operands in ARMParallelDSP
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.

AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.

Differential Revision: https://reviews.llvm.org/D51424

llvm-svn: 342033
2018-09-12 09:17:44 +00:00
Sam Parker 569b24549e [ARM] Allow bitcasts in ARMCodeGenPrepare
Allow bitcasts in the use-def chains, treating them as sources.

Differential Revision: https://reviews.llvm.org/D50758

llvm-svn: 342032
2018-09-12 09:11:48 +00:00
Sam Parker 01db2983cd [ARM] Add smlald support in ARMParallelDSP
Search from i64 reducing phis, as well as i32, to allow the
generation of smlald instructions.

Differential Revision: https://reviews.llvm.org/D51101

llvm-svn: 341941
2018-09-11 14:01:22 +00:00
Sam Parker 945604d511 [ARM] Enable ARMCodeGenPrepare by default
We've had the pass enabled downstream for a couple of weeks and it
seems to be okay, so enable it by default.

Differential Revision: https://reviews.llvm.org/D51920

llvm-svn: 341932
2018-09-11 12:45:43 +00:00
Benjamin Kramer 27c769d28a [Target] Untangle disassemblers
Disassemblers cannot depend on main target headers. The same is true for
MCTargetDesc, but there's a lot more cleanup needed for that.

llvm-svn: 341822
2018-09-10 12:53:46 +00:00
JF Bastien f03058e178 Fix typo in previous commit
llvm-svn: 341742
2018-09-08 04:07:41 +00:00
JF Bastien c4986cef12 ADT: add <bit> header, implement C++20 bit_cast, use
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.

This was originally committed as r341728 and reverted in r341730.

Reviewers: javed.absar, steven_wu, srhines

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51693

llvm-svn: 341741
2018-09-08 03:55:25 +00:00
JF Bastien 05430cc6e5 Revert "ADT: add <bit> header, implement C++20 bit_cast, use"
Bots sad. Looks like missing std::is_trivially_copyable.

llvm-svn: 341730
2018-09-07 23:23:47 +00:00
JF Bastien 28655081a4 ADT: add <bit> header, implement C++20 bit_cast, use
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.

Reviewers: javed.absar

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51693

llvm-svn: 341728
2018-09-07 23:08:26 +00:00
Tim Northover bb7d7b3d33 ARM: fix Thumb2 CodeGen for ldrex with folded frame-index.
Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a
FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a
proper addressing-mode and tells the rewriter about it so that encodable
offsets are exploited and others are rejected.

Should fix PR38828.

llvm-svn: 341642
2018-09-07 09:21:25 +00:00
Eric Christopher fe83270ee9 The initial .text section generated in object files was missing the
SHF_ARM_PURECODE flag when being built with the -mexecute-only flag.
All code sections of an ELF must have the flag set for the final .text
section to be execute-only, otherwise the flag gets removed.

A HasData flag is added to MCSection to aid in the determination that
the section is empty. A virtual setTargetSectionFlags is added to
MCELFObjectTargetWriter to allow subclasses to set target specific
section flags to be added to sections which we then use in the ARM
backend to set SHF_ARM_PURECODE.

Patch by Ivan Lozano!

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D48792

llvm-svn: 341593
2018-09-06 22:09:31 +00:00
Sander de Smalen c91b27d9ee Remove FrameAccess struct from hasLoadFromStackSlot
This removes the FrameAccess struct that was added to the interface
in D51537, since the PseudoValue from the MachineMemoryOperand
can be safely casted to a FixedStackPseudoSourceValue.

Reviewers: MatzeB, thegameg, javed.absar

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51617

llvm-svn: 341454
2018-09-05 08:59:50 +00:00
Martin Storsjo 68df812cce [MinGW] Move code for indicating "potentially not DSO local" into shouldAssumeDSOLocal. NFC.
On Windows, if shouldAssumeDSOLocal returns false, it's either a
dllimport reference, or a reference that we should treat as non-local
and create a stub for.

Clean up AArch64Subtarget::ClassifyGlobalReference a little while
touching the flag handling relating to dllimport.

Differential Revision: https://reviews.llvm.org/D51590

llvm-svn: 341402
2018-09-04 20:56:28 +00:00
Argyrios Kyrtzidis c30340b207 Add header guards to some headers that are missing them
Also adjust some of dsymutil's headers to put the header guards at the top,
otherwise the compiler will not recognize them as header guards.

llvm-svn: 341323
2018-09-03 16:22:05 +00:00
Sander de Smalen 6cab60fa06 Extend hasStoreToStackSlot with list of FI accesses.
For instructions that spill/fill to and from multiple frame-indices
in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot
should return an array of accesses, rather than just the first encounter
of such an access.

This better describes FI accesses for AArch64 (paired) LDP/STP
instructions.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D51537

llvm-svn: 341301
2018-09-03 09:15:58 +00:00
Martin Storsjo 2dcaa41e1e [MinGW] [ARM] Add stubs for potential automatic dllimported variables
The runtime pseudo relocations can't handle the ARM format embedded
addresses in movw/movt pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.

Differential Revision: https://reviews.llvm.org/D51450

llvm-svn: 341176
2018-08-31 08:00:25 +00:00
Eli Friedman d5d0a4d27f [ARM] Enable GEP offset splitting for 32-bit ARM.
It has essentially the same benefit it has on 64-bit ARM: it
substantially reduces the number of constants used by large GEP
operations. Seems to be generally helpful across a few different
codebases I've tried.

Differential Revision: https://reviews.llvm.org/D51462

llvm-svn: 341136
2018-08-30 22:18:27 +00:00
Evandro Menezes 64ed9a7fe8 [ARM] Adjust the feature set for Exynos
Enable `FeatureUseAA` for all Exynos processors.

llvm-svn: 341101
2018-08-30 19:22:00 +00:00
Alexander Ivchenko af96112ec6 Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions
..Move all target-dependent checks into new isCopyInstrImpl method.

This change allows us to treat MoveReg-type instructions and generic
COPY instruction in the same way

Differential Revision: https://reviews.llvm.org/D49913

llvm-svn: 341072
2018-08-30 14:32:47 +00:00
Ties Stuij 9c16d809d2 [CodeGen] emit inline asm clobber list warnings for reserved (cont)
Summary:
This is a continuation of https://reviews.llvm.org/D49727
Below the original text, current changes in the comments:

Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.

For example:

  extern int bar(int[]);
  
  int foo(int i) {
    int a[i]; // VLA
    asm volatile(
        "mov r7, #1"
      :
      :
      : "r7"
    );
  
    return 1 + bar(a);
  }

Compiled for thumb, this gives:

  $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
  ...
  foo:
          .fnstart
  @ %bb.0:                                @ %entry
          .save   {r4, r5, r6, r7, lr}
          push    {r4, r5, r6, r7, lr}
          .setfp  r7, sp, #12
          add     r7, sp, #12
          .pad    #4
          sub     sp, #4
          movs    r1, #7
          add.w   r0, r1, r0, lsl #2
          bic     r0, r0, #7
          sub.w   r0, sp, r0
          mov     sp, r0
          @APP
          mov.w   r7, #1
          @NO_APP
          bl      bar
          adds    r0, #1
          sub.w   r4, r7, #12
          mov     sp, r4
          pop     {r4, r5, r6, r7, pc}
  ...

r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.

This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.

The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.

If we find a reserved register, we print a warning:

  repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
        "mov r7, #1"
        ^

Reviewers: efriedma, olista01, javed.absar

Reviewed By: efriedma

Subscribers: eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D51165

llvm-svn: 341062
2018-08-30 12:52:35 +00:00
Florian Hahn 521dc4dda4 Fix "Q" and "R" inline assembly template modifiers for big-endian Arm
Consider the endianness of the target when printing register names.  This is in line with the documentation at http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers

Patch by Jackson Woodruff <jackson.woodruff@arm.com>

Reviewers: t.p.northover, echristo, javed.absar, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D49778

llvm-svn: 341052
2018-08-30 10:28:23 +00:00
Eli Friedman 96e3cd85bd [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
The inline sequence is very long (about 70 bytes on Thumb1), so it's
not really a good idea to inline it, especially when optimizing for
size.

Differential Revision: https://reviews.llvm.org/D47917

llvm-svn: 340458
2018-08-22 21:47:14 +00:00
Martin Storsjo 5ab1d107bb [ARM] Avoid injecting constant islands in movw+movt pairs on Windows
On Windows, movw+movt pairs with relocations are handled with a single
relocation that covers them both. Therefore we can't inject anything
between these instructions, otherwise the relocation (which in LLVM
only is treated as the movw instruction's relocation, while the movt
instruction's relocation is dropped) will end up bogus.

These instructions are bundled up until right before the constant
islands pass, making this effectively the only place that can split
them apart.

Differential Revision: https://reviews.llvm.org/D51032

llvm-svn: 340451
2018-08-22 20:34:12 +00:00
Martin Storsjo d3b29223a8 [ARM] Move machine operand target flags to ARMBaseInstrInfo
This makes sure the flags are available for use for thumb MIR as well.

A test that requires this will be added in the next commit.

llvm-svn: 340450
2018-08-22 20:34:06 +00:00
Eli Friedman c11e2b9470 [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant.
This avoids a potential infinite loop setting and unsetting bits in the
mask.

Reduced from a failure on the polly-aosp bot.

Differential Revision: https://reviews.llvm.org/D51066

llvm-svn: 340446
2018-08-22 20:13:45 +00:00
Sam Parker 4d519fc3b5 [ARM] Rotated operand patterns for *xtb16
Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16
so that they can perform a ror.

Differential Revision: https://reviews.llvm.org/D51034

llvm-svn: 340405
2018-08-22 12:58:36 +00:00
David Green 9dd1d451d9 [AArch64] Add Tiny Code Model for AArch64
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.

This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.

Differential Revision: https://reviews.llvm.org/D49673

llvm-svn: 340397
2018-08-22 11:31:39 +00:00
Bernard Ogden b828bb2a15 [ARM/AArch64] Support FP16 +fp16fml instructions
Add +fp16fml feature for new FP16 instructions, which are a
mandatory part of FP16 from v8.4-A and an optional part of FP16
from v8.2-A. It doesn't seem to be possible to model this in
LLVM, but the relationship between the options is handled by
the related clang patch.

In keeping with what I think is the usual practice, the fp16fml
extension is accepted regardless of base architecture version.

Builds on/replaces Sjoerd Meijer's patch to add these instructions at
https://reviews.llvm.org/D49839.

Differential Revision: https://reviews.llvm.org/D50228

llvm-svn: 340013
2018-08-17 11:29:49 +00:00
Sjoerd Meijer 31239a4c6a [ARM][NFC] ARMCodeGenPrepare: some refactoring and algorithm description
Differential Revision: https://reviews.llvm.org/D50846

llvm-svn: 339997
2018-08-17 07:34:01 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Sam Parker 0d51197051 [ARM] Ignore GEPs in ARMCodeGenPrepare
While searching through the use-def tree, ignore GetElementPtrInst
instructions because they don't need promoting and neither do their
indices. Otherwise, the wide indices prevent the transformation from
happening.

Differential Revision: https://reviews.llvm.org/D50762

llvm-svn: 339871
2018-08-16 12:24:40 +00:00
Sam Parker 0e2f0bd48e [ARM] Allow zext in ARMCodeGenPrepare
Treat zext instructions as roots, like we do for truncs.

Differential Revision: https://reviews.llvm.org/D50759

llvm-svn: 339868
2018-08-16 11:54:09 +00:00
Sam Parker 13567dbbd8 [ARM] Allow signed icmps in ARMCodeGenPrepare
Originally committed in r339755 which was reverted in r339806 due to
an asan issue. The issue was caused by my assumption that operands to
a CallInst mapped to the FunctionType Params. CallInsts are now
handled by iterating over their ArgOperands instead of Operands.
    
Original Message:
  Treat signed icmps as 'sinks', allowing them to be in the use-def
  tree, enabling more promotions to be performed. As a sink, any
  promoted incoming values need to be truncated before being used by
  the signed icmp.
    
  Differential Revision: https://reviews.llvm.org/D50067

llvm-svn: 339858
2018-08-16 10:05:39 +00:00
Vitaly Buka ed4239f482 Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare"
use-after-poison in check-llvm under asan

This reverts commit r339755.

llvm-svn: 339806
2018-08-15 20:09:35 +00:00
Sam Parker fabf7fe5f8 [ARM] TypeSize lower bound for ARMCodeGenPrepare
We only try to promote types with are smaller than 16-bits, but we
also need to check that the type is not less than 8-bits.

Differential Revision: https://reviews.llvm.org/D50769

llvm-svn: 339770
2018-08-15 13:29:50 +00:00
Sam Parker 6548cd3905 [ARM] Allow signed icmps in ARMCodeGenPrepare
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.

Differential Revision: https://reviews.llvm.org/D50067

llvm-svn: 339755
2018-08-15 08:23:03 +00:00
Sam Parker 7def86bbdb [ARM] Allow pointer values in ARMCodeGenPrepare
Add pointers to the list of allowed types, but don't try to promote
them. Also fixed a bug with the promotion of undef values, so a new
value is now created instead of mutating in place. We also now only
promote if there's an instruction in the use-def chains other than
the icmp, sinks and sources.

Differential Revision: https://reviews.llvm.org/D50054

llvm-svn: 339754
2018-08-15 07:52:35 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Eli Friedman 0d12e90bf5 [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.

To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)

Differential Revision: https://reviews.llvm.org/D50667

llvm-svn: 339734
2018-08-14 22:10:25 +00:00
Sjoerd Meijer 3c859b3ec3 [ARM] ParallelDSP: add option to enable/disable the pass
Differential Revision: https://reviews.llvm.org/D50511

llvm-svn: 339645
2018-08-14 07:43:49 +00:00
Luke Geeson 4ce41d2bb7 [ARM] Added FP16 VREV Vector Instrinsic CodeGen support
llvm-svn: 339546
2018-08-13 08:37:41 +00:00
Eli Friedman 6b84a48953 Fix unused lambda capture warning from r339472.
llvm-svn: 339479
2018-08-10 22:03:25 +00:00
Eli Friedman e1687a89e8 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030

llvm-svn: 339472
2018-08-10 21:21:53 +00:00
Sam Parker 8c4b964c5a [ARM] Disallow zexts in ARMCodeGenPrepare
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable for now and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we converting to, it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes as it unnecessarily complicated having the checks
are different stages.
    
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.

Differential Revision: https://reviews.llvm.org/D50518

llvm-svn: 339432
2018-08-10 13:57:13 +00:00
Evandro Menezes 8c4366273c [ARM] Adjust the feature set for Exynos
Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`,
`FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`,
`FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for
all Exynos processors.

llvm-svn: 339356
2018-08-09 16:34:38 +00:00
Evandro Menezes 9a92fe0c9e [ARM] Replace processor check with feature
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check.  Otherwise, NFC.

llvm-svn: 339354
2018-08-09 16:13:24 +00:00
Sjoerd Meijer 806f70d229 [ARM] FP16: codegen support for VTRN
Differential Revision: https://reviews.llvm.org/D50454

llvm-svn: 339340
2018-08-09 12:45:09 +00:00
Eli Friedman 5b45a39056 [ARM] Avoid spilling lr with Thumb1 tail calls.
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop".  However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.

The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)

Differential Revision: https://reviews.llvm.org/D49459

llvm-svn: 339283
2018-08-08 20:03:10 +00:00
Sjoerd Meijer f8c394f0f5 [ARM] FP16: codegen support for VEXT
Differential Revision: https://reviews.llvm.org/D50427

llvm-svn: 339241
2018-08-08 13:26:38 +00:00
Sjoerd Meijer db5908deb9 [ARM] FP16: vector vmov and vdup support
This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50329

llvm-svn: 339238
2018-08-08 13:11:31 +00:00
Sjoerd Meijer 920a453485 [ARM] FP16: vector VMUL variants
This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50326

llvm-svn: 339232
2018-08-08 10:27:34 +00:00
Sjoerd Meijer b33a4c02cc [ARM] FP16: support vector INT_TO_FP and FP_TO_INT
This adds codegen support for the different vcvt_f16 variants.

Differential Revision: https://reviews.llvm.org/D50393

llvm-svn: 339227
2018-08-08 09:45:34 +00:00
Sjoerd Meijer b264944ed5 [ARM] FP16: support the vector vmin and vmax variants
Differential Revision: https://reviews.llvm.org/D50238

llvm-svn: 339221
2018-08-08 07:20:15 +00:00
Sjoerd Meijer b39cd886b9 [ARM] FP16: codegen support for VACGT
Differential Revision: https://reviews.llvm.org/D50236

llvm-svn: 339148
2018-08-07 15:11:47 +00:00
Tim Northover 9956e4a24b ARM-MachO: don't add Thumb bit for addend to non-external relocation.
ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes
out that part of any addend in an object file. But it only does that for
symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting
that extra bit in other cases.

llvm-svn: 339007
2018-08-06 11:32:44 +00:00
Sjoerd Meijer d62c5ec2fe [ARM] FP16: support vector zip and unzip
This is addressing PR38404.

Differential Revision: https://reviews.llvm.org/D50186

llvm-svn: 338835
2018-08-03 09:24:29 +00:00
Sjoerd Meijer 9b30213828 [ARM] FP16: support VFMA
This is addressing PR38404.

llvm-svn: 338830
2018-08-03 09:12:56 +00:00
Alexander Ivchenko 49168f6778 [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value
This is logical continuation of https://reviews.llvm.org/D46018 (r332449)

Differential Revision: https://reviews.llvm.org/D49660

llvm-svn: 338685
2018-08-02 08:33:31 +00:00
Reid Kleckner b32ff46ff7 Revert r338354 "[ARM] Revert r337821"
Disable ARMCodeGenPrepare by default again. It is causing verifier
failues in V8 that look like:

  Duplicate integer as switch case
  switch i32 %trunc, label %if.end13 [
    i32 0, label %cleanup36
    i32 0, label %if.then8
  ], !dbg !4981
  i32 0
  fatal error: error in backend: Broken function found, compilation aborted!

I will continue reducing the test case and send it along.

llvm-svn: 338452
2018-07-31 23:09:42 +00:00
Martin Storsjo 293079f2de [ARM] Allow automatically deducing the thumb instruction size for .inst
This matches GAS, that allows unsuffixed .inst for thumb.

Differential Revision: https://reviews.llvm.org/D49937

llvm-svn: 338357
2018-07-31 09:27:07 +00:00
Martin Storsjo af18947f0a [ARM] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .short/.long from normal instructions, so the .inst directive only
adds compatibility with assembly that uses it.

Differential Revision: https://reviews.llvm.org/D49936

llvm-svn: 338356
2018-07-31 09:27:01 +00:00
Sam Parker 2a6c842fda [ARM] Revert r337821
Re-enabling ARMCodeGenPrepare by default after failing to reproduce
the bootstrap issues that I was concerned it was causing.

llvm-svn: 338354
2018-07-31 09:04:14 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Thomas Preud'homme 6c1b075299 Fix uninitialized read in ARM's PrintAsmOperand
Summary:
Fix read of uninitialized RC variable in ARM's PrintAsmOperand when
hasRegClassConstraint returns false. This was causing
inline-asm-operand-implicit-cast test to fail in r338206.

Reviewers: t.p.northover, weimingz, javed.absar, chill

Reviewed By: chill

Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49984

llvm-svn: 338268
2018-07-30 16:45:40 +00:00
Petr Pavlu 8b6eff4e77 [ARM] Fix over-alignment in arguments that are HA of 128-bit vectors
Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling
homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up
fully on stack, the function tries to pack all resulting items of the
aggregate as tightly as possible according to AAPCS.

Once the first item was laid out, the alignment used for consecutive
items was the size of one item. This logic went wrong for 128-bit
vectors because their alignment is normally only 64 bits, and so could
result in inserting unexpected padding between the first and second
element.

The patch fixes the problem by updating the alignment with the item size
only if this results in reducing it.

Differential Revision: https://reviews.llvm.org/D49720

llvm-svn: 338233
2018-07-30 08:49:30 +00:00
Matt Arsenault 81920b0a25 DAG: Add calling convention argument to calling convention funcs
This seems like a pretty glaring omission, and AMDGPU
wants to treat kernels differently from other calling
conventions.

llvm-svn: 338194
2018-07-28 13:25:19 +00:00
Evandro Menezes fcca45f0dd [ARM] Add new target feature to fuse literal generation
This feature enables the fusion of such operations on Cortex A57 and Cortex
A72, as recommended in their Software Optimisation Guides, sections 4.14 and
4.11, respectively.

Differential revision: https://reviews.llvm.org/D49563

llvm-svn: 338147
2018-07-27 18:16:47 +00:00
Martin Storsjo d78b394543 Add missing 'override', fixing compilation with some compilers since SVN r337950
llvm-svn: 337952
2018-07-25 19:01:36 +00:00
Martin Storsjo d2662c32fb [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.

With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:

fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

Differential Revision: https://reviews.llvm.org/D49644

llvm-svn: 337950
2018-07-25 18:35:31 +00:00
Eli Friedman 733f4ed1bb [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.
Saves materializing the immediate for the "ands".

Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.

Now implemented as a DAGCombine.

Differential Revision: https://reviews.llvm.org/D49585

llvm-svn: 337945
2018-07-25 18:22:22 +00:00
Sam Parker 8b93e82c3d [ARM] Disable ARMCodeGenPrepare by default
ARM Stage 2 builders have been suspiciously broken since the pass was
committed. Disabling to hopefully fix the bots and give me time to
debug.

llvm-svn: 337821
2018-07-24 12:04:23 +00:00
Fangrui Song 58407ca045 [ARM] Use unique_ptr to fix memory leak introduced in r337701
llvm-svn: 337714
2018-07-23 17:43:21 +00:00
Jordan Rupprecht e5daf61229 OpChain has subclasses, so add a virtual destructor.
Summary:
OpChain has subclasses, so add a virtual destructor.

This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701.

Reviewers: javed.absar

Subscribers: llvm-commits, SjoerdMeijer, samparker

Differential Revision: https://reviews.llvm.org/D49681

llvm-svn: 337713
2018-07-23 17:38:05 +00:00
Matt Morehouse d773cb99f1 [ARM] Follow-up to r337709.
Fix double-free.

llvm-svn: 337711
2018-07-23 17:22:53 +00:00
Matt Morehouse a70685f63a [ARM] Add doFinalization() to ARMCodeGenPrepare pass.
Attempt to fix the leak introduced in r337687 and make sanitizer
buildbots green again.

llvm-svn: 337709
2018-07-23 17:00:45 +00:00
Sam Parker 89a3799a69 [ARM][NFC] ParallelDSP reorganisation
In preparing to allow ARMParallelDSP pass to parallelise more than
smlads, I've restructed some elements:

- The ParallelMAC struct has been renamed to BinOpChain.
- The BinOpChain struct holds two value lists: LHS and RHS, as well
  as inheriting from the OpChain base class.
- The OpChain struct holds all the values of the represented chain
  and has had the memory locations functionality inserted into it.
- ParallelMACList becomes OpChainList and it now holds pointers
  instead of objects.

Differential Revision: https://reviews.llvm.org/D49020

llvm-svn: 337701
2018-07-23 15:25:59 +00:00
Sam Parker 3828c6ff94 [ARM] ARMCodeGenPrepare backend pass
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
    
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.

Differential Revision: https://reviews.llvm.org/D48832

llvm-svn: 337687
2018-07-23 12:27:47 +00:00
Evandro Menezes fffa9b5897 [ARM] Add new feature to enable optimizing the VFP registers
Enable the optimization of operations on DPR and SPR via a feature instead
of checking the target.

Differential revision: https://reviews.llvm.org/D49463

llvm-svn: 337575
2018-07-20 16:49:28 +00:00
Tim Northover a7c18ee8c4 ARM: switch armv7em MachO triple to hard-float defaults and libcalls.
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.

Recommitted without breaking ELF this time.

llvm-svn: 337450
2018-07-19 12:44:51 +00:00
Tim Northover da142d10d9 Revert "ARM: switch armv7em triple to hard-float defaults and libcalls."
This reverts commit r337385 until it can be targeted at MachO only.

llvm-svn: 337424
2018-07-18 21:32:49 +00:00
Tim Northover e00cf4fc68 ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.
Since the triple's default is hard float, the libcalls will already use VFP
registers.

llvm-svn: 337386
2018-07-18 12:37:43 +00:00
Tim Northover d4abd14c1b ARM: switch armv7em triple to hard-float defaults and libcalls.
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.

llvm-svn: 337385
2018-07-18 12:37:04 +00:00
Tim Northover 097a3e3d95 ARM: deduplicate hard-float detection code. NFC.
ARMSubtarget had a copy/pasted block to determine whether the target was
hard-float, but it just delegated to triple features anyway so it's better at
the TargetMachine level.

llvm-svn: 337384
2018-07-18 12:36:25 +00:00
Sjoerd Meijer 53449daa96 [ARM] ParallelDSP: multiple reduction stmts in loop
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.

Differential revision: https://reviews.llvm.org/D49125

llvm-svn: 336795
2018-07-11 12:36:25 +00:00
Eli Friedman d2c739230c [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907

llvm-svn: 336742
2018-07-10 23:44:37 +00:00
Sjoerd Meijer b3e06faa28 [ARM] ParallelDSP: added statistics, NFC.
Added statistics for the number of SMLAD instructions created, and
als renamed the pass name to -arm-parallel-dsp.

Differential Revision: https://reviews.llvm.org/D48971

llvm-svn: 336441
2018-07-06 14:47:09 +00:00
Sjoerd Meijer 2a57b357a3 [AArch64][ARM] Armv8.4-A: Trace synchronization barrier instruction
This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction.

Differential Revision: https://reviews.llvm.org/D48918

llvm-svn: 336418
2018-07-06 08:03:12 +00:00
Ivan A. Kosarev 466037900c [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses
Resolves:
Unsupported ARM Neon intrinsics in Target-specific DAG combine
function for VLDDUP
https://bugs.llvm.org/show_bug.cgi?id=38031

Related diff: D48439

Differential Revision: https://reviews.llvm.org/D48920

llvm-svn: 336325
2018-07-05 08:59:49 +00:00
Sjoerd Meijer 27be58b307 [ARM] ParallelDSP: only support i16 loads for now
We were miscompiling i8 loads, so reject them as unsupported narrow operations
for now.

Differential Revision: https://reviews.llvm.org/D48944

llvm-svn: 336319
2018-07-05 08:21:40 +00:00
Volodymyr Turanskyy 17c0c4e742 [ARM] [Assembler] Support negative immediates: cover few missing cases
Support for negative immediates was implemented in
https://reviews.llvm.org/rL298380, however few instruction options were missing.

This change adds negative immediates support and respective tests
for the following:

ADD
ADDS
ADDS.W
AND.W
ANDS
BIC.W
BICS
BICS.W
SUB
SUBS
SUBS.W

Differential Revision: https://reviews.llvm.org/D48649

llvm-svn: 336286
2018-07-04 16:11:15 +00:00
Fangrui Song 68169343a5 [ARM] Fix inconsistent declaration parameter name in r336195
llvm-svn: 336223
2018-07-03 19:12:27 +00:00
Sam Parker ffc1681620 [ARM][NFC] Refactor sequential access for DSP
With a view to support parallel operations that have their results
stored to memory, refactor the consecutive access helper out so it
could support stores instructions.

Differential Revision: https://reviews.llvm.org/D48872

llvm-svn: 336195
2018-07-03 12:44:16 +00:00
Vadzim Dambrouski fd10286e04 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

llvm-svn: 336144
2018-07-02 21:05:26 +00:00
David Green 963401d2be [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 336062
2018-07-01 12:47:30 +00:00
Sjoerd Meijer 195e904002 [ARM][AArch64] Armv8.4-A Enablement
Initial patch adding assembly support for Armv8.4-A.

Besides adding v8.4 as a supported architecture to the usual places, this also
adds target features for the different crypto algorithms. Armv8.4-A introduced
new crypto algorithms, made them optional, and allows different combinations:

- none of the v8.4 crypto functions are supported, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0
  SHA1 and SHA2 instructions must also be implemented.
- the v8.4 SM3 and SM4 support is implemented, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1
  and SHA2 instructions must also be implemented.

The v8.4 crypto instructions are added to AArch64 only, and not AArch32,
and are made optional extensions to Armv8.2-A.

The user-facing Clang options will map on these new target features, their
naming will be compatible with GCC and added in follow-up patches.

The Armv8.4-A instruction sets can be downloaded here:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Differential Revision: https://reviews.llvm.org/D48625

llvm-svn: 335953
2018-06-29 08:43:19 +00:00
Eli Friedman 65d885e376 [ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes.
We don't ever check these again (unless you're using
-fno-integrated-as), so make sure the extracted bits are well-defined.

I don't think it's possible to trigger any of the assertions on trunk,
but it's difficult to prove.  (The first one depends on DAGCombine to
minimize the number of set bits in AND masks; I think the others are
mathematically impossible to hit.)

llvm-svn: 335931
2018-06-28 21:49:41 +00:00
Eli Friedman 6613efbd4e [ARM] Add missing Thumb2 assembler diagnostics.
Mostly just adding checks for Thumb2 instructions which correspond to
ARM instructions which already had diagnostics. While I'm here, also fix
ARM-mode strd to check the input registers correctly.

Differential Revision: https://reviews.llvm.org/D48610

llvm-svn: 335909
2018-06-28 19:53:12 +00:00