Commit Graph

3434 Commits

Author SHA1 Message Date
Simon Pilgrim a39389ebff [ARM] Regenerated arg endian test
As requested on D49262

llvm-svn: 336980
2018-07-13 09:16:56 +00:00
Joel E. Denny 9fa9c9368d [FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests
See https://reviews.llvm.org/D47106 for details.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47171

This commit drops that patch's changes to:

  llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
  llvm/test/CodeGen/NVPTX/param-load-store.ll

For some reason, the dos line endings there prevent me from commiting
via the monorepo.  A follow-up commit (not via the monorepo) will
finish the patch.

llvm-svn: 336843
2018-07-11 20:25:49 +00:00
Sjoerd Meijer 53449daa96 [ARM] ParallelDSP: multiple reduction stmts in loop
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.

Differential revision: https://reviews.llvm.org/D49125

llvm-svn: 336795
2018-07-11 12:36:25 +00:00
Eli Friedman d2c739230c [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907

llvm-svn: 336742
2018-07-10 23:44:37 +00:00
Nico Weber 038dbf3c24 Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084.
llvm-svn: 336453
2018-07-06 17:37:24 +00:00
Sjoerd Meijer b3e06faa28 [ARM] ParallelDSP: added statistics, NFC.
Added statistics for the number of SMLAD instructions created, and
als renamed the pass name to -arm-parallel-dsp.

Differential Revision: https://reviews.llvm.org/D48971

llvm-svn: 336441
2018-07-06 14:47:09 +00:00
Diogo N. Sampaio 81e9dd1ed7 Commit rL336426 cause buildbot failures
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50537/testReport/junit/LLVM/CodeGen_AArch64/FoldRedundantShiftedMasking_ll/

This removes the comments of the function label causing this error.

llvm-svn: 336440
2018-07-06 14:41:09 +00:00
Diogo N. Sampaio 742bf1a255 [SelectionDAG] https://reviews.llvm.org/D48278
D48278

Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.

llvm-svn: 336426
2018-07-06 09:42:25 +00:00
Ivan A. Kosarev 466037900c [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses
Resolves:
Unsupported ARM Neon intrinsics in Target-specific DAG combine
function for VLDDUP
https://bugs.llvm.org/show_bug.cgi?id=38031

Related diff: D48439

Differential Revision: https://reviews.llvm.org/D48920

llvm-svn: 336325
2018-07-05 08:59:49 +00:00
Mikael Holmen 8505f34b29 Partial revert of "NFC - Various typo fixes in tests"
This partially reverts r336268 since it causes buildbot failures.

Added FIXME at the places where the CHECKs are misspelled.

llvm-svn: 336323
2018-07-05 08:42:16 +00:00
Sjoerd Meijer 27be58b307 [ARM] ParallelDSP: only support i16 loads for now
We were miscompiling i8 loads, so reject them as unsupported narrow operations
for now.

Differential Revision: https://reviews.llvm.org/D48944

llvm-svn: 336319
2018-07-05 08:21:40 +00:00
Gabor Buella da4a966e1c NFC - Various typo fixes in tests
llvm-svn: 336268
2018-07-04 13:28:39 +00:00
Vadzim Dambrouski fd10286e04 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

llvm-svn: 336144
2018-07-02 21:05:26 +00:00
Sjoerd Meijer c89ca5582a [ARM] Parallel DSP Pass
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.

Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).

Patch by: Sam Parker and Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D48128

llvm-svn: 335850
2018-06-28 12:55:29 +00:00
Ivan A. Kosarev 7231598fce [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733
2018-06-27 13:57:52 +00:00
Than McIntosh 3190993a02 [X86,ARM] Retain split-stack prolog check for sibling calls
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).

This fixes PR37807.

Reviewers: cherry, javed.absar

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48444

llvm-svn: 335604
2018-06-26 14:11:30 +00:00
George Rimar dcf59c5480 Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text"
With compilation fix.

Original commit message:

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335336
2018-06-22 10:53:47 +00:00
George Rimar 6d448da1be Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text"
It broke bots.

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551

llvm-svn: 335333
2018-06-22 10:27:33 +00:00
George Rimar e14485a0c6 [MC] - Add .stack_size sections into groups and link them with .text
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. 
    The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
    eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that linker will be able to do -gc-sections on dead stack sizes sections.

Differential revision: https://reviews.llvm.org/D46874

llvm-svn: 335332
2018-06-22 10:10:53 +00:00
Sjoerd Meijer 1043dffbd3 Recommit of r335326, with the test fixed that I missed.
llvm-svn: 335331
2018-06-22 10:03:03 +00:00
Sjoerd Meijer 7ee5b090de Reverting r335326 while I look at the test failure
llvm-svn: 335328
2018-06-22 09:17:08 +00:00
Sjoerd Meijer 8d2f1565b7 [ARM] ARMv6m and v8m.baseline strict align
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because it has no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.

Differential Revision: https://reviews.llvm.org/D48437

llvm-svn: 335326
2018-06-22 08:48:13 +00:00
David Green 21a2973cc4 [ARM] Enable useAA() for the in-order Cortex-R52
This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with the codegen on in-order cpu's,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabled it for
the R52 where we have a schedule to make use of it.

Differential Revision: https://reviews.llvm.org/D48074

llvm-svn: 335249
2018-06-21 15:48:29 +00:00
Sam Parker ad71333b92 [NFC][ARM] ldrd/strd negative tests
Add negative tests for load and stores of alignment 2.

llvm-svn: 335241
2018-06-21 14:53:06 +00:00
David Green a465188500 [DAGCombine] Fix alignment for offset loads/stores
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.

This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.

Differential Revision: https://reviews.llvm.org/D48029

llvm-svn: 335210
2018-06-21 08:30:07 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Tim Northover 644a819534 ARM: convert ORR instructions to ADD where possible on Thumb.
Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a
3-address encoding and a wider range of immediates. So, particularly when
optimizing for code size (but it doesn't make things worse elsewhere) it's
beneficial to select an OR operation to an ADD if we know overflow won't occur.

This is made even better by LLVM's penchant for putting operations in canonical
form by converting the other way.

llvm-svn: 335119
2018-06-20 12:09:44 +00:00
Eli Friedman e6b4719244 [ARM] Add Thumb1 coverage for cmn testcases.
There's a missed optimization for immediates: we can save two
instructions by using adds instead of movs+mvns+cmp.

llvm-svn: 335002
2018-06-19 00:09:44 +00:00
Eli Friedman 801c2f4c3a [ARM] Testcase for missed optimization with i16 compare.
The result looks weird because the DAG actually has an explicit
shift; I haven't figured out why, exactly.

llvm-svn: 335000
2018-06-19 00:07:30 +00:00
Michael Berg 0c20447a02 easing the constraint for isNegatibleForFree and GetNegatedExpression
Summary:
Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL.  



Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar

Reviewed By: spatel

Subscribers: efriedma, wdng, tpr

Differential Revision: https://reviews.llvm.org/D48057

llvm-svn: 334769
2018-06-14 20:54:13 +00:00
Matt Arsenault 5615fa0a87 DAG: Fix extract_subvector combine for a single element
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.

Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.

Test update is just the one test that this triggers
on in tree now. It wasn't checking anything before.
The result is completely  changed since the selects
are eliminated. Not sure if it's considered better
or not.

llvm-svn: 334440
2018-06-11 21:27:41 +00:00
Ivan A. Kosarev 847daa11f8 [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47447

llvm-svn: 334361
2018-06-10 09:27:27 +00:00
Eli Friedman 864df22307 [ARM] Allow CMPZ transforms even if the input has multiple uses.
It looks like this got left in by accident in r289794; I can't think of
any reason this check would be necessary.  (Maybe it was meant to be a
check that the AND has one use? But we check that a few lines earlier.)

Differential Revision: https://reviews.llvm.org/D47921

llvm-svn: 334322
2018-06-08 21:16:56 +00:00
Evandro Menezes b2c8244715 [AArch64, ARM] Add support for Samsung Exynos M4
Create a separate feature set for Exynos M4 and add test cases.

llvm-svn: 334115
2018-06-06 18:56:00 +00:00
David Green 25312b2b6c [GlobalMerge] Set the alignment on merged global structs
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.

Differential Revision: https://reviews.llvm.org/D47633

llvm-svn: 334099
2018-06-06 14:48:32 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Ivan A. Kosarev 60a991ed1a [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47120

llvm-svn: 333825
2018-06-02 16:40:03 +00:00
Ivan A. Kosarev 73c5337a64 Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"
The LLVM part was committed instead of the Clang part.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333824
2018-06-02 16:38:38 +00:00
Ivan A. Kosarev 51f19b9ee1 [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333819
2018-06-02 16:26:42 +00:00
Eli Friedman 63fead0f43 [ARM] Enable SETCCCARRY lowering for Thumb1.
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works.  Reduces codesize a bit for 64-bit integer comparisons.

Differential Revision: https://reviews.llvm.org/D47387

llvm-svn: 333445
2018-05-29 18:17:16 +00:00
Tim Northover 4e3eec39fa ARM: be conservative when asked load/store alignment of weird type.
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.

llvm-svn: 332840
2018-05-21 12:43:54 +00:00
Haicheng Wu 69ba0613f2 [GlobalMerge] Exit early if only one global is to be merged
To save some compilation time and prevent some unnecessary changes.

Differential Revision: https://reviews.llvm.org/D46640

llvm-svn: 332813
2018-05-19 18:00:02 +00:00
Sanjay Patel 5c48b73fff [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332638
2018-05-17 18:09:56 +00:00
Sanjay Patel 38892e5ce1 [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

Follow-up to:
https://reviews.llvm.org/rL332538
...because that change wasn't enough.

llvm-svn: 332637
2018-05-17 18:08:27 +00:00
Sanjay Patel 2c1846de2d [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332539
2018-05-16 22:20:33 +00:00
Sanjay Patel ce20ac0bc5 [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332538
2018-05-16 22:20:26 +00:00
Sanjay Patel 066309f4ed [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332537
2018-05-16 22:20:11 +00:00
Sanjay Patel 6dcda28ddc [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332533
2018-05-16 21:57:19 +00:00
Sanjay Patel 82eca55953 [ARM] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332532
2018-05-16 21:57:00 +00:00
Sirish Pande cabe50a308 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

llvm-svn: 332482
2018-05-16 15:36:52 +00:00