See https://reviews.llvm.org/D47106 for details.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D47171
This commit drops that patch's changes to:
llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
llvm/test/CodeGen/NVPTX/param-load-store.ll
For some reason, the dos line endings there prevent me from commiting
via the monorepo. A follow-up commit (not via the monorepo) will
finish the patch.
llvm-svn: 336843
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.
Differential revision: https://reviews.llvm.org/D49125
llvm-svn: 336795
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions. Fix the type
conversions, and perform the correct check for negative immediates.
This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.
Differential Revision: https://reviews.llvm.org/D48907
llvm-svn: 336742
Added statistics for the number of SMLAD instructions created, and
als renamed the pass name to -arm-parallel-dsp.
Differential Revision: https://reviews.llvm.org/D48971
llvm-svn: 336441
D48278
Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB
can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
llvm-svn: 336426
We were miscompiling i8 loads, so reject them as unsupported narrow operations
for now.
Differential Revision: https://reviews.llvm.org/D48944
llvm-svn: 336319
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.
Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).
Patch by: Sam Parker and Sjoerd Meijer
Differential Revision: https://reviews.llvm.org/D48128
llvm-svn: 335850
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.
Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.
The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.
This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.
Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.
That's what we generate with this patch applied:
vld2q_dup_f16:
vld2.16 {d0[], d2[]}, [r0]
vld2.16 {d1[], d3[]}, [r0]
vld3q_dup_f16:
vld3.16 {d0[], d2[], d4[]}, [r0]
vld3.16 {d1[], d3[], d5[]}, [r0]
vld4q_dup_f16:
vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
vld4.16 {d1[], d3[], d5[], d7[]}, [r0]
Differential Revision: https://reviews.llvm.org/D48439
llvm-svn: 335733
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).
This fixes PR37807.
Reviewers: cherry, javed.absar
Subscribers: srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D48444
llvm-svn: 335604
With compilation fix.
Original commit message:
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.
This change does following two things on top:
1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs.
The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
With that linker will be able to do -gc-sections on dead stack sizes sections.
Differential revision: https://reviews.llvm.org/D46874
llvm-svn: 335336
D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.
This change does following two things on top:
1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs.
The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to
eliminate them fast during resolving the COMDATs.
2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
With that linker will be able to do -gc-sections on dead stack sizes sections.
Differential revision: https://reviews.llvm.org/D46874
llvm-svn: 335332
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because it has no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.
Differential Revision: https://reviews.llvm.org/D48437
llvm-svn: 335326
This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with the codegen on in-order cpu's,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabled it for
the R52 where we have a schedule to make use of it.
Differential Revision: https://reviews.llvm.org/D48074
llvm-svn: 335249
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.
This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.
Differential Revision: https://reviews.llvm.org/D48029
llvm-svn: 335210
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)
2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.
2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
- Since Pred is not deleted (BB is), the entry block does not need updating.
- The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
- isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
- eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
- adding some test owner as subscribers for the interesting tests modified:
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/AMDGPU/nested-loop-conditions.ll
test/CodeGen/AMDGPU/si-annotate-cf.ll
test/CodeGen/X86/hoist-spill.ll
test/CodeGen/X86/2006-11-17-IllegalMove.ll
3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.
Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D48202
llvm-svn: 335183
Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a
3-address encoding and a wider range of immediates. So, particularly when
optimizing for code size (but it doesn't make things worse elsewhere) it's
beneficial to select an OR operation to an ADD if we know overflow won't occur.
This is made even better by LLVM's penchant for putting operations in canonical
form by converting the other way.
llvm-svn: 335119
Summary:
Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: efriedma, wdng, tpr
Differential Revision: https://reviews.llvm.org/D48057
llvm-svn: 334769
This would fail before because 1x vectors aren't legal,
so instead just use the scalar type.
Avoids regressions in a future AMDGPU commit to add
v4i16/v4f16 as legal types.
Test update is just the one test that this triggers
on in tree now. It wasn't checking anything before.
The result is completely changed since the selects
are eliminated. Not sure if it's considered better
or not.
llvm-svn: 334440
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47447
llvm-svn: 334361
It looks like this got left in by accident in r289794; I can't think of
any reason this check would be necessary. (Maybe it was meant to be a
check that the AND has one use? But we check that a few lines earlier.)
Differential Revision: https://reviews.llvm.org/D47921
llvm-svn: 334322
If no alignment is set, the abi/preferred alignment of structs will be
used which may be higher than required. This can lead to extra padding
and in the end an increase in data size.
Differential Revision: https://reviews.llvm.org/D47633
llvm-svn: 334099
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.
Differential Revision: https://reviews.llvm.org/D44928
llvm-svn: 334078
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47120
llvm-svn: 333825
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47121
llvm-svn: 333819
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works. Reduces codesize a bit for 64-bit integer comparisons.
Differential Revision: https://reviews.llvm.org/D47387
llvm-svn: 333445
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.
llvm-svn: 332840
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332638
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
Follow-up to:
https://reviews.llvm.org/rL332538
...because that change wasn't enough.
llvm-svn: 332637
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332539
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332538
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332537
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332533
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332532
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.
Differential Revision https://reviews.llvm.org/D46477
llvm-svn: 332482