Commit Graph

3609 Commits

Author SHA1 Message Date
Nemanja Ivanovic 92a8c36735 [DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X))
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.

Patch by Whitney Tsang (Whitney)

Differential revision: https://reviews.llvm.org/D57434

llvm-svn: 353557
2019-02-08 19:50:58 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Diana Picus 75a04e2a77 [ARM GlobalISel] Support G_ICMP for Thumb2
Mark as legal and use the t2* equivalents of the arm mode instructions,
e.g. t2CMPrr instead of plain CMPrr.

llvm-svn: 353392
2019-02-07 11:05:33 +00:00
Diana Picus e24b104a11 [ARM GlobalISel] Support G_GEP for Thumb2
Same as ARM, but use a different opcode in the instruction selection.

llvm-svn: 353151
2019-02-05 10:21:37 +00:00
Matt Arsenault 1f795e2c2a GlobalISel: Enforce operand types for constants
A number of of tests were using imm operands, not cimm. Since CSE
relies on the exact ConstantInt* pointer used, and implicit
conversions are generally evil, also enforce the bitsize of the types.

llvm-svn: 353113
2019-02-04 23:29:31 +00:00
Oliver Stannard bac11518cd [CodeGen] Don't scavenge non-saved regs in exception throwing functions
Previously, LiveRegUnits was assuming that if a block has no successors
and does not return, then no registers are live at the end of it
(because the end of the block is unreachable). This was causing the
register scavenger to use callee-saved registers to materialise stack
frame addresses without saving them in the prologue. This would normally
be fine, because the end of the block is unreachable, but this is not
legal if the block ends by throwing a C++ exception. If this happens,
the scratch register will be modified, but its previous value won't be
preserved, so it doesn't get restored by the exception unwinder.

Differential revision: https://reviews.llvm.org/D57381

llvm-svn: 352844
2019-02-01 09:23:51 +00:00
Sjoerd Meijer f222259c3c [ARM] Thumb2: ConstantMaterializationCost
Constants can also be materialised using the negated value and a MVN, and this
case seem to have been missed for Thumb2. To check the constant materialisation
costs, we now call getT2SOImmVal twice, once for the original constant and then
also for its negated value, and this function checks if the constant can both
be splatted or rotated.

This was revealed by a test that optimises for minsize: instead of a LDR
literal pool load and having a literal pool entry, just a MVN with an immediate
is smaller (and also faster).

Differential Revision: https://reviews.llvm.org/D57327

llvm-svn: 352737
2019-01-31 08:38:06 +00:00
Sjoerd Meijer f7cc34cae8 [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS
And instead just generate a libcall. My motivating example on ARM was a simple:
  
  shl i64 %A, %B

for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.

Differential Revision: https://reviews.llvm.org/D57386

llvm-svn: 352736
2019-01-31 08:07:30 +00:00
Matt Arsenault 2a64598ef2 GlobalISel: Fix creating MMOs with align 0
llvm-svn: 352712
2019-01-31 01:38:47 +00:00
Matt Arsenault 547a83b4eb MIR: Reject non-power-of-4 alignments in MMO parsing
llvm-svn: 352686
2019-01-30 23:09:28 +00:00
David Green 54b0115547 [ARM] Use sub for negative offset load/store in thumb1
This attempts to optimise negative values used in load/store operands
a little. We currently try to selct them as rr, materialising the
negative constant using a MOV/MVN pair. This instead selects ri with
an immediate of 0, forcing the add node to become a simpler sub.

Differential Revision: https://reviews.llvm.org/D57121

llvm-svn: 352475
2019-01-29 10:40:31 +00:00
David Green 5c33c5da1a [ARM] Add extra testcases for D57121. NFC
llvm-svn: 352472
2019-01-29 10:25:56 +00:00
Diana Picus 574e0c5e32 [ARM GlobalISel] Support integer division for Thumb2
Support G_SDIV, G_UDIV, G_SREM and G_UREM.

The only significant difference between arm and thumb mode is that we
need to check a different subtarget feature.

llvm-svn: 352346
2019-01-28 10:37:30 +00:00
Diana Picus 8976ad12a9 [ARM GlobalISel] Support shifts for Thumb2
Same as ARM.

On this occasion we split some of the instruction select tests for more
complicated instructions into their own files, so we can reuse them for
ARM and Thumb mode. Likewise for the legalizer tests.

llvm-svn: 352188
2019-01-25 10:48:42 +00:00
Aditya Nandakumar 3ba0d94bce [GISel]: Change how CSE is enabled by default for each pass
https://reviews.llvm.org/D57178

Now add a hook in TargetPassConfig to query if CSE needs to be
enabled. By default this hook returns false only for O0 opt level but
this can be overridden by the target.
As a consequence of the default of enabled for non O0, a few tests
needed to be updated to not use CSE (by passing in -O0) to the run
line.

reviewed by: arsenm

llvm-svn: 352126
2019-01-24 23:11:25 +00:00
Sam Parker 31bef63bb4 [ARM][CGP] Check trunc type before replacing
In the last stage of type promotion, we replace any zext that uses a
new trunc with the operand of the trunc. This is okay when we only
allowed one type to be optimised, but now its the case that the trunc
maybe needed to produce a more narrow type than the one we were
optimising for. So we need to check this before doing the replacement.

Differential Revision: https://reviews.llvm.org/D57041

llvm-svn: 351935
2019-01-23 09:18:44 +00:00
Sam Parker 9a2a89d58f [DAGCombine] Enable more pre-indexed stores
The current check in CombineToPreIndexedLoadStore is too
conversative, preventing a pre-indexed store when the base pointer
is a predecessor of the value being stored. Instead, we should check
the pointer operand of the store.

Differential Revision: https://reviews.llvm.org/D56719

llvm-svn: 351933
2019-01-23 09:11:49 +00:00
Diana Picus d5c2499aec [ARM GlobalISel] Allow calls to varargs functions
Allow varargs functions to be called, both in arm and thumb mode. This
boils down to choosing the correct calling convention, which we can
easily test by making sure arm_aapcscc is used instead of
arm_aapcs_vfpcc when the callee is variadic.

llvm-svn: 351424
2019-01-17 10:11:55 +00:00
Sam Parker dd8cd6d26b [DAGCombine] Fix ReduceLoadWidth for shifted offsets
ReduceLoadWidth can trigger using a shifted mask is used and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 351310
2019-01-16 08:40:12 +00:00
James Y Knight 693d39dd12 Remove irrelevant references to legacy git repositories from
compiler identification lines in test-cases.

(Doing so only because it's then easier to search for references which
are actually important and need fixing.)

llvm-svn: 351200
2019-01-15 16:18:52 +00:00
Diana Picus 8987d00653 [ARM GlobalISel] Import MOVi32imm into GlobalISel
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.

We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.

llvm-svn: 351056
2019-01-14 12:04:08 +00:00
Francis Visoiu Mistrih b7cef81fd3 Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"

Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.

"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".

tests are mostly updated with

// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"

// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"

Patch by Yuanfang Chen (tabloid.adroit)!

Differential Revision: https://reviews.llvm.org/D56351

llvm-svn: 351049
2019-01-14 10:55:55 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Sam Parker 53000a74a5 [ARM] Add missing patterns for DSP muls
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable use to remove the unnecessary sxth.

Differential Revision: https://reviews.llvm.org/D55992

llvm-svn: 350613
2019-01-08 10:12:36 +00:00
Diogo N. Sampaio f192cdb5c9 [ARM] ComputeKnownBits to handle extract vectors
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.

Differential revision: https://reviews.llvm.org/D56098

llvm-svn: 350553
2019-01-07 19:01:47 +00:00
Simon Pilgrim 6aac0ec21f Regenerate test.
Prep work towards enabling SimplifyDemandedBits vector support for TRUNCATE as discussed on D56118.

llvm-svn: 350514
2019-01-07 12:21:13 +00:00
Florian Hahn 8c9f865e3d [ARM] Set Defs = [CPSR] for COPY_STRUCT_BYVAL, as it clobbers CPSR.
Fixes PR35023.

Reviewers: MatzeB, t.p.northover, sunfish, qcolombet, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55909

llvm-svn: 349935
2018-12-21 18:07:10 +00:00
Diana Picus 6c35a1e5af [ARM GlobalISel] Support G_CONSTANT for Thumb2
All we have to do is mark it as legal.

This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.

llvm-svn: 349610
2018-12-19 09:55:10 +00:00
Tim Northover ae3b66b7b0 ARM: use acquire/release instruction variants when available.
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.

llvm-svn: 349355
2018-12-17 15:05:32 +00:00
Sanjay Patel f24900b934 [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)

There are a bunch of other checks that should prevent doing this when 
it might be harmful.

We already do this transform for scalars in this spot. The vector 
limitation was shared with a check for the case when the operands are 
extended. I'm not sure if that limit is needed either, but that would 
be a separate patch.

Differential Revision: https://reviews.llvm.org/D55448

llvm-svn: 349303
2018-12-16 14:57:04 +00:00
Sanjay Patel b7e2d6e493 [ARM] make test immune to scalarization improvements; NFC
llvm-svn: 349177
2018-12-14 18:47:04 +00:00
Diana Picus 02c8343c75 [ARM GlobalISel] Thumb2: casts between int and ptr
Mark as legal and add tests. Nothing special to do.

llvm-svn: 349147
2018-12-14 13:45:38 +00:00
Diana Picus acca60b49e [ARM GlobalISel] Remove duplicate test. NFCI
Fixup for r349026. I forgot to delete these test functions from the
original file when I moved them to arm-legalize-exts.mir.

llvm-svn: 349146
2018-12-14 13:28:34 +00:00
Diana Picus 14dc3b2959 [ARM GlobalISel] Allow simple binary ops in Thumb2
Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM
and Thumb2.

Extract the legalizer tests for these opcodes into another file.

Add tests for the instruction selector.

llvm-svn: 349142
2018-12-14 11:58:14 +00:00
Diana Picus 99cd644b6c [ARM GlobalISel] Support exts and truncs for Thumb2
Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for
them in the instruction selector. This uses handwritten code again
because the patterns that are generated with TableGen are tuned for what
the DAG combiner would produce and not for simple sext/zext nodes.
Luckily, we only need to update the opcodes to use the Thumb2 variants,
everything else can be reused from ARM.

llvm-svn: 349026
2018-12-13 12:06:54 +00:00
Clement Courbet 76f4ae1092 [CodeGen] Allow mempcy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 349016
2018-12-13 09:56:19 +00:00
Diana Picus 59720b422a [ARM GlobalISel] Select load/store for Thumb2
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.

Also mark them as legal and extract their legalizer test cases to a new
test file.

llvm-svn: 348920
2018-12-12 10:32:15 +00:00
Amara Emerson 5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Tim Northover 4bf394be3a ARM: use correct offset from base pointer (r6) in call frame regions.
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.

llvm-svn: 348591
2018-12-07 13:43:55 +00:00
David Green ca29c271d2 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Sam Parker 993326da19 [ARM][NFC] Adding another test for armcgp
llvm-svn: 348489
2018-12-06 15:13:44 +00:00
Sam Parker 9fa793dbe4 [ARM][NFC] Added extra arm-cgp test
llvm-svn: 348482
2018-12-06 12:58:58 +00:00
Diana Picus 8a1b4f57c9 [ARM GlobalISel] Implement call lowering for Thumb2
The only things that are different from arm are:
* different opcodes for calls and returns
* Thumb calls take predicate operands

llvm-svn: 348347
2018-12-05 10:35:28 +00:00
Tim Northover 5745b6ac3b ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

llvm-svn: 348122
2018-12-03 11:16:21 +00:00
Sjoerd Meijer 5afc957eba [ARM] FP16: support vld1.16 for vector loads with post-increment
Differential Revision: https://reviews.llvm.org/D55112

llvm-svn: 348110
2018-12-03 08:26:34 +00:00
Sjoerd Meijer ecc7dcb879 [ARM] Don't expand sdiv when optimising for minsize
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:

sdiv %1, i32 4

gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.

Differential Revision: https://reviews.llvm.org/D54546

llvm-svn: 347965
2018-11-30 08:14:28 +00:00
Craig Topper b955bf382c [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT.
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.

This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.

Differential Revision: https://reviews.llvm.org/D54906

llvm-svn: 347593
2018-11-26 21:12:39 +00:00
Diana Picus 0528e2cfb3 [ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.

Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.

In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).

We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).

Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.

Instruction select:
Nothing to do.

llvm-svn: 347545
2018-11-26 11:07:02 +00:00
Sam Parker 5338f7aae4 [ARM] Prevent parallel macs for unsigned values
Both zext and sext are currently allowed during the search for narrow
sequences and sexts operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.

Differential Revision: https://reviews.llvm.org/D54790

llvm-svn: 347542
2018-11-26 10:22:55 +00:00
Sjoerd Meijer fc448cfd25 [ARM][NFC] codegen tests cleanup: remove dangling check prefixes
I am working on making FileCheck stricter (in D54769 and D53710) so that it
issues diagnostics when there's something wrong with tests.

This is a cleanup for dangling prefixes in the ARM codegen tests, e.g.:

--check-prefixes=A,B

where A occurs in the check file, but B doesn't. This can be innocent if A does
all the required checking, but can also be a bug in that test if it results in
the test actually not checking anything (if A for example only checks a common
label). Test CodeGen/ARM/smml.ll is such an example.

Differential Revision: https://reviews.llvm.org/D54842

llvm-svn: 347487
2018-11-23 10:08:39 +00:00