Commit Graph

8374 Commits

Author SHA1 Message Date
Chad Rosier aed910b7d7 [ARM] Minor refactoring. NFC.
llvm-svn: 249464
2015-10-06 20:51:26 +00:00
Chad Rosier 9df4aff86d [ARM] Minor refactoring. NFC.
llvm-svn: 249463
2015-10-06 20:45:45 +00:00
Chad Rosier a087fd21da [ARM] Minor refactoring to improve readability. NFC.
llvm-svn: 249454
2015-10-06 20:23:42 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Rafael Espindola e3a20f57d9 Fix pr24486.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.

With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.

In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.

This patch is a joint work by Maxim Ostapenko andy myself.

llvm-svn: 249303
2015-10-05 12:07:05 +00:00
Roman Divacky 4b5507a037 Actually switch the arch when we see .arch. PR21695
llvm-svn: 249165
2015-10-02 18:25:25 +00:00
Tim Northover 8d67b8e053 ARM: diagnose invalid local fixups on Thumb1
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.

llvm-svn: 249164
2015-10-02 18:07:18 +00:00
Tim Northover 956b008db6 ARM: correctly align constant pool value on Thumb1 targets.
Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).

llvm-svn: 249163
2015-10-02 18:07:13 +00:00
Scott Douglass 290183d734 [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer
Differential Revision: http://reviews.llvm.org/D13240

llvm-svn: 249002
2015-10-01 11:56:19 +00:00
Artyom Skrobov 72ca6b8f3f [ARM] Support for ARMv6-Z / ARMv6-ZK missing
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.

In particular, there were no tests for TrustZone being supported in these
architectures.

The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.

Differential Revision: http://reviews.llvm.org/D13236

llvm-svn: 248921
2015-09-30 17:25:52 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Andrew Kaylor 16c4da03d5 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

llvm-svn: 248735
2015-09-28 20:33:22 +00:00
Artyom Skrobov ad8a0638f7 [ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall()
supportsTailCall() has two callers. Both of them double-check isThumb1Only(),
and refuse to proceed with tail-calling in that case.
Therefore, it makes sense to move this check to
ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized;
and to eliminate the extra checks at the call sites.

Following a review comment, added an "assert(supportsTailCall())"
in IsEligibleForTailCall.

NFC.

llvm-svn: 248703
2015-09-28 09:44:11 +00:00
Ahmed Bougacha e81610fabb [ARM] Don't generate clrex for pre-v7 targets.
Since r248294, we emit clrex, but it doesn't exist on v6.

llvm-svn: 248640
2015-09-26 00:14:02 +00:00
Saleem Abdulrasool 8e99f50768 ARM: make -Asserts,-Werror=unused-variable build happy
The value was only used in an assertion.  Sink the variable usage into the
assertion.

llvm-svn: 248562
2015-09-25 05:41:02 +00:00
Saleem Abdulrasool fe83b50289 ARM: address WoA division limitation
We now emit the compiler generated divide by zero check that was needed for the
MSVC routines.  We construct a psuedo-instruction for the DBZ check as the
operation requires splitting up the BB.  For the 64-bit operations, we need to
custom expand the node as we need to insert the DBZ check and then emit the
libcall to the appropriate name.  Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization rather than sink this there to avoid the duplication.  The division
library calls now match MSVC semantically.

llvm-svn: 248561
2015-09-25 05:15:46 +00:00
Artyom Skrobov cf296444ab [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.

This patch changes the handling of +t2dsp to be in line with other
architecture extensions.

Following a revert of r248152 and new review comments, this patch also includes
renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its
possible external usage.

Differential Revision: http://reviews.llvm.org/D12937

llvm-svn: 248519
2015-09-24 17:31:16 +00:00
Tim Northover beb5bccf88 ARM: fix folding stack adjustment (again again again...)
This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.

To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.

llvm-svn: 248437
2015-09-23 22:21:09 +00:00
Oliver Stannard f2ed5c68d2 [ARM] Add option to force fast-isel
The ARM backend has some logic that only allows the fast-isel to be enabled for
subtargets where it is known to be stable. This adds a backend option to
override this and force the fast-isel to be used for any target, to allow it to
be tested.

This is an ARM-specific option, because no other backend disables the fast-isel
on a per-subtarget basis.

llvm-svn: 248369
2015-09-23 09:19:54 +00:00
Ahmed Bougacha 81616a72ea [ARM] Emit clrex in the expanded cmpxchg fail block.
ARM counterpart to r248291:

In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248294
2015-09-22 17:22:58 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi 70ad98aca4 Reformat.
llvm-svn: 248261
2015-09-22 11:13:55 +00:00
NAKAMURA Takumi 59a16a76be ARMInstrInfo.cpp: Reformat.
llvm-svn: 248260
2015-09-22 11:10:17 +00:00
Jeroen Ketema 41681a5329 [ARM] Do not scale vext with a factor
The vext pseudo-instruction takes the number of elements that need to be
extracted, not the number of bytes. Hence, use the number of elements
directly instead of scaling them with a factor.

Reviewers: Silviu Baranga, James Molloy
(not reflected in the differential revision)

Differential Revision: http://reviews.llvm.org/D12974

llvm-svn: 248208
2015-09-21 20:28:04 +00:00
James Molloy e46da3849a Revert "[ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def"
This was committed without the code review (http://reviews.llvm.org/D12937) being approved.

This reverts commit r248152.

llvm-svn: 248174
2015-09-21 16:35:08 +00:00
Artyom Skrobov 79b0adaae4 [ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.

This patch changes the handling of +t2dsp to be in line with other
architecture extensions.

Following review comments, also updating the description of FeatureDSPThumb2
in ARM.td.

Differential Revision: http://reviews.llvm.org/D12937

llvm-svn: 248152
2015-09-21 12:43:10 +00:00
Craig Topper 3c76c523e1 Cleanup places that passed SMLoc by const reference to pass it by value instead. NFC
llvm-svn: 248135
2015-09-20 23:35:59 +00:00
Saleem Abdulrasool 4966f58ac2 ARM: cleanup formatting
clang-format a line which was poorly formatted.  NFC.

llvm-svn: 248110
2015-09-20 03:19:09 +00:00
Bob Wilson 8823b84fae NFC: Fix indentation and add braces to clarify nested of else-statement.
llvm-svn: 248086
2015-09-19 06:20:59 +00:00
Eric Christopher a835956bda Limit the range of processors supported by ARM fast isel to v6 or
later as that's all that is tested right now.

Fixes PR24858.

llvm-svn: 248027
2015-09-18 20:08:18 +00:00
Cong Hou f9f9ffb98b Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue.
In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a litter faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison.

Differential Revision: http://reviews.llvm.org/D12742

llvm-svn: 248018
2015-09-18 18:19:40 +00:00
Eric Christopher a4e5d3cf8e constify the Function parameter to the TTI creation callback and
propagate to all callers/users/etc.

llvm-svn: 247864
2015-09-16 23:38:13 +00:00
Sanjay Patel a260701bbb propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
Chad Rosier 5d485db6b2 [ARM] Register ARMPreAllocLoadStoreOpt pass with LLVM pass manager.
llvm-svn: 247791
2015-09-16 13:11:31 +00:00
Daniel Sanders 50f17235dd Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Eric has replied and has demanded the patch be reverted.

llvm-svn: 247702
2015-09-15 16:17:27 +00:00
Daniel Sanders 153010c52d Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247692
2015-09-15 14:08:28 +00:00
Daniel Sanders c40de48041 Revert r247684 - Replace Triple with a new TargetTuple ...
LLDB needs to be updated in the same commit.

llvm-svn: 247686
2015-09-15 13:46:21 +00:00
Daniel Sanders 18d4b0dab7 Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247683
2015-09-15 13:17:40 +00:00
Daniel Sanders c8cd6e95d2 Fix namespace indentation and missing blank lines before 'public:' in *MCAsmInfo.h. NFC.
This is to reduce noise in a following commit.

Also fixes a couple missing spaces before the reference operator.

llvm-svn: 247679
2015-09-15 12:27:06 +00:00
John Brawn 056e67865a [ARM] Extract shifts out of multiply-by-constant
Turning (op x (mul y k)) into (op x (lsl (mul y k>>n) n)) is beneficial when
we can do the lsl as a shifted operand and the resulting multiply constant is
simpler to generate.

Do this by doing the transformation when trying to select a shifted operand,
as that ensures that it actually turns out better (the alternative would be to
do it in PreprocessISelDAG, but we don't know for sure there if extracting the
shift would allow a shifted operand to be used).

Differential Revision: http://reviews.llvm.org/D12196

llvm-svn: 247569
2015-09-14 15:19:41 +00:00
Ahmed Bougacha 5246867384 [CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit.
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).

Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)

Differential Revision: http://reviews.llvm.org/D12557

llvm-svn: 247429
2015-09-11 17:08:28 +00:00
Ahmed Bougacha 9d677131c4 [CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind.
This lets us generalize its usage to the other atomic instructions.

llvm-svn: 247428
2015-09-11 17:08:17 +00:00
Cong Hou c536bd9e73 Pass BranchProbability/BlockMass by value instead of const& as they are small. NFC.
llvm-svn: 247357
2015-09-10 23:10:42 +00:00
James Molloy 8c995a93ce [ARM] Do not use vtrn for vectorshuffle if the order is reversed
The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.

Patch by Jeroen Ketema!

llvm-svn: 247254
2015-09-10 08:42:28 +00:00
Chandler Carruth e4405e949f [ADT] Switch a bunch of places in LLVM that were doing single-character
splits to actually use the single character split routine which does
less work, and in a debug build is *substantially* faster.

llvm-svn: 247245
2015-09-10 06:12:31 +00:00
Matthias Braun d9da162789 Save LaneMask with livein registers
With subregister liveness enabled we can detect the case where only
parts of a register are live in, this is expressed as a 32bit lanemask.
The current code only keeps registers in the live-in list and therefore
enumerated all subregisters affected by the lanemask. This turned out to
be too conservative as the subregister may also cover additional parts
of the lanemask which are not live. Expressing a given lanemask by
enumerating a minimum set of subregisters is computationally expensive
so the best solution is to simply change the live-in list to store the
lanemasks as well. This will reduce memory usage for targets using
subregister liveness and slightly increase it for other targets

Differential Revision: http://reviews.llvm.org/D12442

llvm-svn: 247171
2015-09-09 18:08:03 +00:00
John Brawn d8b405abf7 [ARM] Get rid of SelectT2ShifterOperandReg, NFC
SelectT2ShifterOperandReg has identical behaviour to SelectImmShifterOperand,
so get rid of it and use SelectImmShifterOperand instead.

Differential Revision: http://reviews.llvm.org/D12195

llvm-svn: 246962
2015-09-07 11:45:18 +00:00
Ahmed Bougacha 699a9dd7c3 [ARM] Don't abort on variable-idx extractelt in ReconstructShuffle.
The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only
takes constant indices, but it does accept variables.
Bail out for those: we can't use them, as the shuffles we want to
reconstruct do require constant masks.

llvm-svn: 246594
2015-09-01 21:56:00 +00:00
Silviu Baranga e748c9ef55 [ARM] Turn on by default interleaved access vectorization
Summary:
This change turns on by default interleaved access vectorization on ARM,
as it has shown to be beneficial on ARM.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12146

llvm-svn: 246541
2015-09-01 11:19:15 +00:00
Chandler Carruth bb47b9a367 [Triple] Stop abusing a class to have only static methods and just use
the namespace that we are already using for the enums that are produced
by the parsing.

llvm-svn: 246367
2015-08-30 02:09:48 +00:00
James Molloy 45ee9898ec [ARM] Hoist fabs/fneg above a conversion to float.
This is especially visible in softfp mode, for example in the implementation of libm fabs/fneg functions. If we have:

%1 = vmovdrr r0, r1
%2 = fabs %1

then move the fabs before the vmovdrr:

%1 = and r1, #0x7FFFFFFF
%2 = vmovdrr r0, r1

This is never a lose, and could be a serious win because the vmovdrr may be followed by a vmovrrd, which would enable us to remove the conversion into FPRs completely.

We already do this for f32, but not for f64. Tests are added for both.

llvm-svn: 246360
2015-08-29 10:49:11 +00:00
Ahmed Bougacha f9c19da03a [CodeGen] Support (and default to) expanding READCYCLECOUNTER to 0.
For targets that didn't support this, this will let us respect the
langref instead of failing to select.

Note that we don't need to change the 32-bit x86/PPC lowerings (to
account for the result type/# difference) because they're both
custom and bypass type legalization.

llvm-svn: 246258
2015-08-28 01:49:59 +00:00
Reid Kleckner 0e2882345d [WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies.  The next step
is to change PEI so that we get funclet prologues and epilogues.

llvm-svn: 246235
2015-08-27 23:27:47 +00:00
Cong Hou b5ef475e5c [ARM] Use BranchProbability::scale() to scale an integer with a probability in ARMBaseInstrInfo.cpp,
Previously in isProfitableToIfCvt() in ARMBaseInstrInfo.cpp, the multiplication between an integer and a branch probability is done manually in an unsafe way that may lead to overflow. This patch corrects those cases by using BranchProbability's member function scale() to avoid overflow (which stores the intermediate result in int64).

Differential Revision: http://reviews.llvm.org/D12295

llvm-svn: 246106
2015-08-26 23:17:52 +00:00
Matthias Braun ccfc9c8d6d FastISel: Use finishCondBranch() for ARM,Mips,PowerPC FastISel
Note that after this change branch probabilities are preserved now.

llvm-svn: 245998
2015-08-26 01:55:47 +00:00
Matthias Braun b2b7ef1de8 MachineBasicBlock: Add liveins() method returning an iterator_range
llvm-svn: 245895
2015-08-24 22:59:52 +00:00
Scott Douglass bdef60462d [ARM] Use AEABI helpers for i64 div and rem
Differential Revision: http://reviews.llvm.org/D12232

llvm-svn: 245830
2015-08-24 09:17:18 +00:00
Scott Douglass d2974a6afa [ARM] Refactor LowerDivRem before adding LowerREM (nfc)
Differential Revision: http://reviews.llvm.org/D12230

llvm-svn: 245829
2015-08-24 09:17:11 +00:00
Vedant Kumar 366dd9fd2b [ARM] Fix MachO CPU Subtype selection
Differential Revision: http://reviews.llvm.org/D12040

llvm-svn: 245744
2015-08-21 21:52:48 +00:00
James Molloy bf17009a97 [ARM] Don't try and custom lower a vNi64 SETCC.
It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes.

Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009.

llvm-svn: 245577
2015-08-20 16:33:44 +00:00
Silviu Baranga ad1b19fcb7 [ARM] Add instruction selection patterns for vmin/vmax
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generatate the code. This adds the vmin/vmax
patterns and a test to check that we are now generating vmin/vmax
instructions.

Reviewers: rengolin, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12105

llvm-svn: 245439
2015-08-19 14:11:27 +00:00
Sanjay Patel 1cd6d88e4d use minSize wrapper; NFCI
These were missed when other uses were switched over:
http://llvm.org/viewvc/llvm-project?view=revision&revision=243994

llvm-svn: 245311
2015-08-18 16:44:23 +00:00
Guozhi Wei f66d384443 Align SP adjustment in function getSPAdjust
This commit adds a new function TargetFrameLowering::alignSPAdjust
and calls it from TargetInstrInfo::getSPAdjust. It fixes PR24142.

llvm-svn: 245253
2015-08-17 22:36:27 +00:00
James Molloy 974838f294 [ARM] Fix crash when targetting CPU without NEON
We emulate a scalar vmin/vmax with NEON instructions as they don't exist in the VFP ISA. So only mark these as legal when NEON is available.

Found here: https://code.google.com/p/chromium/issues/detail?id=521671

llvm-svn: 245231
2015-08-17 19:37:12 +00:00
Silviu Baranga d5ac26937c [CostModel][ARM] Increase cost of insert/extract operations
Summary:
This change limits the minimum cost of an insert/extract
element operation to 2 in cases where this would result
in mixing of NEON and VFP code.

Reviewers: rengolin

Subscribers: mssimpso, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12030

llvm-svn: 245225
2015-08-17 15:57:05 +00:00
James Molloy c617be559a Rip out hand-rolled matching code for VMIN, VMAX, VMINNM and VMAXNM
This is no longer needed - SDAGBuilder will do this for us.

llvm-svn: 245197
2015-08-17 07:13:15 +00:00
James Y Knight 5567bafe93 Remove redundant TargetFrameLowering::getFrameIndexOffset virtual
function.

This was the same as getFrameIndexReference, but without the FrameReg
output.

Differential Revision: http://reviews.llvm.org/D12042

llvm-svn: 245148
2015-08-15 02:32:35 +00:00
Renato Golin 980b6cc42b Revert "[ARM] Fix MachO CPU Subtype selection"
This reverts commit r245081, as it breaks many builds.

llvm-svn: 245086
2015-08-14 19:35:47 +00:00
Vedant Kumar 2f079be789 [ARM] Fix MachO CPU Subtype selection
This patch makes the Darwin ARM backend take advantage of TargetParser.  It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.

Differential Revision: http://reviews.llvm.org/D11996

llvm-svn: 245081
2015-08-14 18:36:47 +00:00
Rafael Espindola dbaf0498a9 Revert "Centralize the information about which object format we are using."
This reverts commit r245047.

It was failing on the darwin bots. The problem was that when running

./bin/llc -march=msp430

llc gets to

  if (TheTriple.getTriple().empty())
    TheTriple.setTriple(sys::getDefaultTargetTriple());

Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.

That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.

llvm-svn: 245062
2015-08-14 15:48:41 +00:00
Rafael Espindola 90eb70c8a7 Centralize the information about which object format we are using.
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.

It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.

llvm-svn: 245047
2015-08-14 13:31:17 +00:00
James Molloy 31117875c2 [ARM] FMINNAN/FMAXNAN of f64 are not legal.
This was my error. We've got f32 marked as legal because they're simulated using a v2f32 instruction, but there's no equivalent for f64.

This will get test coverage imminently when D12015 lands.

llvm-svn: 244916
2015-08-13 17:28:26 +00:00
James Molloy c71f78f49f [ARM] Allow vmin/vmax of scalars to be emitted without UseNEONForFP.
This overrides the default to more closely resemble the hand-crafted matching logic in ISelLowering. It makes sense, as there is no VFP equivalent of vmin or vmax, to use them when they're available even if in general VFP ops should be preferred.

This should be NFC.

llvm-svn: 244915
2015-08-13 17:28:20 +00:00
John Brawn 68acdcb435 [ARM] Reorganise and simplify thumb-1 load/store selection
Other than PC-relative loads/store the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.

Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.

Differential Revision: http://reviews.llvm.org/D11800

llvm-svn: 244882
2015-08-13 10:48:22 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
James Molloy d616c642bb [ARM] Match fminnan/fmaxnan for vector vmin/vmax instead of an intrinsic
Lower Intrinsic::arm_neon_vmins/vmaxs to fminnan/fmaxnan and match that instead. This is important because SDAG will soon be able to select FMINNAN itself, so we need a unified lowering path for intrinsics and SDAG.

NFCI.

llvm-svn: 244593
2015-08-11 12:06:28 +00:00
James Molloy ee868b2a3e [ARM] Match fminnum/fmaxnum for vector vminnm/vmaxnm instead of an intrinsic
Lower the intrinsic to a FMINNUM/FMAXNUM node and select that instead. This is important because soon SDAG will be able to select FMINNUM/FMAXNUM itself, so we need an integrated lowering path between SDAG and intrinsics.

NFCI.

llvm-svn: 244592
2015-08-11 12:06:25 +00:00
James Molloy ea3a687a33 [ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM
NFCI. This replaces another custom ISDNode with a generic equivalent.

llvm-svn: 244591
2015-08-11 12:06:22 +00:00
James Molloy db8ee4b5a9 [ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN.
NFCI. This removes a custom ISDNode.

llvm-svn: 244590
2015-08-11 12:06:15 +00:00
Cameron Esfahani f97999dc46 Explicitly clear the MI operand list when getInstruction() is called. Call MI.clear() within MCD::OPC_Decode case and inside of translateInstruction() for the X86 target. Remove now unnecessary MI.clear() from ARMDisassembler.
Summary: Explicitly clear the MI operand list when getInstruction() is called.

Reviewers: hfinkel, t.p.northover, hvarga, kparzysz, jyknight, qcolombet, uweigand

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11665

llvm-svn: 244557
2015-08-11 01:15:07 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Chad Rosier 9659de379d [ARM] Remove an unused reference to MachineRegisterInfo. NFC.
llvm-svn: 244334
2015-08-07 17:02:29 +00:00
Silviu Baranga a07090f7fa Fix unused variable warning introduced in r244314
llvm-svn: 244315
2015-08-07 12:05:46 +00:00
Silviu Baranga 3e8e51c1a9 [ARM] Update ReconstructShuffle to handle mismatched types
Summary:
Port the ReconstructShuffle function from AArch64 to ARM
to handle mismatched incoming types in the BUILD_VECTOR
node.

This fixes an outstanding FIXME in the ReconstructShuffle
code.

Reviewers: t.p.northover, rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11720

llvm-svn: 244314
2015-08-07 11:40:46 +00:00
Pete Cooper ebcd748927 Convert a bunch of loops to foreach. NFC.
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst.  This commit changes a bunch
of eligible loops to use it.

llvm-svn: 244260
2015-08-06 20:22:46 +00:00
Chandler Carruth 93205eb966 [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'
rather than 'unsigned' for their costs.

For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely percieved as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).

All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.

This passes all tests, and is also UBSan clean.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D11741

llvm-svn: 244080
2015-08-05 18:08:10 +00:00
Artyom Skrobov 6fbef2a780 ARMISelDAGToDAG.cpp had this self-contradictory code:
return StringSwitch<int>(Flags)
          .Case("g", 0x1)
          .Case("nzcvq", 0x2)
          .Case("nzcvqg", 0x3)
          .Default(-1);
...

  // The _g and _nzcvqg versions are only valid if the DSP extension is
  // available.
  if (!Subtarget->hasThumb2DSP() && (Mask & 0x2))
    return -1;

ARMARM confirms that the comment is right, and the code was wrong.

llvm-svn: 244029
2015-08-05 11:02:14 +00:00
Tanya Lattner 0d28f80bd1 Rename all references to old mailing lists to new lists.llvm.org address.
llvm-svn: 243999
2015-08-05 03:51:17 +00:00
Sanjay Patel 924879ad2c wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).

Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.

This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.

Differential Revision: http://reviews.llvm.org/D11734

llvm-svn: 243994
2015-08-04 15:49:57 +00:00
Saleem Abdulrasool 0a2672bb43 ARM: support windows division routines
This adds the software division routines for the Windows RTABI.  These are not
expected to be used often though as most modern Windows ARM capable targets
support hardware division.  In the case that the target CPU doesnt support
hardware division, this will be the fallback.

llvm-svn: 243952
2015-08-04 03:57:56 +00:00
Saleem Abdulrasool 67697a7ea9 ARM: make Darwin libcall registration table driven (NFC)
Make the libcall updating table driven similar to the approach that the Linux
and Windows codepath does below.  NFC.

llvm-svn: 243951
2015-08-04 03:57:52 +00:00
Tim Northover 9c340ec6fd ARM: remove horrible printf left over from debugging
llvm-svn: 243907
2015-08-03 22:19:08 +00:00
Tim Northover 910dde7ab2 ARM: prefer allocating VFP regs at stride 4 on Darwin.
This is necessary for WatchOS support, where the compact unwind format assumes
this kind of layout. For now we only want this on Swift-like CPUs though, where
it's been the Xcode behaviour for ages. Also, since it can expand the prologue
we don't want it at -Oz.

llvm-svn: 243884
2015-08-03 17:20:10 +00:00
John Brawn f3324cf1a5 [ARM] Make GlobalMerge merge extern globals by default
Enabling merging of extern globals appears to be generally either beneficial or
harmless. On some benchmarks suites (on Cortex-M4F, Cortex-A9, and Cortex-A57)
it gives improvements in the 1-5% range, but in the rest the overall effect is
zero.

Differential Revision: http://reviews.llvm.org/D10966

llvm-svn: 243874
2015-08-03 12:13:33 +00:00
James Molloy 6967e5e4a3 Be less conservative about forming IT blocks.
In http://reviews.llvm.org/rL215382, IT forming was made more conservative under
the belief that a flag-setting instruction was unpredictable inside an IT block on ARMv6M.

But actually, ARMv6M doesn't even support IT blocks so that's impossible. In the ARMARM for
v7M, v7AR and v8AR it states that the semantics of such an instruction changes inside an
IT block - it doesn't set the flags. So actually it is fine to use one inside an IT block
as long as the flags register is dead afterwards.

This gives significant performance improvements in a variety of MPEG based workloads.

Differential revision: http://reviews.llvm.org/D11680

llvm-svn: 243869
2015-08-03 09:24:48 +00:00
Craig Topper e3dcce9700 De-constify pointers to Type since they can't be modified. NFC
This was already done in most places a while ago. This just fixes the ones that crept in over time.

llvm-svn: 243842
2015-08-01 22:20:21 +00:00
David Blaikie a5fd382eb3 -Wdeprecated-clean: Fix cases of violating the rule of 5 in ways that are deprecated in C++11
Various targets use std::swap on specific MCAsmOperands (ARM and
possibly Hexagon as well). It might be helpful to mark those subclasses
as final, to ensure that the availability of move/copy operations can't
lead to slicing. (same sort of requirements as the non-vitual dtor -
protected or a final class)

llvm-svn: 243820
2015-08-01 04:40:41 +00:00
Sumanth Gundapaneni 532a13691c [ARM] Lower modulo operation to generate __aeabi_divmod on Android
For a modulo (reminder) operation,
clang -target armv7-none-linux-gnueabi generates "__modsi3"
clang -target armv7-none-eabi generates "__aeabi_idivmod"
clang -target armv7-linux-androideabi generates "__modsi3"
Android bionic libc doesn't provide a __modsi3, instead it provides a
"__aeabi_idivmod". This patch fixes the LLVM ARMISelLowering to generate
the correct call when ever there is a modulo operation.

Differential Revision: http://reviews.llvm.org/D11661

llvm-svn: 243717
2015-07-31 00:45:12 +00:00
Sanjay Patel 1166f2ff9f fix memcpy/memset/memmove lowering when optimizing for size
Fixing MinSize attribute handling was discussed in D11363. 
This is a prerequisite patch to doing that.

The handling of OptSize when lowering mem* functions was broken
on Darwin because it wants to ignore -Os for these cases, but the
existing logic also made it ignore -Oz (MinSize).

The Linux change demonstrates a widespread problem. The backend
doesn't usually recognize the MinSize attribute by itself; it
assumes that if the MinSize attribute exists, then the OptSize 
attribute must also exist. 

Fixing this more generally will be a follow-on patch or two.

Differential Revision: http://reviews.llvm.org/D11568

llvm-svn: 243693
2015-07-30 21:41:50 +00:00
Nick Lewycky c3890d2969 Fix typo "fuction" noticed in comments in AssumptionCache.h, and also all the other files that have the same typo. All comments, no functionality change! (Merely a "fuctionality" change.)
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.

llvm-svn: 243585
2015-07-29 22:32:47 +00:00
Akira Hatanaka 2670f4a550 [ARM] Define subtarget feature strict-align.
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -arm-strict-align to decide whether strict alignment should be
forced. Also, remove the logic that was checking the OS and architecture
as clang is now responsible for setting strict-align based on the command
line options specified and the target architecute and OS.

rdar://problem/21529937

http://reviews.llvm.org/D11470

llvm-svn: 243493
2015-07-28 22:44:28 +00:00
Chih-Hung Hsieh 1e859582d6 Implement target independent TLS compatible with glibc's emutls.c.
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.

clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model,
  which will be used for old targets like Android that do not
  support ELF TLS models.

Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.

Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.

TODO: Add proper DIE for emulated TLS variables.
      Added new unit tests with emulated TLS.

Differential Revision: http://reviews.llvm.org/D10522

llvm-svn: 243438
2015-07-28 16:24:05 +00:00
Alexandros Lamprineas 4ea707555a - Added support for parsing HWDiv features using Target Parser.
- Architecture extensions are represented as a bitmap.

Phabricator: http://reviews.llvm.org/D11457
llvm-svn: 243335
2015-07-27 22:26:59 +00:00
Colin LeMahieu fe2c8b8015 [llvm-mc] Pushing plumbing through for --fatal-warnings flag.
llvm-svn: 243334
2015-07-27 21:56:53 +00:00
Silviu Baranga 7581d22512 [ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.

This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.

No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11524

llvm-svn: 243270
2015-07-27 14:39:34 +00:00
Luke Cheeseman 4d45ff2b87 [ARM] - Fix lowering of shufflevectors in AArch32
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.

This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.

This patch fixes this by accommodating for both cases and adds regression tests.

Differential Revision: http://reviews.llvm.org/D11407

llvm-svn: 243103
2015-07-24 09:57:05 +00:00
Luke Cheeseman b5c627aba8 When lowering vector shifts a check is performed to see if the value to shift by
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.

Differential Revision: http://reviews.llvm.org/D11408

llvm-svn: 243100
2015-07-24 09:31:48 +00:00
David Gross d9c1bc9955 [ARM] Register (existing) ARMLoadStoreOpt pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to dump IR around this pass.

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11373

llvm-svn: 243052
2015-07-23 22:12:46 +00:00
David Gross 2ad5d173ce Test commit.
llvm-svn: 243046
2015-07-23 21:46:09 +00:00
Quentin Colombet 48b772007f [ARM] Make the frame lowering code ready for shrink-wrapping.
Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap.

Related to <rdar://problem/20821730>

llvm-svn: 242908
2015-07-22 16:34:37 +00:00
Akira Hatanaka 285815258c [ARM] Define subtarget feature "reserve-r9", which is used to decide
whether register r9 should be reserved.

This recommits r242737, which broke bots because the number of subtarget
features went over the limit of 64.

This change is needed because we cannot use a backend option to set
cl::opt "arm-reserve-r9" when doing LTO.

Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
reserve r9 should make changes to add subtarget feature "reserve-r9" to
the IR.

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11320

llvm-svn: 242756
2015-07-21 01:42:02 +00:00
Matthias Braun a50d2203fa ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code
Re-apply of r241928 which had to be reverted because of the r241926
revert.

This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.

Differential Revision: http://reviews.llvm.org/D10676

llvm-svn: 242743
2015-07-21 00:19:01 +00:00
Matthias Braun e40d89ef9b ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2
Re-apply r241926 with an additional check that r13 and r15 are not used
for LDRD/STRD. See http://llvm.org/PR24190. This also already includes
the fix from r241951.

Differential Revision: http://reviews.llvm.org/D10623

llvm-svn: 242742
2015-07-21 00:18:59 +00:00
Akira Hatanaka 42427d2c38 Revert r242737.
This caused builds to fail with the following error message:

error:Too many subtarget features! Bump MAX_SUBTARGET_FEATURES.

llvm-svn: 242740
2015-07-20 23:51:12 +00:00
Akira Hatanaka 7482d40cd5 [ARM] Define subtarget feature "reserve-r9", which is used to decide
whether register r9 should be reserved.

This change is needed because we cannot use a backend option to set
cl::opt "arm-reserve-r9" when doing LTO.

Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
reserve r9 should make changes to add subtarget feature "reserve-r9" to
the IR.

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11320

llvm-svn: 242737
2015-07-20 23:21:30 +00:00
Matthias Braun 731e359e70 Revert "ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2"
This reverts commit r241926. This caused http://llvm.org/PR24190

llvm-svn: 242735
2015-07-20 23:17:20 +00:00
Matthias Braun 84e289702a Revert "ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code"
This reverts commit r241928. This caused http://llvm.org/PR24190

llvm-svn: 242734
2015-07-20 23:17:16 +00:00
Matthias Braun 22f3960759 Revert "ARM: Use SpecificBumpPtrAllocator to fix leak introduced in r241920"
This reverts commit r241951. It caused http://llvm.org/PR24190

llvm-svn: 242733
2015-07-20 23:17:14 +00:00
JF Bastien e4d22d59d1 Targets: commonize some stack realignment code
This patch does the following:
* Fix FIXME on `needsStackRealignment`: it is now shared between multiple targets, implemented in `TargetRegisterInfo`, and isn't `virtual` anymore. This will break out-of-tree targets, silently if they used `virtual` and with a build error if they used `override`.
* Factor out `canRealignStack` as a `virtual` function on `TargetRegisterInfo`, by default only looks for the `no-realign-stack` function attribute.

Multiple targets duplicated the same `needsStackRealignment` code:
 - Aarch64.
 - ARM.
 - Mips almost: had extra `DEBUG` diagnostic, which the default implementation now has.
 - PowerPC.
 - WebAssembly.
 - x86 almost: has an extra `-force-align-stack` option, which the default implementation now has.

The default implementation of `needsStackRealignment` used to just return `false`. My current patch changes the behavior by simply using the above shared behavior. This affects:
 - AMDGPU
 - BPF
 - CppBackend
 - MSP430
 - NVPTX
 - Sparc
 - SystemZ
 - XCore
 - Out-of-tree targets
This is a breaking change! `make check` passes.

The only implementation of the `virtual` function (besides the slight different in x86) was Hexagon (which did `MF.getFrameInfo()->getMaxAlignment() > 8`), and potentially some out-of-tree targets. Hexagon now uses the default implementation.

`needsStackRealignment` was being overwritten in `<Target>GenRegisterInfo.inc`, to return `false` as the default also did. That was odd and is now gone.

Reviewers: sunfish

Subscribers: aemerson, llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11160

llvm-svn: 242727
2015-07-20 22:51:32 +00:00
Quentin Colombet 71a71485f4 [ARM] Refactor the prologue/epilogue emission to be more robust.
This is the first step toward supporting shrink-wrapping for this target.

The changes could be summarized by these items:
- Expand the tail-call return as part of the expand pseudo pass.
- Get rid of the assumptions that the epilogue is the exit block:
  * Do not assume which registers are free in the epilogue. (This indirectly
    improve the lowering of the code for the segmented stacks, see the test
    cases.)
  * Take into account that the basic block can be empty.

Related to <rdar://problem/20821730>

llvm-svn: 242714
2015-07-20 21:42:14 +00:00
Matthias Braun 9e85980658 ARM: Enable MachineScheduler and disable PostRAScheduler for swift.
Reapply r242500 now that the swift schedmodel includes LDRLIT.

This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.

This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.

While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.

Differential Revision: http://reviews.llvm.org/D10513

llvm-svn: 242588
2015-07-17 23:18:30 +00:00
Matthias Braun 141d1c9d8f ARM: Add scheduling information for LDRLIT instructions to swift scheduling model
These pseudo instructions are only lowered after register allocation and
are therefore still present when the machine scheduler runs.
Add a run: line to a testcase that uses the uncommon flags necessary to
actually produce a LDRLIT instruction on swift.

llvm-svn: 242587
2015-07-17 23:18:26 +00:00
Adam Nemet 5a6d5bc17b Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift."
This reverts commit r242500.

It broke some internal tests and Matthias asked me to revert it while he
is investigating.

llvm-svn: 242553
2015-07-17 18:14:19 +00:00
James Molloy a6702e2f14 [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
No functional change, but it preps codegen for the future when SABSDIFF
will start getting generated in anger.

llvm-svn: 242546
2015-07-17 17:10:55 +00:00
Matthias Braun 2d8315f806 ARM: Enable MachineScheduler and disable PostRAScheduler for swift.
This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.

This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.

While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.

Differential Revision: http://reviews.llvm.org/D10513

llvm-svn: 242500
2015-07-17 01:44:31 +00:00
Matthias Braun da3d0d7342 Arm: Don't define a label twice with two setjmps in a function.
Constructing a name based on the function name didn't give us a unique
symbol if we had more than one setjmp in a function. Using
MCContext::createTempSymbol() always gives us a unique name.

Differential Revision: http://reviews.llvm.org/D9314

llvm-svn: 242482
2015-07-16 22:34:20 +00:00
Matthias Braun 3cd00c1739 Fix __builtin_setjmp in combination with sjlj exception handling.
llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling
style but is also used in clang to implement __builtin_setjmp.  The ARM
backend needs to output additional dispatch tables for the SjLj
exception handling style, these tables however can't be emitted if
llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual
landing pad blocks exist.

To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is
introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj
exception handling lowering, so we can differentiate between the case
where we actually need to setup a dispatch table and the case where we
just need the __builtin_setjmp semantic.

Differential Revision: http://reviews.llvm.org/D9313

llvm-svn: 242481
2015-07-16 22:34:16 +00:00
Pete Cooper f4ce569deb Revert "Add missing load/store flags to thumb2 instructions."
This reverts commit r242300.

This is causing buildbot failures which we are investigating.
I'll reapply once we know whats going on, but for now want to
get the bots green.

llvm-svn: 242428
2015-07-16 18:38:13 +00:00
Mehdi Amini bd7287ebe5 Move most user of TargetMachine::getDataLayout to the Module one
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

This patch is quite boring overall, except for some uglyness in
ASMPrinter which has a getDataLayout function but has some clients
that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so
some methods are taking a DataLayout as parameter.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11090

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242386
2015-07-16 06:11:10 +00:00
Akira Hatanaka 024d91a00b [ARM] Define a subtarget feature that is used to avoid using movt/movw
pairs for 32-bit immediates.

This change is needed to avoid emitting movt/movw pairs when doing LTO
and do so on a per-function basis.

Out-of-tree projects currently using cl::opt option -arm-use-movt=0 or
false to avoid emitting movt/movw pairs should make changes to add
subtarget feature "+no-movt" (see the changes made to clang in r242368).

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11026

llvm-svn: 242369
2015-07-16 00:58:23 +00:00
Pete Cooper e3c8161736 Clear kill flags in ARMLoadStoreOptimizer.
The pass here was clearing kill flags on instructions which had
their sources killed in the instruction being combined.  But
given that the new instruction is inserted after the existing ones,
any existing instructions with kill flags will lead to the verifier
complaining that we are reading an undefined physreg.

For example, what we had prior to this optimization is
	t2STRi12 %R1, %SP, 12
	t2STRi12 %R1<kill>, %SP, 16
	t2STRi12 %R0<kill>, %SP, 8

and prior to this fix that would generate
	t2STRi12 %R1<kill>, %SP, 16
	t2STRDi8 %R0<kill>, %R1, %SP, 8

This is clearly incorrect as it didn't clear the kill flag on R1
used with offset 16 because there was no kill flag on the instruction
with offset 12.

After this change we clear the kill flag on the offset 16 instruction
because we know it will be used afterwards in the new instruction.

I haven't provided a test case.  I have a small test, but even it is
very sensitive to register allocation order which isn't ideal.

llvm-svn: 242359
2015-07-16 00:09:18 +00:00
Matthias Braun 5d1f12d1f5 TargetRegisterInfo: Provide a way to check assigned registers in getRegAllocationHints()
Pass a const reference to LiveRegMatrix to getRegAllocationHints()
because some targets can prodive better hints if they can test whether a
physreg has been used for register allocation yet.

llvm-svn: 242340
2015-07-15 22:16:00 +00:00
Pete Cooper 21ca199cea Add missing load/store flags to thumb2 instructions.
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs.  Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.

While looking at this code, there was a stale comment that these
instructions were only used for disassembly.  This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.

llvm-svn: 242300
2015-07-15 16:36:38 +00:00
JF Bastien c8f48c19d3 WebAssembly: fix build breakage.
Summary:
processFunctionBeforeCalleeSavedScan was renamed to determineCalleeSaves and now takes a BitVector parameter as of rL242165, reviewed in http://reviews.llvm.org/D10909

WebAssembly is still marked as experimental and therefore doesn't build by default. It does, however, grep by default! I notice that processFunctionBeforeCalleeSavedScan is still mentioned in a few comments and error messages, which I also fixed.

Reviewers: qcolombet, sunfish

Subscribers: jfb, dsanders, hfinkel, MatzeB, llvm-commits

Differential Revision: http://reviews.llvm.org/D11199

llvm-svn: 242242
2015-07-14 23:06:07 +00:00
Matthias Braun 0256486532 PrologEpilogInserter: Rewrite API to determine callee save regsiters.
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():

- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
  the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
  physcial registers which are only read but never modified.

Related to rdar://21539507

Differential Revision: http://reviews.llvm.org/D10909

llvm-svn: 242165
2015-07-14 17:17:13 +00:00
Hans Wennborg 61f9efe73b ARMAsmParser: Take MCInst param by const-ref
(Broken out from http://reviews.llvm.org/D11167)

llvm-svn: 242160
2015-07-14 16:39:01 +00:00
Yaron Keren d1ba2d9d8b Generate correct asm info for mingw and cygwin ARM targets.
http://reviews.llvm.org/D11075

Patch by Martell Malone
Reviewed by Reid Kleckner

llvm-svn: 242123
2015-07-14 05:51:05 +00:00
Logan Chien 0a43abc9f8 ARM: Fix cttz expansion on vector types.
The 64/128-bit vector types are legal if NEON instructions are
available.  However, there was no matching patterns for @llvm.cttz.*()
intrinsics and result in fatal error.

This commit fixes the problem by lowering cttz to:
a. ctpop((x & -x) - 1)
b. width - ctlz(x & -x) - 1

llvm-svn: 242037
2015-07-13 15:37:30 +00:00
Scott Douglass 69bf1ce03a [ARM] Handle commutativity when converting to tADDhirr in Thumb2
Also, run thumb_rewrite.s tests in Thumb2 now that they pass.

Differential Revision: http://reviews.llvm.org/D11132

llvm-svn: 242036
2015-07-13 15:31:48 +00:00
Scott Douglass d9d8d26458 [ARM] Add Thumb2 ADD with SP narrowing from 3 operand to 2
Differential Revision: http://reviews.llvm.org/D11131

llvm-svn: 242035
2015-07-13 15:31:40 +00:00
Scott Douglass 039f768c42 [ARM] Small refactor of tryConvertingToTwoOperandForm (nfc)
Also, add more Thumb2 ADD tests requested during review of
http://reviews.llvm.org/D11053.

Differential Revision: http://reviews.llvm.org/D11130

llvm-svn: 242034
2015-07-13 15:31:33 +00:00
Aaron Ballman 6d8f785073 Removing several -Wunused-but-set-variable warnings; NFC intended.
llvm-svn: 242028
2015-07-13 14:04:30 +00:00
Renato Golin 1ef7a0f7c0 [ARM] Add support for nest attribute using r12
Register r12 ('ip') is used by GCC for this purpose
and hence is used here. As discussed on the GCC mailing
list, the register choice is an ABI issue and so
choosing the same register as GCC means
__builtin_call_with_static_chain is compatible.

A similar patch has just gone in the AArch64 backend,
so this is just the ARM counterpart, following the same
discussion.

Patch by Stephen Cross.

llvm-svn: 241996
2015-07-12 18:16:40 +00:00
Duncan P. N. Exon Smith e463e470f8 MC: Only allow changing feature bits in MCSubtargetInfo
Disallow all mutation of `MCSubtargetInfo` expect the feature bits.

Besides deleting the assignment operators -- which were dead "code" --
this restricts `InitMCProcessorInfo()` to subclass initialization
sequences, and exposes a new more limited function called
`setDefaultFeatures()` for use by the ARMAsmParser `.cpu` directive.

There's a small functional change here: ARMAsmParser used to adjust
`MCSubtargetInfo::CPUSchedModel` as a side effect of calling
`InitMCProcessorInfo()`, but I've removed that suspicious behaviour.
Since the AsmParser shouldn't be doing any scheduling, there shouldn't
be any observable change...

llvm-svn: 241961
2015-07-10 22:52:15 +00:00
Duncan P. N. Exon Smith 754e21f244 MC: Remove MCSubtargetInfo() default constructor
Force all creators of `MCSubtargetInfo` to immediately initialize it,
merging the default constructor and the initializer into an initializing
constructor.  Besides cleaning up the code a little, this makes it clear
that the initializer is never called again later.

Out-of-tree backends need a trivial change: instead of calling:

    auto *X = new MCSubtargetInfo();
    InitXYZMCSubtargetInfo(X, ...);
    return X;

they should call:

    return createXYZMCSubtargetInfoImpl(...);

There's no real functionality change here.

llvm-svn: 241957
2015-07-10 22:43:42 +00:00
Duncan P. N. Exon Smith bb57d73805 MC: Remove MCSubtargetInfo::InitCPUSched()
Remove all calls to `MCSubtargetInfo::InitCPUSched()` and merge its body
into the only relevant caller, `MCSubtargetInfo::InitMCProcessorInfo()`.
We were only calling the former after explicitly calling the latter with
the same CPU; it's confusing to have both methods exposed.

Besides a minor (surely unmeasurable) speedup in ARM and X86 from
avoiding running the logic twice, no functionality change.

llvm-svn: 241956
2015-07-10 22:33:01 +00:00
Matthias Braun e5a112f5e1 ARM: Use SpecificBumpPtrAllocator to fix leak introduced in r241920
llvm-svn: 241951
2015-07-10 22:23:57 +00:00
Matthias Braun d9bd22b2c4 ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code
This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.

Differential Revision: http://reviews.llvm.org/D10676

llvm-svn: 241928
2015-07-10 18:37:33 +00:00
Matthias Braun e4ba6b8c24 ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2
Differential Revision: http://reviews.llvm.org/D10623

llvm-svn: 241926
2015-07-10 18:28:49 +00:00