Commit Graph

29299 Commits

Juergen Ributzka 5d6c43e294 [FastISel][AArch64] Add support for frameaddress intrinsic.
This commit implements the frameaddress intrinsic for the AArch64 architecture
in FastISel.
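
For illustration only, a source-level function like the following ends up using
the intrinsic (the builtin lowers to llvm.frameaddress; the function name is
hypothetical):

  // Illustrative: __builtin_frame_address(0) is lowered to the
  // llvm.frameaddress intrinsic, which FastISel can now select on AArch64.
  void *current_frame() {
    return __builtin_frame_address(0);
  }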

There were two test cases that tested essentially the same thing, so I combined
them into a single test case.

Fixes <rdar://problem/17811834>

llvm-svn: 213959
2014-07-25 17:47:14 +00:00
Amara Emerson 115d2df8a4 [ARM] Emit ABI_PCS_R9_use build attribute.
Patch by Ben Foster!

Differential Revision: http://reviews.llvm.org/D4657

llvm-svn: 213944
2014-07-25 14:03:14 +00:00
Benjamin Kramer 1f8930e3d3 Run sort_includes.py on the AArch64 backend.
No functionality change.

llvm-svn: 213938
2014-07-25 11:42:14 +00:00
Chandler Carruth 3de980d2ff [SDAG] Enable the new assert for out-of-range result numbers in
SDValues, fixing the two bugs left in the regression suite.

The key for both of these was the use a single value type rather than
a VTList which caused an unintentionally single-result merge-value node.
Fix this by getting the appropriate VTList in place.

Doing this exposed that the comments in x86's code about how MUL_LOHI
operands are handled are wrong. The bug with the use of out-of-range
result numbers was hiding the bug about the order of operands here (as
best I can tell). There are still more places where the code appears to
get this backwards...

llvm-svn: 213931
2014-07-25 09:19:23 +00:00
Akira Hatanaka 16e47ff42e [ARM] In Thumb mode, emit the directive ".code 16" before file-level inline
assembly instructions.

This is necessary to ensure the ARM assembler switches to Thumb mode before it
starts assembling the file-level inline assembly instructions at the beginning
of a .s file.

<rdar://problem/17757232>

llvm-svn: 213924
2014-07-25 05:12:49 +00:00
Lang Hames 5432649be7 [X86] Clarify some stackmap shadow optimization code based on review
feedback from Eric Christopher.

No functional change.

llvm-svn: 213917
2014-07-25 02:29:19 +00:00
Bill Schmidt c9fa5dd618 [PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*.
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle.  This was
previously missed in the vector LE implementation.

There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option:  "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.

I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.

This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4.  A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem.  I've verified this fix takes care of that issue.

llvm-svn: 213915
2014-07-25 01:55:55 +00:00
Joerg Sonnenberger b5459e6e22 Don't use 128bit functions on PPC32.
llvm-svn: 213899
2014-07-24 22:20:10 +00:00
Chandler Carruth 80b869461e [x86] Make vector legalization of extloads work more like the "normal"
vector operation legalization with support for custom target lowering
and fallback to expand when it fails, and use this to implement sext and
anyext load lowering for x86 in a more principled way.

Previously, the x86 backend relied on a target DAG combine to "combine
away" sextload and extload nodes prior to legalization, or would expand
them during legalization with terrible code. This is particularly
problematic because the DAG combine relies on running over non-canonical
DAG nodes at just the right time to match several common and important
patterns. It used a combine rather than lowering because we didn't have
good lowering support, and to expose some of the tricks being employed
to additional combine phases.

With this change it becomes a proper lowering operation, the backend
marks that it can lower these nodes, and I've added support for handling
the canonical forms that don't have direct legal representations such as
sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases
for this behavior continue to pass even after the DAG combiner begins
running more systematically over every node.
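
As a rough illustration (not taken from the commit's test cases), a loop like
the one below can be vectorized into exactly this kind of widening
sign-extending load:

  #include <cstdint>
  // Sketch: sign-extending four i8 elements to i64 may become a
  // sextload of v4i8 -> v4i64 once the loop is vectorized.
  void widen(const int8_t *src, int64_t *dst) {
    for (int i = 0; i < 4; ++i)
      dst[i] = src[i];
  }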

There is some noise caused by this in the test suite where we actually
use vector extends instead of subregister extraction. This doesn't
really seem like the right thing to do, but is unlikely to be a critical
regression. We do regress in one case where, by lowering to the
target-specific patterns early, we were able to combine away extraneous
legal math nodes. However, this regression is completely addressed by
switching to a widening based legalization which is what I'm working
toward anyways, so I've just switched the test to that mode.

Differential Revision: http://reviews.llvm.org/D4654

llvm-svn: 213897
2014-07-24 22:09:56 +00:00
Saleem Abdulrasool 8dc8fb18d8 Target: invert condition for Windows
The Microsoft ABI and MSVCRT are considered the canonical C runtime and ABI.
The long double routines are not part of this environment.  However, Cygwin and
MinGW both provide supplementary implementations.  Change the condition to
reflect this reality.

llvm-svn: 213896
2014-07-24 22:09:06 +00:00
Lang Hames f49bc3f1b1 [X86] Optimize stackmap shadows on X86.
This patch minimizes the number of nops that must be emitted on X86 to satisfy
stackmap shadow constraints.

To minimize the number of nops inserted, the X86AsmPrinter now records the
size of the most recent stackmap's shadow in the StackMapShadowTracker class,
and tracks the number of instruction bytes emitted since that stackmap
instruction was encountered. Padding is emitted (if it is required at all)
immediately before the next stackmap/patchpoint instruction, or at the end of
the basic block.
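
A minimal sketch of that bookkeeping (hypothetical names, not the actual
StackMapShadowTracker interface):

  // Hypothetical sketch of the shadow tracking described above.
  class ShadowTracker {
    unsigned RequiredShadow = 0;     // shadow size requested by the last stackmap
    unsigned BytesSinceStackMap = 0; // instruction bytes emitted since then
    bool InShadow = false;
  public:
    void startShadow(unsigned Size) {
      RequiredShadow = Size;
      BytesSinceStackMap = 0;
      InShadow = true;
    }
    void countInstruction(unsigned Bytes) {
      if (InShadow)
        BytesSinceStackMap += Bytes;
    }
    // Nop bytes still needed before the next stackmap/patchpoint or block end.
    unsigned requiredPadding() const {
      return (InShadow && BytesSinceStackMap < RequiredShadow)
                 ? RequiredShadow - BytesSinceStackMap
                 : 0;
    }
  };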

This optimization should reduce code-size and improve performance for people
using the llvm stackmap intrinsic on X86.

<rdar://problem/14959522>

llvm-svn: 213892
2014-07-24 20:40:55 +00:00
Reid Kleckner 9a412d13c1 Replace an assertion with a fatal error
Frontends are responsible for putting inalloca on parameters that would
be passed in memory and not registers.

llvm-svn: 213891
2014-07-24 19:53:33 +00:00
Saleem Abdulrasool c61ed0474e X86: correct library call setup for Windows itanium
This target is identical to Windows MSVC (and follows the Microsoft ABI for C).
Correct the library call setup for this target.  The same set of library calls
is missing in this environment.

llvm-svn: 213883
2014-07-24 17:46:36 +00:00
Matt Arsenault 83592a2d32 R600: Add FMA instructions for Evergreen
llvm-svn: 213882
2014-07-24 17:41:01 +00:00
Saleem Abdulrasool 34610e33ae X86: silence sign comparison warning
GCC 4.8 detected a signed compare [-Wsign-compare].  Add a cast for the
destination index.  Add an assert to catch a potential overflow, however unlikely
it may be.

llvm-svn: 213878
2014-07-24 17:12:06 +00:00
Matt Arsenault 83e60581c3 R600: Add new functions for splitting vector loads and stores.
These will be used in future patches and shouldn't change anything yet.

llvm-svn: 213877
2014-07-24 17:10:35 +00:00
Joerg Sonnenberger c7dbc13e77 Include relative path for header outside the current directory.
llvm-svn: 213872
2014-07-24 16:04:46 +00:00
Tim Northover 7324e845a4 AArch64: refactor ReconstructShuffle function
Quite a bit of cruft had accumulated as we realised the various cases
it had to handle and squeezed them in where possible. This refactoring mostly
flattens the logic and special-cases. The result is slightly longer, but I
think clearer.

Should be no functionality change.

llvm-svn: 213867
2014-07-24 15:39:55 +00:00
Hal Finkel cc39b67530 AA metadata refactoring (introduce AAMDNodes)
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).

This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.
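
Conceptually, the new structure looks roughly like the sketch below (a sketch
only, not the actual definition in the AliasAnalysis headers):

  // Sketch: bundle the aliasing metadata for one memory-accessing
  // instruction instead of passing around a raw TBAA MDNode*.
  class MDNode; // LLVM metadata node, forward-declared for illustration
  struct AAMDNodes {
    MDNode *TBAA = nullptr; // currently the only aliasing-metadata kind
    // follow-up commits add further kinds (noalias-related metadata, etc.)
  };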

No functionality change intended.

llvm-svn: 213859
2014-07-24 12:16:19 +00:00
NAKAMURA Takumi 8d745ca7cc Prune redundant libdeps.
llvm-svn: 213857
2014-07-24 11:45:27 +00:00
NAKAMURA Takumi 98d18be5fe Prune dependency to MC from each target disassembler.
llvm-svn: 213856
2014-07-24 11:45:11 +00:00
Tilmann Scheller 96ef72e54a [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRH instructions.
The ARM ARM prohibits STRH instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRH instructions with unpredictable behavior.

llvm-svn: 213850
2014-07-24 09:55:46 +00:00
Daniel Sanders bdcfab117c [mips] Fix ll and sc instructions
Summary: The ll and sc instructions for r6 and non-r6 are misplaced. This patch fixes that.

Patch by Jyun-Yan You

Differential Revision: http://reviews.llvm.org/D4578

llvm-svn: 213847
2014-07-24 09:47:14 +00:00
Matt Arsenault 9acb978105 R600: Match rcp node on pre-SI
llvm-svn: 213844
2014-07-24 06:59:24 +00:00
Matt Arsenault 0daeb63f03 R600: Fix LowerSDIV24
Use ComputeNumSignBits instead of checking for i8 / i16 which only
worked when AMDIL was lying about having legal i8 / i16.

If an integer is known to fit in 24-bits, we can
do division faster with float ops.

llvm-svn: 213843
2014-07-24 06:59:20 +00:00
NAKAMURA Takumi 9c3bd7618a Update library dependencies.
llvm-svn: 213832
2014-07-24 02:10:42 +00:00
Matt Arsenault 034d666bb7 R600: Implement enableClusterLoads()
llvm-svn: 213831
2014-07-24 02:10:17 +00:00
Kevin Qin 9a2a2c502b [AArch64] Fix a bug generating incorrect instruction when building small vector.
This bug was introduced by r211144. The element type of the operand may be
smaller than the element type of the result, but the previous commit could
only handle the opposite condition. This commit handles that scenario and
generates optimized code such as ZIP1.

llvm-svn: 213830
2014-07-24 02:05:42 +00:00
Jiangning Liu 451f30e89f [AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. A new predicate, Cyclone, is introduced in the .td file.
llvm-svn: 213827
2014-07-24 01:29:59 +00:00
Filipe Cabecinhas 933cccf3fa Fixed PR20411 - bug in getINSERTPS()
When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was on the vector.

Added a test case.

llvm-svn: 213826
2014-07-24 01:28:21 +00:00
Rafael Espindola 45bcf8a59c Fix the build when building with only the ARM backend.
llvm-svn: 213814
2014-07-23 22:54:28 +00:00
Rafael Espindola 5addace56d Finish inverting the MC -> Object dependency.
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.

llvm-svn: 213808
2014-07-23 22:26:07 +00:00
Jim Grosbach 724e438c62 [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.
The transform that constant-folds unary operations with an AND across a
vector comparison also applies when the constant is not a splat of a
scalar.

llvm-svn: 213800
2014-07-23 20:41:43 +00:00
Jim Grosbach 8f6f0858ec X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.

llvm-svn: 213799
2014-07-23 20:41:38 +00:00
Justin Holewinski 2cb5e181d1 [NVPTX] Silence a GCC warning found by the buildbots
The cast to NVPTXTargetLowering was missing a 'const', but let's
just access the right pointer through the subtarget anyway.

llvm-svn: 213793
2014-07-23 20:23:47 +00:00
Juergen Ributzka 1b014504ab [FastISel][AArch64] Fix return type in FastLowerCall.
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.

Reduced test case from Chad. Thanks.

llvm-svn: 213788
2014-07-23 20:03:13 +00:00
Justin Holewinski ecca715b3c [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two
llvm-svn: 213784
2014-07-23 18:46:03 +00:00
Justin Holewinski 511664dc76 [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled
With optimizations disabled, we disable the isel patterns for mul.wide, but we
were still generating MULWIDE ISD nodes.  Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.

llvm-svn: 213773
2014-07-23 17:40:45 +00:00
Chad Rosier 17020f96c7 [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGCombiner will generate:
  asr w1, X, #31        ; w1 = splat sign bit
  add X, X, w1, lsr #28 ; X = X + 0 or pow2-1
  asr w0, X, #4         ; w0 = X/pow2

However, the add + shifts is expensive, so instead generate:
  add w0, X, 15         ; w0 = X + pow2-1
  cmp X, wzr            ; X - 0
  csel X, w0, X, lt     ; X = (X < 0) ? X + pow2-1 : X
  asr w0, X, #4         ; w0 = X/pow2
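
In scalar terms, the new sequence computes the same result as this illustrative
C++ (for pow2 == 16; the function is only a sketch of the semantics):

  int sdiv_by_16(int x) {
    int biased = x + 15;            // add w0, X, 15
    int sel = (x < 0) ? biased : x; // cmp + csel on the sign of X
    return sel >> 4;                // asr by log2(pow2)
  }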

llvm-svn: 213758
2014-07-23 14:57:52 +00:00
Robert Khasanov 74acbb7767 [SKX] Enabling mask instructions: encoding, lowering
KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 213757
2014-07-23 14:49:42 +00:00
Tim Northover 14ff2df05c ARM: spot SBFX-compatible code expressed with sign_extend_inreg
We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for, though.
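
For example (illustrative source, not from the commit's tests), extracting a
signed 8-bit field typically shows up as a shift followed by
SIGN_EXTEND_INREG, which can now be matched to SBFX:

  // Illustrative: bits [15:8] of x as a signed byte.
  int extract_signed_field(int x) {
    return (signed char)(x >> 8);
  }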

llvm-svn: 213754
2014-07-23 13:59:12 +00:00
Tim Northover 7ad2a0e0c2 ARM: add patterns for [su]xta[bh] from just a shift.
Although the final shifter operand is a rotate, this actually only matters for
the half-word extends when the amount == 24. Otherwise folding a shift in is
just as good.
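
A hypothetical example of source that benefits (assuming an ARM target), where
the shift folds into the extend-and-add form described above:

  // Illustrative: add a sign-extended byte taken from bits [15:8] of x.
  int add_byte_field(int acc, int x) {
    return acc + (signed char)(x >> 8);
  }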

llvm-svn: 213753
2014-07-23 13:59:07 +00:00
James Molloy bc9fed82cc Enable partial libcall inlining for all targets by default.
This pass attempts to speculatively use a sqrt instruction if one exists on the target, falling back to a libcall if the target instruction returned NaN.
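
Conceptually, the transform behaves like the sketch below (hw_sqrt is a
hypothetical stand-in for the target's sqrt instruction):

  #include <cmath>
  double hw_sqrt(double); // hypothetical: the target sqrt instruction
  // Sketch: try the hardware sqrt and only fall back to the libcall if the
  // instruction produced NaN (e.g. for a negative input).
  double speculative_sqrt(double x) {
    double r = hw_sqrt(x);
    if (r != r)         // NaN check
      r = std::sqrt(x); // libcall fallback
    return r;
  }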

This was enabled for MIPS and SystemZ, but it is well guarded and is good for most targets; GCC does this for X86, ARM and AArch64 (the targets I've checked).

llvm-svn: 213752
2014-07-23 13:33:00 +00:00
Tilmann Scheller 2727279117 [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRB instructions.
The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior.

llvm-svn: 213750
2014-07-23 13:03:47 +00:00
Tim Northover 35910d7fa8 AArch64: remove "arm64_be" support in favour of "aarch64_be".
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".

llvm-svn: 213748
2014-07-23 12:58:11 +00:00
Tilmann Scheller 3352a58ddc [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STR instructions.
The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior.

llvm-svn: 213745
2014-07-23 12:38:17 +00:00
Tim Northover e19bed7d33 AArch64: remove arm64 triple enumerator.
Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and
invites bugs where only one is checked. In reality, the only legitimate
difference between the two (arm64 usually means iOS) is also present in the OS
part of the triple and that's what should be checked.

We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so
there aren't any LLVM-side test changes.

llvm-svn: 213743
2014-07-23 12:32:47 +00:00
Andrea Di Biagio 842355e900 Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions".
This change fully reverts r211771.
That revision added a canonicalization rule which has the potential to cause a
combine cycle in the target-independent canonicalizing DAG combine.

The plan is to move the logic that forms target-specific addsub nodes into the
lowering of shuffles.

llvm-svn: 213736
2014-07-23 11:20:24 +00:00
Tilmann Scheller c28f0d587d [ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions.
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.

The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined. However, it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically, and then the constraint is definitely needed.

llvm-svn: 213729
2014-07-23 08:12:51 +00:00
Juergen Ributzka 2581fa505f [FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks.
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.

This enables patchpoint intrinsic lowering for AArch64.

This fixes <rdar://problem/17733076>

llvm-svn: 213704
2014-07-22 23:14:58 +00:00