Commit Graph

106376 Commits

Author SHA1 Message Date
Jonathan Roelofs ef84bda531 Re-apply r214881: Fix return sequence on armv4 thumb
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.

POP on armv4t cannot be used to change thumb state (unlike later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214928
2014-08-05 21:32:21 +00:00
Lang Hames ae17268a7e [MCJIT] Make llvm-rtdyld check RuntimeDyld's error state when running in -verify
mode.

This will cause -verify mode to report failure when RuntimeDyld encounters an
internal error (e.g. overflows in relocation computations). Previously we had
let these errors slip past unreported.

llvm-svn: 214925
2014-08-05 20:51:46 +00:00
Bill Schmidt 42a6936c78 [PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments.  The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.

Reviewed by Ulrich Weigand.

This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).

llvm-svn: 214923
2014-08-05 20:47:25 +00:00
Sanjay Patel 1954f2e924 Improved test cases that were added with r214892.
1. Added ':' to CHECK-LABELs
2. Added more CHECKs
3. Added CHECK-NEXTs
4. Added verbose hex immediate comments to CHECKs

llvm-svn: 214921
2014-08-05 20:16:35 +00:00
Rafael Espindola f9e52cf015 Don't internalize all but main by default.
This is mostly a cleanup, but it changes a fairly old behavior.

Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.

Now, to get usable behavior out of opt, one doesn't need the funny-looking
command line:

opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts

llvm-svn: 214919
2014-08-05 20:10:38 +00:00
Rafael Espindola c03b6e7880 Add a test showing the interaction of linker scripts and plugin.
In particular, the linker script is processed early enough for function g
to be internalized.

llvm-svn: 214916
2014-08-05 19:56:53 +00:00
Chandler Carruth a746239be3 [x86] Fix a crasher due to shuffles which cancel each other out and add
a test case.

We also miscompile this test case, which shows a serious flaw in the
single-input v8i16 shuffle code. I've left the specific instruction
checks FIXME-ed out until I can address the bug in the single-input
code, but I wanted to separate out a significant functionality change to
produce correct code from a very simple and targeted crasher fix.

The miscompile problem stems from keeping track of inputs by value
rather than by index. As a consequence of doing this, we can't reliably
update those inputs because they might swap and we can't detect this
without copying the mask.

The blend code now uses indices for the input lists and this seems
strictly better. It also should make it easier to sort things and do
other cleanups. I think the time has come to simplify The Great Lambda
here.

llvm-svn: 214914
2014-08-05 18:45:49 +00:00
Duncan P. N. Exon Smith 6a6e9cb50c Remove dead code in condition
Whether or not it's appropriate, labels have been first-class types
since r51511.

llvm-svn: 214908
2014-08-05 18:22:58 +00:00
NAKAMURA Takumi ca562297d9 X86CodeEmitter.cpp: Add SEH_Epilogue to ignored list for legacy JIT, corresponding to r214775.
llvm-svn: 214905
2014-08-05 18:04:15 +00:00
Adam Nemet c04f3f9f73 [X86] Improve comments for r214888
A rebase somehow ate my comments. This restores them.

llvm-svn: 214903
2014-08-05 17:58:49 +00:00
Matt Arsenault 6532520fbf R600/SI: Use register class instead of list of registers
I'm not sure if this has any consequence or not.

llvm-svn: 214902
2014-08-05 17:52:40 +00:00
Matt Arsenault 2549bb4b83 R600/SI: Add exec_lo and exec_hi subregisters.
This allows accessing an SReg subregister with a normal subregister
index, instead of getting a machine verifier error.

Also be sure to include all of these subregisters in SReg_32.
This fixes inferring SGPR instead of SReg when finding a
super register class.

llvm-svn: 214901
2014-08-05 17:52:37 +00:00
Duncan P. N. Exon Smith 5a511b59c5 BitcodeReader: Fix non-determinism in use-list order
`BasicBlockFwdRefs` (and `BlockAddrFwdRefs` before it) was being emptied
in a non-deterministic order.  When predicting use-list order I've
worked around this another way, but even when parsing lazily (and we
can't recreate use-list order) use-lists should be deterministic.

Make them so by using a side-queue of functions with forward-referenced
blocks that gets visited in order.
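
As a minimal C++ sketch of the general pattern (hypothetical names, not the
actual BitcodeReader code): keep the fast lookup structure, but record the
order in which functions are first seen in a side vector and drain in that
order, so the visit order no longer depends on hash-map iteration:

  #include <unordered_map>
  #include <vector>

  // Hypothetical illustration of the "side-queue" idea: the map gives fast
  // lookup of forward-referenced blocks per function, while the vector
  // remembers the order in which functions were first seen, so resolving
  // the references happens deterministically.
  struct FwdRefTracker {
    std::unordered_map<int, std::vector<int>> RefsByFunc; // FuncID -> blocks
    std::vector<int> FuncQueue;                           // visit order

    void addRef(int FuncID, int BlockID) {
      auto &Refs = RefsByFunc[FuncID];
      if (Refs.empty())              // first reference to this function
        FuncQueue.push_back(FuncID); // remember when we saw it
      Refs.push_back(BlockID);
    }

    template <typename Fn> void drainInOrder(Fn Resolve) {
      for (int FuncID : FuncQueue)   // stable, insertion-ordered visit
        for (int BlockID : RefsByFunc[FuncID])
          Resolve(FuncID, BlockID);
      RefsByFunc.clear();
      FuncQueue.clear();
    }
  };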

llvm-svn: 214899
2014-08-05 17:49:48 +00:00
Philip Reames 00c9b6461f Remove dead zero store to calloc initialized memory
Optimize the following IR:

%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4

Memory returned by calloc is guaranteed to be zero-initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store.  If the store is to an out-of-bounds address, it is undefined and thus also removable.
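
As a hedged illustration (hypothetical example, not taken from the commit's
tests), the C++ source pattern this targets looks roughly like:

  #include <cstdlib>

  int *make_counter() {
    // calloc already zero-initializes the allocation, so the explicit store
    // of 0 below is dead and can be deleted (assuming the store is not
    // otherwise observable across threads).
    int *p = static_cast<int *>(std::calloc(1, sizeof(int)));
    if (p)
      *p = 0; // redundant: the memory is already zero
    return p;
  }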

Reviewed By: nicholas

Differential Revision: http://reviews.llvm.org/D3942

llvm-svn: 214897
2014-08-05 17:48:20 +00:00
Jonathan Roelofs 064eb5a177 Revert r214881 because it broke lots of build-bots
llvm-svn: 214893
2014-08-05 17:36:05 +00:00
Sanjay Patel 8e5beb6edb Optimize vector fabs of bitcasted constant integer values.
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.

So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64

Instead of generating something like this:

movabsq (constant pool loadi of mask for sign bits)
vmovq   (move from integer register to vector/fp register)
vandps  (mask off sign bits)
vmovq   (move vector/fp register back to integer return register)

We should generate:

mov     (put constant value in return register)

I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()

is the same thing as:
IntVT.isInteger()

Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.

For more background, please see:
http://reviews.llvm.org/D4770

And:
http://llvm.org/bugs/show_bug.cgi?id=20354

Differential Revision: http://reviews.llvm.org/D4785

llvm-svn: 214892
2014-08-05 17:35:22 +00:00
Adam Nemet fd2161b710 [AVX512] Add masking variant and intrinsics for valignd/q
This is similar to what I did with the two-source permutation recently.  (It's
almost too similar; we should consider generating the masking variants
with some tablegen help.)

Both encoding and intrinsic tests are added as well.  For the latter, this is
the IR that the intrinsic test on the clang side generates.
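
For context, a hedged user-level example of the masked operation through the
Intel intrinsic interface (assuming AVX-512F and the standard
_mm512_mask_alignr_epi32 naming; illustrative only, not the commit's test
code):

  #include <immintrin.h>

  // Masked valignd: concatenate a (upper half) with b (lower half), shift the
  // combined 32-dword value right by 4 dwords, and keep the low 16 dwords.
  // Lanes cleared in the mask k retain the corresponding element of src.
  // Compile with -mavx512f.
  __m512i masked_align(__m512i src, __mmask16 k, __m512i a, __m512i b) {
    return _mm512_mask_alignr_epi32(src, k, a, b, /*count=*/4);
  }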

Part of <rdar://problem/17688758>

llvm-svn: 214890
2014-08-05 17:23:04 +00:00
Adam Nemet 4688a2e5cb [X86] Increase X86_MAX_OPERANDS from 5 to 6
This controls the number of operands in the disassembler's x86OperandSets
table.  The entries describe how the operand is encoded and its type.

Not too surprisingly, 5 operands is insufficient for AVX512.  Consider
VALIGNDrrik in the next patch.  These are its operand specifiers:

  { /* 328 */
    { ENCODING_DUP, TYPE_DUP1 },
    { ENCODING_REG, TYPE_XMM512 },
    { ENCODING_WRITEMASK, TYPE_VK8 },
    { ENCODING_VVVV, TYPE_XMM512 },
    { ENCODING_RM_CD64, TYPE_XMM512 },
    { ENCODING_IB, TYPE_IMM8 },
  },

llvm-svn: 214889
2014-08-05 17:23:01 +00:00
Adam Nemet 164b07fbfe [X86] Add lowering to VALIGN
This was currently part of lowering to PALIGNR with some special-casing to
make interlane shifting work.  Since AVX512F has interlane alignr (valignd/q)
and AVX512BW has vpalignr we need to support both of these *at the same time*,
e.g. for SKX.

This patch breaks out the common code and then adds support for checking both of
these lowering options from LowerVECTOR_SHUFFLE.

I also added some FIXMEs where I think the AVX512BW and AVX512VL additions
should probably go.

llvm-svn: 214888
2014-08-05 17:22:59 +00:00
Adam Nemet 2f10cc699d [X86] Separate DAG node for valign and palignr
They have different semantics (valign is interlane while palignr is intralane)
and palignr is still needed even in the AVX512 context.  According to the
latest spec, AVX512BW provides these.

llvm-svn: 214887
2014-08-05 17:22:55 +00:00
Adam Nemet d00a05e3e2 [AVX512] alignr: Use suffix rather than name argument to multiclass
Again no functional change.  This prepares for the suffix to be used with the
intrinsic matching.

llvm-svn: 214886
2014-08-05 17:22:52 +00:00
Adam Nemet f92139dd61 [AVX512] Pull everything alignr-related into the multiclass
The packed integer pattern becomes the DAG pattern for rri and the packed
float, another Pat<> inside the multiclass.

No functional change.

llvm-svn: 214885
2014-08-05 17:22:50 +00:00
Adam Nemet 1c752d8f5e Wrap long lines
llvm-svn: 214884
2014-08-05 17:22:47 +00:00
Jonathan Roelofs f5fad3767b Fix return sequence on armv4 thumb
POP on armv4t cannot be used to change thumb state (unlike later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214881
2014-08-05 17:13:17 +00:00
David Blaikie b706b58e78 Partially revert r214761 that asserted that all concrete debug info variables had DIEs, due to a failure on Darwin.
I'll work on a reduction and fix after this.

llvm-svn: 214880
2014-08-05 16:47:23 +00:00
David Blaikie c74ffa9cab Improve test for merged global debug info by using llvm-dwarfdump.
It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of
the global symbol being used as an address in the addressing mode, but
this avoids the dependence on hardcoded set labels that keep changing
(5+ commits over the last few years that each update the set label as it
changes due to other, unrelated differences in output). This could've,
instead, been changed to match the set name then match the name in the
string pool but that would present other issues (needing to skip over
the sets that weren't of interest, etc) and checking that the addresses
(granted, without relocations applied - so it's not the whole story)
match in the two variable location descriptions seems sufficient and
fairly stable here.

There are a few similar other tests with similar label dependence that
I'll update soonish.

llvm-svn: 214878
2014-08-05 16:20:25 +00:00
Joerg Sonnenberger c4ce42980e Add accessors for the PPC 403 bank registers.
llvm-svn: 214875
2014-08-05 15:45:15 +00:00
Renato Golin 877b9b3513 Add tests for cp10/cp11 on ARMv5/6
Tests for ARMv7/8 are already on diagnostics.s

llvm-svn: 214872
2014-08-05 15:29:41 +00:00
Keith Walker 1045717584 Specify that the thumb setend and blx <immed> instructions are not valid on an m-class target
llvm-svn: 214871
2014-08-05 15:11:59 +00:00
Keith Walker 292aa3d5f7 Define stc2/stc2l/ldc2/ldc2l as thumb2 instructions
llvm-svn: 214868
2014-08-05 14:58:05 +00:00
Joerg Sonnenberger 936a4c8ceb Accessors for SSR2 and SSR3 on PPC 403.
llvm-svn: 214867
2014-08-05 14:53:05 +00:00
Tom Stellard 229d5e669b R600/SI: Update MUBUF assembly string to match AMD proprietary compiler
llvm-svn: 214866
2014-08-05 14:48:12 +00:00
Tom Stellard b37f797678 R600/SI: Avoid generating REGISTER_LOAD instructions.
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.

llvm-svn: 214865
2014-08-05 14:40:52 +00:00
Joerg Sonnenberger 412471271e Add dci/ici instructions for PPC 476 and friends.
llvm-svn: 214864
2014-08-05 14:40:32 +00:00
Joerg Sonnenberger 048284e1b6 Add mftblo and mftbhi for PPC 4xx.
llvm-svn: 214863
2014-08-05 14:18:16 +00:00
Joerg Sonnenberger 9dedceb71d Add lswi / stswi for assembler use with a warning to not add patterns
for them.

llvm-svn: 214862
2014-08-05 13:34:01 +00:00
Yi Kong e56de69500 AArch64: Add support for instruction prefetch intrinsic
Instruction prefetch is not implemented for AArch64; it is incorrectly
translated into a data prefetch instruction.

Differential Revision: http://reviews.llvm.org/D4777

llvm-svn: 214860
2014-08-05 12:46:47 +00:00
James Molloy 2b8933c354 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
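
As a hedged illustration of the cost being modeled (hypothetical example, not
from the commit): on AArch64 only the low 64 bits of v8-v15 are callee-saved,
so a full 128-bit NEON value that stays live across a call must be spilled
before the call and reloaded afterwards.

  #include <arm_neon.h>

  void consume(float); // external call; does not preserve full 128-bit regs

  // 'acc' is live across the call to consume(), so the compiler has to spill
  // it to the stack and refill it, since no callee-saved register preserves
  // all 128 bits.
  float32x4_t keep_alive(float32x4_t acc, float x) {
    consume(x); // acc must survive this call
    return vaddq_f32(acc, vdupq_n_f32(x));
  }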

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Chandler Carruth 183771bd8e [x86] Reformat some code I moved around in a prior commit but left
poorly formatted. Sorry about that.

llvm-svn: 214853
2014-08-05 10:35:30 +00:00
Joerg Sonnenberger 6b41a9900a Allow binary AND for tblgen math.
llvm-svn: 214851
2014-08-05 09:43:25 +00:00
Chandler Carruth 947cef191d [x86] Fix a crash and wrong-code bug in the new vector lowering all
found by a single test reduced out of a failure on llvm-stress.

The start of the problem (and the crash) came when we tried to use
a find of a non-used slot in the move-to half of the move-mask as the
target for two bad-half inputs. While if lucky this will be the first of
a pair of slots which we can place the bad-half inputs into, it isn't
actually guaranteed. This really isn't surprising, not sure what I was
thinking. The correct way to find the two unused slots is to look for
one of the *used* slots. We know it isn't that pair, and we can use some
modular arithmetic to find the other pair by masking off the odd bit and
adding 2 modulo 4. With this, we reliably found a viable pair of slots
for the bad-half inputs.
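
A minimal sketch of the index arithmetic described above (illustrative only,
not the actual lowering code):

  // Given a used slot index (0-3), find the start of the *other* dword pair:
  // mask off the odd bit to get the used pair's start, then add 2 modulo 4.
  int otherPairStart(int UsedSlot) {
    return ((UsedSlot & ~1) + 2) % 4; // slots 0,1 -> 2; slots 2,3 -> 0
  }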

Sadly, that wasn't enough. We also had a wrong code bug that surfaced
when I reduced the test case for this where we would use the same slot
twice for the two bad inputs. This is because both of the bad inputs
could be in odd slots originally and thus the mod-2 mapping would
actually be the same. The whole point of the weird indexing into the
pair of empty slots was to try to leverage when the end result needed
the two bad-half inputs to be paired in a dword and pre-pair them in the
correct orientation. This is less important with the powerful combining
we're now doing, and also easier and more reliable to achieve by noting
that we add the bad-half inputs in order. Thus, if they are in a dword
pair, the low part of that will be the first input in the sequence.
Always putting that in the low element will just do the right thing in
addition to computing the correct result.

Test case added. =]

llvm-svn: 214849
2014-08-05 08:19:21 +00:00
Juergen Ributzka 9503327756 [FastISel][AArch64] Fix previous commit r214844 (Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.)
The original code would fail for unsupported value types like i1, i8, and i16.
This fix changes the code to only create a sub-register copy for i64 value types;
all other types (i1/i8/i16/i32) just use the source register without any
modifications.

getRegClassFor() is now guarded by the i64 value type check, which guarantees
that we always request a register for a valid value type.

llvm-svn: 214848
2014-08-05 07:31:30 +00:00
Juergen Ributzka a126d1ef3c [FastISel][AArch64] Implement the FastLowerArguments hook.
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.

This should cover most of the trivial cases without falling back to
SelectionDAG.

This fixes <rdar://problem/17890986>.

llvm-svn: 214846
2014-08-05 05:43:48 +00:00
Kevin Qin ec100526e3 Revert "r214832 - MachineCombiner Pass for selecting faster instruction"
It broke compilation of most benchmarks and internal tests, as clang crashed
with segmentation faults or assertion failures.

llvm-svn: 214845
2014-08-05 05:43:47 +00:00
Juergen Ributzka 51f5326e25 [FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.
llvm-svn: 214844
2014-08-05 05:43:44 +00:00
Juergen Ributzka 384c3b5c03 Provide convenient access to the zext/sext attributes of function arguments. NFC.
llvm-svn: 214843
2014-08-05 05:43:41 +00:00
Eric Christopher fc6de428c8 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
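
A minimal sketch of the caching pattern described (hypothetical class and
member names, not the actual LLVM declarations): the per-function object
stores the subtarget pointer once and hands it out through a cheap accessor,
which also gives a single place to update when the subtarget is switched.

  // Hypothetical sketch of caching a subtarget pointer per function.
  class SubtargetInfo; // stand-in for the real subtarget class

  class FunctionState {
    const SubtargetInfo *STI = nullptr; // cached at construction

  public:
    explicit FunctionState(const SubtargetInfo *ST) : STI(ST) {}

    // Cheap lookup used by clients (e.g. a DAG builder) instead of going
    // back through the target machine every time.
    const SubtargetInfo &getSubtarget() const { return *STI; }

    // Single update point for targets that switch subtargets per function,
    // mirroring the MIPS-style switching mentioned above.
    void setSubtarget(const SubtargetInfo *ST) { STI = ST; }
  };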

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Gerolf Hoflehner 4dbf44b9d8 MachineCombiner Pass for selecting faster instruction
sequence on AArch64

Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM::CodeGen/AArch64/dp-3source.ll
This resolves the reported compile failures of the original commit.

llvm-svn: 214832
2014-08-05 01:16:13 +00:00
Joerg Sonnenberger 755ffa9b54 Add TCR register access
llvm-svn: 214826
2014-08-04 23:53:42 +00:00
Joerg Sonnenberger 5995e0021d Add PPC 603's tlbld and tlbli instructions.
llvm-svn: 214825
2014-08-04 23:49:45 +00:00