Commit Graph

Arnaud A. de Grandmaison 9b3330546b [AArch64] Cleanup A57PBQPConstraints
And add a long-awaited testcase.

llvm-svn: 220381
2014-10-22 12:40:20 +00:00
Jyoti Allur 3b68607eac [Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode
llvm-svn: 220379
2014-10-22 10:41:14 +00:00
Matt Arsenault 7c93690be0 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Matt Arsenault 75c658e2cc R600/SI: Add missing parameter to div_fmas intrinsic
llvm-svn: 220338
2014-10-21 22:20:55 +00:00
Matt Arsenault 8c4fb7cae0 R600: Use default GlobalDirective
The overridden one wasn't inserting a space,
so you would end up with .globalfoo

llvm-svn: 220329
2014-10-21 21:08:36 +00:00
Arnaud A. de Grandmaison a61262f989 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
Rafael Espindola f03ae4efa7 Drop support for an old version of ld64 (from darwin 9).
llvm-svn: 220310
2014-10-21 18:31:09 +00:00
Matt Arsenault e306a32325 R600/SI: Add pattern for bswap
llvm-svn: 220304
2014-10-21 16:25:08 +00:00
NAKAMURA Takumi 9ff272f382 X86AsmInstrumentation.cpp: Dissolve initializer-ranged-for. MSC17 disliked it.
llvm-svn: 220301
2014-10-21 16:22:52 +00:00
Colin LeMahieu 7055365e77 Test commit
Fixing brief comment.

llvm-svn: 220299
2014-10-21 16:03:10 +00:00
Bill Schmidt 5c6cb813b6 [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg
With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in
the FMA mutation pass.  If we have a situation where a killed product
register is the same register as the FMA target, such as:

   %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5,
                       %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 

then the substitution makes no sense.  We end up getting a crash when
we try to extend the interval associated with the killed product
register, as there is already a live range for %vreg5 there.  This
patch just disables the mutation under those circumstances.

Since recipest.ll generates different code with VSX enabled, I've
modified that test to use -mattr=-vsx.  I've borrowed the code from
that test that exposed the bug and placed it in fma-mutate.ll, where
it tests several mutation opportunities including the "bad" one.

llvm-svn: 220290
2014-10-21 13:02:37 +00:00
Oliver Stannard cdb8db8d3c [ARM] NEON 32-bit scalar moves are also available in VFPv2
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.

Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.

llvm-svn: 220288
2014-10-21 11:49:14 +00:00
Yuri Gorshenin 171eb8dbeb [asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an index register.
Summary: Fixed memory accesses with rbp as a base or an index register.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5819

llvm-svn: 220283
2014-10-21 10:22:27 +00:00
Oliver Stannard 38e6d45a46 [Thumb2] LDRS?[BH] cannot load to the PC
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).

llvm-svn: 220278
2014-10-21 09:14:15 +00:00
Zoran Jovanovic 592239d498 [mips][microMIPS] Implement ADDU16 and SUBU16 instructions
Differential Revision: http://reviews.llvm.org/D5118

llvm-svn: 220276
2014-10-21 08:44:58 +00:00
Zoran Jovanovic 81ceebc56e [mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructions
Differential Revision: http://reviews.llvm.org/D5117

llvm-svn: 220275
2014-10-21 08:32:40 +00:00
Zoran Jovanovic b0852e5410 [mips][microMIPS] Implement registers for microMIPS 16-bit instructions
Differential Revision: http://reviews.llvm.org/D5116

llvm-svn: 220273
2014-10-21 08:23:11 +00:00
Rafael Espindola c606bfe660 Fix a bit of confusion about .set and produce more readable assembly.
Every target we support accepts assembly that looks like

a = b - c
.long a

What is special about MachO is that the above combination suppresses the
production of a relocation.

With this change we avoid producing the intermediary labels when they don't
add any value.

llvm-svn: 220256
2014-10-21 01:17:30 +00:00
Quentin Colombet 06355199f1 [X86] Fix a bug in the lowering of the mask of VSELECT.
The X86 code that lowers VSELECT modifies the bits set in the VSELECT mask
when it knows the node can be lowered into a BLEND: a BLEND only tests the
high bit of each mask element, so the lowering optimizes the mask accordingly.
However, when the mask is a compile-time constant, the VSELECT is instead
handled by the generic optimizer, where those modifications generate bad code.
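
To illustrate the mask semantics involved, here is a scalar C++ model of a
BLENDVPS-style select (our own illustration, not LLVM code):

    #include <cstdint>

    // Scalar model of a BLENDVPS-style select: only the sign (high) bit of
    // each 32-bit mask lane is tested, which is why the lowering can get
    // away with setting just the high bits of the mask.
    void blendps(const uint32_t mask[4], const float a[4], const float b[4],
                 float out[4]) {
      for (int i = 0; i < 4; ++i)
        out[i] = (mask[i] & 0x80000000u) ? b[i] : a[i];
    }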

This patch fixes that by preventing the optimization if the VSELECT will be
handled by the generic optimizer.

<rdar://problem/18675020>

llvm-svn: 220242
2014-10-20 23:13:30 +00:00
Simon Pilgrim 2f9548a3ef [X86] Memory folding for commutative instructions (updated)
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails. If that folding fails as well, the instruction is 're-commuted' back to its original order before returning.

Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt.
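
Sketched in C++, the retry logic reads roughly as follows (hypothetical helper
names; the actual X86InstrInfo interfaces differ):

    // Hedged sketch with made-up helpers, not the actual X86InstrInfo code.
    MachineInstr *tryFoldMaybeCommuted(MachineInstr &MI, unsigned OpIdx) {
      if (MachineInstr *Folded = tryFold(MI, OpIdx))
        return Folded;
      unsigned Idx1, Idx2;
      if (!findCommutedOpIndices(MI, Idx1, Idx2))
        return nullptr;
      // Never commute a source operand that is tied to the destination
      // register; doing so caused the regressions in the first attempt.
      if (MI.getOperand(Idx1).isTied() || MI.getOperand(Idx2).isTied())
        return nullptr;
      commuteInstruction(MI);              // retry with the operands swapped
      if (MachineInstr *Folded = tryFold(MI, OpIdx))
        return Folded;
      commuteInstruction(MI);              // 're-commute' back to the original
      return nullptr;
    }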

Added additional regression test case provided by Joerg Sonnenberger.

Differential Revision: http://reviews.llvm.org/D5818

llvm-svn: 220239
2014-10-20 22:14:22 +00:00
Tim Northover 23075ccee7 ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this, which was a potential source of corruption. This is why
   the patch adds a new pseudo-instruction for use before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

llvm-svn: 220236
2014-10-20 21:28:41 +00:00
Oliver Stannard 6672f37ed2 [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.

llvm-svn: 220204
2014-10-20 15:37:35 +00:00
Sid Manning c374ac97c8 Remove unnecessary else.
llvm-svn: 220200
2014-10-20 13:08:19 +00:00
Oliver Stannard e8f63a54b4 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.
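
For reference, the intended semantics versus the behaviour of the removed
patterns, as a scalar C++ model (our own illustration, not compiler code):

    #include <cstdint>

    // SMULWB: multiply a signed 32-bit value by a signed 16-bit value and
    // return the top 32 bits of the 48-bit product, preserving overflow.
    int32_t smulwb(int32_t a, int16_t b) {
      int64_t product = (int64_t)a * b; // full 48-bit signed result
      return (int32_t)(product >> 16);  // top 32 bits
    }

    // What the removed patterns actually matched: truncate to 16 bits and
    // sign-extend, losing the 16 bits of overflow.
    int32_t truncated(int32_t a, int16_t b) {
      int64_t product = (int64_t)a * b;
      return (int32_t)(int16_t)(product >> 16);
    }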

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396

llvm-svn: 220196
2014-10-20 11:30:35 +00:00
Oliver Stannard fce039240a [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need to ensure its result register is not
marked as dead before we use it in the second instruction.

llvm-svn: 220194
2014-10-20 11:00:18 +00:00
Bob Wilson 1e1f13862e Use triple predicate functions instead of checking values directly. NFC.
llvm-svn: 220155
2014-10-19 00:39:30 +00:00
Aaron Watry 8114437a8f R600/SI: Add global atomicrmw xchg
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220110
2014-10-17 23:33:03 +00:00
Aaron Watry d672ee2a47 R600/SI: Add global atomicrmw xor
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220109
2014-10-17 23:33:01 +00:00
Aaron Watry 8a911e6926 R600/SI: Add global atomicrmw or
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220108
2014-10-17 23:32:59 +00:00
Aaron Watry 58c9992f15 R600/SI: Add global atomicrmw min/umin
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220107
2014-10-17 23:32:57 +00:00
Aaron Watry 29f295d7a5 R600/SI: Add global atomicrmw max/umax
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220106
2014-10-17 23:32:56 +00:00
Aaron Watry 621278034c R600/SI: Add global atomicrmw and
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220105
2014-10-17 23:32:54 +00:00
Aaron Watry 328f1bae8e R600/SI: Add global atomicrmw sub
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220104
2014-10-17 23:32:52 +00:00
Bill Schmidt ba637db298 [PowerPC] Change assert to better form
llvm-svn: 220092
2014-10-17 21:19:59 +00:00
Matt Arsenault a708358e93 R600/SI: Remove redundant setting of instruction bits
These are all set on the instruction base classes.

llvm-svn: 220091
2014-10-17 21:13:11 +00:00
Bill Schmidt a087d74250 [PowerPC] Change liveness testing in VSX FMA mutation pass
With VSX enabled, LLVM crashes when compiling
test/CodeGen/PowerPC/fma.ll.  I traced this to the liveness test
that's revised in this patch. The interval test is designed to only
work for virtual registers, but in this case the AddendSrcReg is
physical. Since there is already a walk of the MIs between the
AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg
in that loop.  At Hal Finkel's request, I converted the liveness test
to an assert restricted to virtual registers.
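
The added check is along these lines (a hedged sketch with made-up names,
including the instructionsBetween helper, not the exact pass code):

    // While walking the instructions between AddendMI and the FMA, also
    // bail out if the physical AddendSrcReg is def'd or killed in the gap.
    for (MachineInstr &MI : instructionsBetween(AddendMI, FMAInstr))
      if (MI.modifiesRegister(AddendSrcReg, TRI) ||
          MI.killsRegister(AddendSrcReg, TRI))
        return false; // not safe to mutate this FMA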

I've changed the fma.ll test to have VSX and non-VSX variants so we
can test both kinds of multiply-adds.

llvm-svn: 220090
2014-10-17 21:02:44 +00:00
Matt Arsenault 933c38df40 Fix typo
llvm-svn: 220068
2014-10-17 18:02:31 +00:00
Matt Arsenault e184482bf8 R600/SI: Also check for FPImm literal constants
llvm-svn: 220067
2014-10-17 18:00:50 +00:00
Matt Arsenault d282ada508 R600/SI: Allow commuting with source modifiers
llvm-svn: 220066
2014-10-17 18:00:48 +00:00
Matt Arsenault 8943d24949 R600/SI: Simplify code with hasModifiersSet
llvm-svn: 220065
2014-10-17 18:00:45 +00:00
Matt Arsenault ace5b76739 R600/SI: Fix general commuting breaking src mods
The generic code trying to use findCommutedOpIndices won't
understand that it needs to swap the modifier operands also,
so it should fail if they are set.

llvm-svn: 220064
2014-10-17 18:00:43 +00:00
Matt Arsenault ffc5d5bbf0 R600/SI: Cleanup code with ChangeToFPImmediate
llvm-svn: 220063
2014-10-17 18:00:41 +00:00
Matt Arsenault 6d3cd544bb R600/SI: Allow commuting fp immediates
llvm-svn: 220062
2014-10-17 18:00:39 +00:00
Matt Arsenault aa5ccfb566 R600/SI: Use early return instead of checking condition twice
Any commutable instruction will have at least src1.

llvm-svn: 220061
2014-10-17 18:00:37 +00:00
Matt Arsenault 328b1193b5 R600/SI: Use complex pattern for MUBUF load patterns.
This eliminates a use of the SI_ADDR64_RSRC pseudo

llvm-svn: 220057
2014-10-17 17:43:00 +00:00
Matt Arsenault 83a535ff6b R600/SI: Remove SI_BUFFER_RSRC pseudo
Just use REG_SEQUENCE directly, so there are fewer
instructions to deal with later.

llvm-svn: 220056
2014-10-17 17:42:56 +00:00
Andrea Di Biagio c48cb86f05 [X86] Fix missed selection of non-temporal store of zero vector.
When the input to a store instruction was a zero vector, the backend
always selected a normal vector store regardless of the non-temporal
hint. This is fixed by this patch.
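
At the source level, the missed case looks roughly like this (illustrative
C++; dst must be 16-byte aligned):

    #include <xmmintrin.h>

    // Storing a zero vector with a non-temporal hint: before this fix, the
    // backend dropped the hint and selected a plain vector store instead of
    // a non-temporal one such as MOVNTPS.
    void clear_stream(float *dst) {
      _mm_stream_ps(dst, _mm_setzero_ps());
    }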

This fixes PR19370.

llvm-svn: 220054
2014-10-17 17:27:06 +00:00
James Molloy f497d5511d [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.

While we're at it, avoid writing to std::vector::end()...
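
For context, the kind of bug being avoided (generic C++ illustration, not the
lowering code itself):

    #include <vector>

    void example() {
      std::vector<int> v(4);
      // Undefined behavior: end() points one past the last element, so
      // writing through it is an out-of-bounds store.
      // *v.end() = 42;
      v.push_back(42); // the safe way to append
    }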

Bug found with random testing and a lot of coffee.

llvm-svn: 220051
2014-10-17 17:06:31 +00:00
Bill Schmidt 2d1128acb2 [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types.  This
patch adds that support.

As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.

In addition, the logic for permitting misaligned memory accesses is
modified so that v4f32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled.  Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.
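
The misaligned-access change amounts to treating the 4x32 vector types like
the 2x64 ones when VSX is enabled; a simplified C++ sketch of the idea (the
real allowsMisalignedMemoryAccesses override in PPCTargetLowering has a
different signature and more checks):

    // Simplified sketch only, not the actual PPCTargetLowering code.
    bool allowsMisalignedVectorAccess(MVT VT, bool HasVSX) {
      if (!HasVSX)
        return false;
      return VT == MVT::v2f64 || VT == MVT::v2i64 ||
             VT == MVT::v4f32 || VT == MVT::v4i32;
    }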

A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests.  I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature.  For now, that simply tests the unaligned load/store
behavior.

This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.

llvm-svn: 220047
2014-10-17 15:13:38 +00:00
Jan Vesely 54468a5a58 Mips: Only set divrem i64 to custom on 64-bit
Reviewed-by: Daniel Sanders <daniel.sanders@imgtec.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220046
2014-10-17 14:45:28 +00:00