Commit Graph

24300 Commits

Author SHA1 Message Date
Justin Holewinski 01f89f0428 [NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.

The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space.  Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.

llvm-svn: 182254
2013-05-20 12:13:32 +00:00
Justin Holewinski 700b6fa934 [NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels.
llvm-svn: 182253
2013-05-20 12:13:28 +00:00
Stepan Dyatkovskiy d0e34a200f PR15868 fix.
Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:

Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------ ...

FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.

We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.

This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.

llvm-svn: 182237
2013-05-20 08:01:34 +00:00
Jakob Stoklund Olesen f927800325 Also expand 64-bit bitcasts.
llvm-svn: 182229
2013-05-20 01:01:43 +00:00
Jakob Stoklund Olesen c7bc5fbc5c Implement spill and fill of I64Regs.
llvm-svn: 182228
2013-05-20 00:53:25 +00:00
Jakob Stoklund Olesen 751e9b8407 Mark i64 SETCC as expand so it is turned into a SELECT_CC.
llvm-svn: 182227
2013-05-20 00:28:36 +00:00
Benjamin Kramer 8bad66e586 Replace some bit operations with simpler ones. No functionality change.
llvm-svn: 182226
2013-05-19 22:01:57 +00:00
Jakob Stoklund Olesen 86c5469d26 Don't use %g0 to materialize 0 directly.
The wired physreg doesn't work on tied operands like on MOVXCC.

Add a README note to fix this later.

llvm-svn: 182225
2013-05-19 21:47:13 +00:00
Jakob Stoklund Olesen 92ebf1153e Select i64 values with %icc conditions.
llvm-svn: 182224
2013-05-19 20:38:21 +00:00
Jakob Stoklund Olesen 7ca944b9db Add floating point selects on %xcc predicates.
llvm-svn: 182222
2013-05-19 20:33:11 +00:00
Jakob Stoklund Olesen 4a78c86a6a Implement SPselectfcc for i64 operands.
Also clean up the arguments to all the MOVCC instructions so the
operands always are (true-val, false-val, cond-code).

llvm-svn: 182221
2013-05-19 20:20:54 +00:00
Venkatraman Govindaraju 3320e5a921 [Sparc] Rearrange integer registers' allocation order so that register allocator will use I and G registers before using L and O registers.
Also, enable registers %g2-%g4 to be used in application and %g5 in 64 bit mode.

llvm-svn: 182219
2013-05-19 20:07:20 +00:00
Jakob Stoklund Olesen ead983cec9 Handle i64 FrameIndex nodes in SPARC v9 mode.
llvm-svn: 182216
2013-05-19 19:14:24 +00:00
Hal Finkel 2f474f0e8a Check InlineAsm clobbers in PPCCTRLoops
We don't need to reject all inline asm as using the counter register (most does
not). Only those that explicitly clobber the counter register need to prevent
the transformation.

llvm-svn: 182191
2013-05-18 09:20:39 +00:00
Tim Northover fd2639f784 AArch64: add CMake dependency to fix very parallel builds
llvm-svn: 182190
2013-05-18 08:17:47 +00:00
David Majnemer 5ba473afb0 X86: Bad peephole interaction between adc, MOV32r0
The peephole tries to reorder MOV32r0 instructions such that they are
before the instruction that modifies EFLAGS.

The problem is that the peephole does not consider the case where the
instruction that modifies EFLAGS also depends on the previous state of
EFLAGS.

Instead, walk backwards until we find an instruction that has a def for
EFLAGS but does not have a use.
If we find such an instruction, insert the MOV32r0 before it.
If it cannot find such an instruction, skip the optimization.

llvm-svn: 182184
2013-05-18 01:02:03 +00:00
Matt Arsenault 75865923c9 Add LLVMContext argument to getSetCCResultType
llvm-svn: 182180
2013-05-18 00:21:46 +00:00
JF Bastien 97b08c404c Support unaligned load/store on more ARM targets
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.

The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).

The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.

I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.

llvm-svn: 182175
2013-05-17 23:49:01 +00:00
Rafael Espindola 5986ce0e5d Fix the build in c++11 mode.
The errors were:

non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list

and

non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list

llvm-svn: 182168
2013-05-17 22:45:52 +00:00
Vincent Lejeune d3fcb5016c R600: Lower int_load_input to copyFromReg instead of Register node
It solves a bug uncovered by dot4 patch where the register class of
int_load_input use was ignored.

llvm-svn: 182130
2013-05-17 16:51:06 +00:00
Vincent Lejeune 3d5118ca40 R600: Use bottom up scheduling algorithm
llvm-svn: 182129
2013-05-17 16:50:56 +00:00
Vincent Lejeune 4c81d4da6f R600: Use depth first scheduling algorithm
It should increase PV substitution opportunities and lower gpr
usage (pending computations path are "flushed" sooner)

llvm-svn: 182128
2013-05-17 16:50:44 +00:00
Vincent Lejeune e958c8e0d8 R600: Replace big texture opcode switch in scheduler by usesTC/usesVC
llvm-svn: 182127
2013-05-17 16:50:37 +00:00
Vincent Lejeune 519f21eed3 R600: Relax some vector constraints on Dot4.
Dot4 now uses 8 scalar operands instead of 2 vectors one which allows register
coalescer to remove some unneeded COPY.
This patch also defines some structures/functions that can be used to handle
every vector instructions (CUBE, Cayman special instructions...) in a similar
fashion.

llvm-svn: 182126
2013-05-17 16:50:32 +00:00
Vincent Lejeune d3eed66e8c R600: Improve texture handling
llvm-svn: 182125
2013-05-17 16:50:20 +00:00
Vincent Lejeune 4ebef18ab5 R600: Rename 128 bit registers.
Almost all instructions that takes a 128 bits reg as input (fetch, export...)
have the abilities to swizzle their argument and output. Instead of printing
default swizzle for each 128 bits reg, rename T*.XYZW to T* and let instructions
print potentially optimized swizzles themselves.

llvm-svn: 182124
2013-05-17 16:50:09 +00:00
Vincent Lejeune 0fca91d52e R600: Some factorization
llvm-svn: 182123
2013-05-17 16:50:02 +00:00
Vincent Lejeune f9f4e1e7db R600: Factorize Fetch size limit inside AMDGPUSubTarget
llvm-svn: 182122
2013-05-17 16:49:55 +00:00
Vincent Lejeune 709e01688d R600: prettier dump of clamp
llvm-svn: 182121
2013-05-17 16:49:49 +00:00
Tom Stellard ecc2ad1cd4 R600: Fix encoding for R600 family GPUs
Reviewed-by: Vincent Lejeune <vljn@ovi.com>

https://bugs.freedesktop.org/show_bug.cgi?id=64193
https://bugs.freedesktop.org/show_bug.cgi?id=64257
https://bugs.freedesktop.org/show_bug.cgi?id=64320

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 182113
2013-05-17 15:23:21 +00:00
Tom Stellard edade94bbc R600: Pass MCSubtargetInfo reference to R600CodeEmitter
llvm-svn: 182112
2013-05-17 15:23:12 +00:00
Venkatraman Govindaraju 641b0b5a21 [Sparc] Implements hasReservedCallFrame and hasFP.
This is to generate correct framesetup code when the function
 has variable sized allocas.

llvm-svn: 182108
2013-05-17 15:14:34 +00:00
Benjamin Kramer fc33e1d99b X86: Make shuffle -> shift conversion more aggressive about undefs.
Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.

We still prefer palignr over psrldq because it has higher throughput on
sandybridge.

llvm-svn: 182102
2013-05-17 14:48:34 +00:00
Ulrich Weigand 2dbe06a987 [PowerPC] Fix hi/lo encoding in old-style code emitter
This patch implements the equivalent change to r182091/r182092
in the old-style code emitter.  Instead of having two separate
16-bit immediate encoding routines depending on the instruction,
this patch introduces a single encoder that checks the machine
operand flags to decide whether the low or high half of a
symbol address is required.

Since now both encoders make no further distinction between
"symbolLo" and "symbolHi", the .td operand can now use a
single getS16ImmEncoding method.

Tested by running the old-style JIT tests on 32-bit Linux.

llvm-svn: 182097
2013-05-17 14:14:12 +00:00
Ulrich Weigand 6e23ac606e [PowerPC] Merge/rename PPC fixup types
Now that fixup_ppc_ha16 and fixup_ppc_lo16 are being treated exactly
the same everywhere, it no longer makes sense to have two fixup types.

This patch merges them both into a single type fixup_ppc_half16,
and renames fixup_ppc_lo16_ds to fixup_ppc_half16ds for consistency.
(The half16 and half16ds names are taken from the description of
relocation types in the PowerPC ABI.)

No change in code generation expected.

llvm-svn: 182092
2013-05-17 12:37:21 +00:00
Ulrich Weigand 994f49ed79 [PowerPC] Fix processing of ha16/lo16 fixups
The current PowerPC MC back end distinguishes between fixup_ppc_ha16
and fixup_ppc_lo16, which are determined by the instruction the fixup
applies to, and uses this distinction to decide whether a fixup ought
to resolve to the high or the low part of a symbol address.

This isn't quite correct, however.  It is valid -if unusual- assembler
to use, e.g.
  li 1, symbol@ha
or
  lis 1, symbol@l
Whether the high or the low part of the address is used depends solely
on the @ suffix, not on the instruction.

In addition, both
  li 1, symbol
and
  lis 1, symbol
are valid, assuming the symbol address fits into 16 bits; again, both
will then refer to the actual symbol value (so li will load the value
itself, while lis will load the value shifted by 16).


To fix this, two places need to be adapted.  If the fixup cannot be
resolved at assembler time, a relocation needs to be emitted via
PPCELFObjectWriter::getRelocType.  This routine already looks at
the VK_ type to determine the relocation.  The only problem is that
will reject any _LO modifier in a ha16 fixup and vice versa.  This
is simply incorrect; any of those modifiers ought to be accepted
for either fixup type.

If the fixup *can* be resolved at assembler time, adjustFixupValue
currently selects the high bits of the symbol value if the fixup
type is ha16.  Again, this is incorrect; see the above example
  lis 1, symbol

Now, in theory we'd have to respect a VK_ modifier here.  However,
in fact common code never even attempts to resolve symbol references
using any nontrivial VK_ modifier at assembler time; it will always
fall back to emitting a reloc and letting the linker handle it.

If this ever changes, presumably there'd have to be a target callback
to resolve VK_ modifiers.  We'd then have to handle @ha etc. there.

llvm-svn: 182091
2013-05-17 12:36:29 +00:00
Benjamin Kramer 2057a2b86f Don't cast away constness.
llvm-svn: 182086
2013-05-17 11:39:41 +00:00
Christian Konig b7be72df5b R600/SI: return undef instead of null for skipped arguments
This is a candidate for the stable branch.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=64694

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 182084
2013-05-17 09:46:48 +00:00
Venkatraman Govindaraju 54bf611c79 [Sparc] Prevent instructions that defines or uses %o7 to be in call's delay slot.
llvm-svn: 182063
2013-05-16 23:53:29 +00:00
Akira Hatanaka 252f54f769 [mips] Improve instruction selection for pattern (store (fp_to_sint $src), $ptr).
Previously, three instructions were needed:

trunc.w.s $f0, $f2
mfc1 $4, $f0
sw $4, 0($2)

Now we need only two:

trunc.w.s $f0, $f2
swc1 $f0, 0($2)

llvm-svn: 182053
2013-05-16 21:17:15 +00:00
Rafael Espindola b08d2c2db0 Remove addFrameMove.
Now that we have good testing, remove addFrameMove and create cfi
instructions directly.

llvm-svn: 182052
2013-05-16 21:02:15 +00:00
Akira Hatanaka d82ee940c3 [mips] Factor out unaligned store lowering code.
llvm-svn: 182050
2013-05-16 20:45:17 +00:00
Jack Carter 03f0fd37a9 Mips assembler: Add TwoOperandConstraint definitions
This patch removes alias definition for addiu $rs,$imm 
and instead uses the TwoOperandAliasConstraint field in 
the ArithLogicI instruction class. 

This way all instructions that inherit ArithLogicI class 
have the same macro defined. 

The usage examples are added to test files.

Patch by Vladimir Medic

llvm-svn: 182048
2013-05-16 20:24:27 +00:00
Jack Carter 59817110ff Mips td file formatting: white space and long lines
llvm-svn: 182047
2013-05-16 20:08:49 +00:00
Hal Finkel 5f587c59a5 Create an new preheader in PPCCTRLoops to avoid counter register clobbers
Some IR-level instructions (such as FP <-> i64 conversions) are not chained
w.r.t. the mtctr intrinsic and yet may become function calls that clobber the
counter register. At the selection-DAG level, these might be reordered with the
mtctr intrinsic causing miscompiles. To avoid this situation, if an existing
preheader has instructions that might use the counter register, create a new
preheader for the mtctr intrinsic. This extra block will be remerged with the
old preheader at the MI level, but will prevent unwanted reordering at the
selection-DAG level.

llvm-svn: 182045
2013-05-16 19:58:38 +00:00
Akira Hatanaka fce4dd7974 [mips] Test case for r182042. Add comment.
llvm-svn: 182044
2013-05-16 19:57:23 +00:00
Akira Hatanaka 39d40f7baf [mips] Fix instruction selection pattern for sint_to_fp node to avoid emitting an
invalid instruction sequence.

Rather than emitting an int-to-FP move instruction and an int-to-FP conversion
instruction during instruction selection, we emit a pseudo instruction which gets
expanded post-RA. Without this change, register allocation can possibly insert a
floating point register move instruction between the two instructions, which is not
valid according to the ISA manual.

mtc1 $f4, $4         # int-to-fp move instruction.
mov.s $f2, $f4       # move contents of $f4 to $f2.
cvt.s.w $f0, $f2     # int-to-fp conversion.

llvm-svn: 182042
2013-05-16 19:48:37 +00:00
Jack Carter 51785c4715 Mips assembler: Add branch macro definitions
This patch adds bnez and beqz instructions which represent alias definitions for bne and beq instructions as follows:
bnez $rs,$imm => bne $rs,$zero,$imm
beqz $rs,$imm => beq $rs,$zero,$imm

The corresponding test cases are added.

Patch by Vladimir Medic

llvm-svn: 182040
2013-05-16 19:40:19 +00:00
Akira Hatanaka 21bab5badc [mips] Fix indentation.
llvm-svn: 182036
2013-05-16 18:42:42 +00:00
Akira Hatanaka 7b6e4f1366 [mips] Delete unused enum value.
llvm-svn: 182035
2013-05-16 18:40:12 +00:00