llvm-project

Commit Graph

Author	SHA1	Message	Date
Justin Holewinski	4c47d87ba6	[NVPTX] Fix mis-use of CurrentFnSym in NVPTXAsmPrinter. This was causing a symbol name error in the output PTX. llvm-svn: 182298	2013-05-20 16:42:18 +00:00
Justin Holewinski	18f3a1ffe6	[NVPTX] Add programmatic interface to NVVMReflect pass llvm-svn: 182297	2013-05-20 16:42:16 +00:00
Hal Finkel	0859ef29d5	Rename PPC MTCTRse to MTCTRloop As the pairing of this instruction form with the bdnz/bdz branches is now enforced by the verification pass, make it clear from the name that these are used only for counter-based loops. No functionality change intended. llvm-svn: 182296	2013-05-20 16:08:37 +00:00
Hal Finkel	8ca3884147	Add a PPCCTRLoops verification pass When asserts are enabled, this adds a verification pass for PPC counter-loop formation. Unfortunately, without sacrificing code quality, there is no better way of forming counter-based loops except at the (late) IR level. This means that we need to recognize, at the IR level, anything which might turn into a function call (or indirect branch). Because this is currently a finite set of things, and because SelectionDAG lowering is basic-block local, this can be done. Nevertheless, it is fragile, and failure results in a miscompile. This verification pass checks that all (reachable) counter-based branches are dominated by a loop mtctr instruction, and that no instructions in between clobber the counter register. If these conditions are not satisfied, then an ICE will be triggered. In short, this is to help us sleep better at night. llvm-svn: 182295	2013-05-20 16:08:17 +00:00
Benjamin Kramer	927ca942ce	R600: Fix bug detected by GCC warning. R600TextureIntrinsicsReplacer.cpp:232: warning: the address of ‘ArgsType’ will always evaluate as ‘true’ This doesn't have any effect on the output as a vararg intrinsic behaves the same way as a non-vararg one. llvm-svn: 182293	2013-05-20 15:58:43 +00:00
Tom Stellard	f1ee716446	R600/SI: Use a multiclass for MUBUF_Load_Helper This will simplify the instructions and also the pattern definitions. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182288	2013-05-20 15:02:31 +00:00
Tom Stellard	b8458f88d6	R600/SI: Add a pattern for S_LOAD_DWORDX2_* instructions Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182287	2013-05-20 15:02:28 +00:00
Tom Stellard	d2eebf001e	R600/SI: Add pattern for rotr Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182286	2013-05-20 15:02:24 +00:00
Tom Stellard	5643c4ac72	R600: Swap the legality of rotl and rotr The hardware supports rotr and not rotl. llvm-svn: 182285	2013-05-20 15:02:19 +00:00
Tom Stellard	1cfd7a50bb	R600/SI: Add patterns for 64-bit shift operations Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182284	2013-05-20 15:02:12 +00:00
Tom Stellard	459a79a81c	R600/SI: Use the same names for VOP3 operands and encoding fields This makes it possible to reorder the operands without breaking the encoding. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182283	2013-05-20 15:02:08 +00:00
Tom Stellard	b35efba4d9	R600/SI: Make fitsRegClass() operands const Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182282	2013-05-20 15:02:01 +00:00
Mihai Popa	f41e3f56a5	VSTn instructions have a number of encoding constraints which are not implemented. I have added these using wrapper methods around the original custom decoder (incidentally - this is a huge poorly written method that should be cleaned up. I have left it as is since the changes would be much to hard to review). llvm-svn: 182281	2013-05-20 14:57:05 +00:00
Mihai Popa	dcf0922720	Q registers are encoded in fields of the same length as D registers. As Q registers are half as many, the ARM reference manual mandates the least significant bit to be zeroed out. Failure to do so should result in an undefined instruction. With this change test/MC/Disassembler/ARM/invalid-VQADD-arm.txt is passing (removed XFAIL). llvm-svn: 182279	2013-05-20 14:42:43 +00:00
Richard Sandiford	312425f32d	[SystemZ] Add long branch pass Before this change, the SystemZ backend would use BRCL for all branches and only consider shortening them to BRC when generating an object file. E.g. a branch on equal would use the JGE alias of BRCL in assembly output, but might be shortened to the JE alias of BRC in ELF output. This was a useful first step, but it had two problems: (1) The z assembler isn't traditionally supposed to perform branch shortening or branch relaxation. We followed this rule by not relaxing branches in assembler input, but that meant that generating assembly code and then assembling it would not produce the same result as going directly to object code; the former would give long branches everywhere, whereas the latter would use short branches where possible. (2) Other useful branches, like COMPARE AND BRANCH, do not have long forms. We would need to do something else before supporting them. (Although COMPARE AND BRANCH does not change the condition codes, the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction during codegen, so that we can safely lower it to a separate compare and long branch where necessary. This is not a valid transformation for the assembler proper to make.) This patch therefore moves branch relaxation to a pre-emit pass. For now, calls are still shortened from BRASL to BRAS by the assembler, although this too is not really the traditional behaviour. The first test takes about 1.5s to run, and there are likely to be more tests in this vein once further branch types are added. The feeling on IRC was that 1.5s is a bit much for a single test, so I've restricted it to SystemZ hosts for now. The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests. A later patch will remove the {{g}}s from that directory. llvm-svn: 182274	2013-05-20 14:23:08 +00:00
Justin Holewinski	01f89f0428	[NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs This converter currently only handles global variables in address space 0. For these variables, they are promoted to address space 1 (global memory), and all uses are updated to point to the result of a cvta.global instruction on the new variable. The motivation for this is address space 0 global variables are illegal since we cannot declare variables in the generic address space. Instead, we place the variables in address space 1 and explicitly convert the pointer to address space 0. This is primarily intended to help new users who expect to be able to place global variables in the default address space. llvm-svn: 182254	2013-05-20 12:13:32 +00:00
Justin Holewinski	700b6fa934	[NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels. llvm-svn: 182253	2013-05-20 12:13:28 +00:00
Stepan Dyatkovskiy	d0e34a200f	PR15868 fix. Introduction: In case when stack alignment is 8 and GPRs parameter part size is not N8: we add padding to GPRs part, so part's last byte must be recovered at address K8-1. We need to do it, since remained (stack) part of parameter starts from address K8, and we need to "attach" "GPRs head" without gaps to it: Stack: \|---- 8 bytes block ----\| \|---- 8 bytes block ----\| \|---- 8 bytes... [ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ... FIX: Note, once we added padding we need to correct all* Arg offsets that are going after padded one. That's why we need this fix: Arg offsets were never corrected before this patch. See new test-cases included in patch. We also don't need to insert padding for byval parameters that are stored in GPRs only. We need pad only last byval parameter and only in case it outsides GPRs and stack alignment = 8. Though, stack area, allocated for recovered byval params, must satisfy "Size mod 8 = 0" restriction. This patch reduces stack usage for some cases: We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be "packed" with alignment 4 in some cases. llvm-svn: 182237	2013-05-20 08:01:34 +00:00
Jakob Stoklund Olesen	f927800325	Also expand 64-bit bitcasts. llvm-svn: 182229	2013-05-20 01:01:43 +00:00
Jakob Stoklund Olesen	c7bc5fbc5c	Implement spill and fill of I64Regs. llvm-svn: 182228	2013-05-20 00:53:25 +00:00
Jakob Stoklund Olesen	751e9b8407	Mark i64 SETCC as expand so it is turned into a SELECT_CC. llvm-svn: 182227	2013-05-20 00:28:36 +00:00
Benjamin Kramer	8bad66e586	Replace some bit operations with simpler ones. No functionality change. llvm-svn: 182226	2013-05-19 22:01:57 +00:00
Jakob Stoklund Olesen	86c5469d26	Don't use %g0 to materialize 0 directly. The wired physreg doesn't work on tied operands like on MOVXCC. Add a README note to fix this later. llvm-svn: 182225	2013-05-19 21:47:13 +00:00
Jakob Stoklund Olesen	92ebf1153e	Select i64 values with %icc conditions. llvm-svn: 182224	2013-05-19 20:38:21 +00:00
Jakob Stoklund Olesen	7ca944b9db	Add floating point selects on %xcc predicates. llvm-svn: 182222	2013-05-19 20:33:11 +00:00
Jakob Stoklund Olesen	4a78c86a6a	Implement SPselectfcc for i64 operands. Also clean up the arguments to all the MOVCC instructions so the operands always are (true-val, false-val, cond-code). llvm-svn: 182221	2013-05-19 20:20:54 +00:00
Venkatraman Govindaraju	3320e5a921	[Sparc] Rearrange integer registers' allocation order so that register allocator will use I and G registers before using L and O registers. Also, enable registers %g2-%g4 to be used in application and %g5 in 64 bit mode. llvm-svn: 182219	2013-05-19 20:07:20 +00:00
Jakob Stoklund Olesen	ead983cec9	Handle i64 FrameIndex nodes in SPARC v9 mode. llvm-svn: 182216	2013-05-19 19:14:24 +00:00
Hal Finkel	2f474f0e8a	Check InlineAsm clobbers in PPCCTRLoops We don't need to reject all inline asm as using the counter register (most does not). Only those that explicitly clobber the counter register need to prevent the transformation. llvm-svn: 182191	2013-05-18 09:20:39 +00:00
Tim Northover	fd2639f784	AArch64: add CMake dependency to fix very parallel builds llvm-svn: 182190	2013-05-18 08:17:47 +00:00
David Majnemer	5ba473afb0	X86: Bad peephole interaction between adc, MOV32r0 The peephole tries to reorder MOV32r0 instructions such that they are before the instruction that modifies EFLAGS. The problem is that the peephole does not consider the case where the instruction that modifies EFLAGS also depends on the previous state of EFLAGS. Instead, walk backwards until we find an instruction that has a def for EFLAGS but does not have a use. If we find such an instruction, insert the MOV32r0 before it. If it cannot find such an instruction, skip the optimization. llvm-svn: 182184	2013-05-18 01:02:03 +00:00
Matt Arsenault	75865923c9	Add LLVMContext argument to getSetCCResultType llvm-svn: 182180	2013-05-18 00:21:46 +00:00
JF Bastien	97b08c404c	Support unaligned load/store on more ARM targets This patch matches GCC behavior: the code used to only allow unaligned load/store on ARM for v6+ Darwin, it will now allow unaligned load/store for v6+ Darwin as well as for v7+ on Linux and NaCl. The distinction is made because v6 doesn't guarantee support (but LLVM assumes that Apple controls hardware+kernel and therefore have conformant v6 CPUs), whereas v7 does provide this guarantee (and Linux/NaCl behave sanely). The patch keeps the -arm-strict-align command line option, and adds -arm-no-strict-align. They behave similarly to GCC's -mstrict-align and -mnostrict-align. I originally encountered this discrepancy in FastIsel tests which expect unaligned load/store generation. Overall this should slightly improve performance in most cases because of reduced I$ pressure. llvm-svn: 182175	2013-05-17 23:49:01 +00:00
Rafael Espindola	5986ce0e5d	Fix the build in c++11 mode. The errors were: non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list and non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list llvm-svn: 182168	2013-05-17 22:45:52 +00:00
Vincent Lejeune	d3fcb5016c	R600: Lower int_load_input to copyFromReg instead of Register node It solves a bug uncovered by dot4 patch where the register class of int_load_input use was ignored. llvm-svn: 182130	2013-05-17 16:51:06 +00:00
Vincent Lejeune	3d5118ca40	R600: Use bottom up scheduling algorithm llvm-svn: 182129	2013-05-17 16:50:56 +00:00
Vincent Lejeune	4c81d4da6f	R600: Use depth first scheduling algorithm It should increase PV substitution opportunities and lower gpr usage (pending computations path are "flushed" sooner) llvm-svn: 182128	2013-05-17 16:50:44 +00:00
Vincent Lejeune	e958c8e0d8	R600: Replace big texture opcode switch in scheduler by usesTC/usesVC llvm-svn: 182127	2013-05-17 16:50:37 +00:00
Vincent Lejeune	519f21eed3	R600: Relax some vector constraints on Dot4. Dot4 now uses 8 scalar operands instead of 2 vectors one which allows register coalescer to remove some unneeded COPY. This patch also defines some structures/functions that can be used to handle every vector instructions (CUBE, Cayman special instructions...) in a similar fashion. llvm-svn: 182126	2013-05-17 16:50:32 +00:00
Vincent Lejeune	d3eed66e8c	R600: Improve texture handling llvm-svn: 182125	2013-05-17 16:50:20 +00:00
Vincent Lejeune	4ebef18ab5	R600: Rename 128 bit registers. Almost all instructions that takes a 128 bits reg as input (fetch, export...) have the abilities to swizzle their argument and output. Instead of printing default swizzle for each 128 bits reg, rename T.XYZW to T and let instructions print potentially optimized swizzles themselves. llvm-svn: 182124	2013-05-17 16:50:09 +00:00
Vincent Lejeune	0fca91d52e	R600: Some factorization llvm-svn: 182123	2013-05-17 16:50:02 +00:00
Vincent Lejeune	f9f4e1e7db	R600: Factorize Fetch size limit inside AMDGPUSubTarget llvm-svn: 182122	2013-05-17 16:49:55 +00:00
Vincent Lejeune	709e01688d	R600: prettier dump of clamp llvm-svn: 182121	2013-05-17 16:49:49 +00:00
Tom Stellard	ecc2ad1cd4	R600: Fix encoding for R600 family GPUs Reviewed-by: Vincent Lejeune <vljn@ovi.com> https://bugs.freedesktop.org/show_bug.cgi?id=64193 https://bugs.freedesktop.org/show_bug.cgi?id=64257 https://bugs.freedesktop.org/show_bug.cgi?id=64320 NOTE: This is a candidate for the 3.3 branch. llvm-svn: 182113	2013-05-17 15:23:21 +00:00
Tom Stellard	edade94bbc	R600: Pass MCSubtargetInfo reference to R600CodeEmitter llvm-svn: 182112	2013-05-17 15:23:12 +00:00
Venkatraman Govindaraju	641b0b5a21	[Sparc] Implements hasReservedCallFrame and hasFP. This is to generate correct framesetup code when the function has variable sized allocas. llvm-svn: 182108	2013-05-17 15:14:34 +00:00
Benjamin Kramer	fc33e1d99b	X86: Make shuffle -> shift conversion more aggressive about undefs. Shuffles that only move an element into position 0 of the vector are common in the output of the loop vectorizer and often generate suboptimal code when SSSE3 is not available. Lower them to vector shifts if possible. We still prefer palignr over psrldq because it has higher throughput on sandybridge. llvm-svn: 182102	2013-05-17 14:48:34 +00:00
Ulrich Weigand	2dbe06a987	[PowerPC] Fix hi/lo encoding in old-style code emitter This patch implements the equivalent change to r182091/r182092 in the old-style code emitter. Instead of having two separate 16-bit immediate encoding routines depending on the instruction, this patch introduces a single encoder that checks the machine operand flags to decide whether the low or high half of a symbol address is required. Since now both encoders make no further distinction between "symbolLo" and "symbolHi", the .td operand can now use a single getS16ImmEncoding method. Tested by running the old-style JIT tests on 32-bit Linux. llvm-svn: 182097	2013-05-17 14:14:12 +00:00
Ulrich Weigand	6e23ac606e	[PowerPC] Merge/rename PPC fixup types Now that fixup_ppc_ha16 and fixup_ppc_lo16 are being treated exactly the same everywhere, it no longer makes sense to have two fixup types. This patch merges them both into a single type fixup_ppc_half16, and renames fixup_ppc_lo16_ds to fixup_ppc_half16ds for consistency. (The half16 and half16ds names are taken from the description of relocation types in the PowerPC ABI.) No change in code generation expected. llvm-svn: 182092	2013-05-17 12:37:21 +00:00

1 2 3 4 5 ...

24315 Commits