llvm-project

Commit Graph

Author	SHA1	Message	Date
David Majnemer	3821ff03cd	X86: Simplify X86WindowsTargetObjectFile::getSectionForConstant There exists a helper function to abstract away the various differences between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc. Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant. llvm-svn: 213104	2014-07-15 23:01:10 +00:00
Sanjay Patel	a2f658d69d	Move Post RA Scheduling flag bit into SchedMachineModel Refactoring; no functional changes intended Removed PostRAScheduler bits from subtargets (X86, ARM). Added PostRAScheduler bit to MCSchedModel class. This bit is set by a CPU's scheduling model (if it exists). Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses. Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!). Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling. Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values. Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. c. PPC overrides the CPU's postRA settings by enabling postRA for everything. d. X86 is the only target that actually has postRA specified via sched model info. Differential Revision: http://reviews.llvm.org/D4217 llvm-svn: 213101	2014-07-15 22:39:58 +00:00
Matt Arsenault	0d89e849bd	R600/SI: Fix select on i1 llvm-svn: 213096	2014-07-15 21:44:37 +00:00
Matt Arsenault	e9fa3b8e6b	R600/SI: Implement less wrong f32 fdiv Assuming single precision denormals and accurate sqrt/div are not reported, this passes the OpenCL conformance test. llvm-svn: 213089	2014-07-15 20:18:31 +00:00
Matt Arsenault	1d077749ea	R600: Add predicate for UnsafeFPMath llvm-svn: 213088	2014-07-15 20:18:24 +00:00
Matt Arsenault	84446a026b	R600: Remove intrinsics that appear to be unused llvm-svn: 213087	2014-07-15 20:10:27 +00:00
Chris Bieneman	03695ab57e	[RegisterCoalescer] Add new subtarget hook allowing targets to opt-out of coalescing. The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing. This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825. llvm-svn: 213078	2014-07-15 17:18:41 +00:00
Cameron McInally	44f3e30cf2	Revert r213070. It's breaking the build in MCELFStreamer::EmitInstToData(...). llvm-svn: 213073	2014-07-15 16:24:24 +00:00
Jan Vesely	6ddb8dd442	R600: Implement zero undef variants of ctlz/cttz v2: use ffbh/l if available v3: Rebase on top of Matt's SI patches Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 213072	2014-07-15 15:51:09 +00:00
Daniel Sanders	a6e125f07e	[mips] Correct .MIPS.abiflags fp_abi field for -mfpxx and without .module Summary: Previously all the test cases set it after initialization with '.module fp=xx'. Differential Revision: http://reviews.llvm.org/D4489 llvm-svn: 213071	2014-07-15 15:31:39 +00:00
Cameron McInally	53bc7a3330	Add x86 patterns to match a specific add-with-carry. llvm-svn: 213070	2014-07-15 15:03:32 +00:00
NAKAMURA Takumi	04b8b37f56	Prune Redundant libdeps in CMake's target_link_libraries and LLVMBuild.txt. I checked this with Release+Asserts on x86_64-mingw32. Please restore partially if this were overkill. llvm-svn: 213064	2014-07-15 11:37:03 +00:00
Andrea Di Biagio	04d5a7b337	Silence a warning in conditional expression. Fixes a gcc warning caused by a typo. A redundant assignment operation was accidentally used as the third operand of a conditional expression. No functional change intended. llvm-svn: 213061	2014-07-15 10:53:44 +00:00
Tim Northover	e4b8e138e1	AArch64: fall back to generic code for out of range extract/insert. rdar://problem/17624784 llvm-svn: 213059	2014-07-15 10:00:26 +00:00
David Majnemer	d4d9944416	Fix typo in comment No functionality changed. llvm-svn: 213052	2014-07-15 07:11:32 +00:00
Juergen Ributzka	8f073c8d60	[FastISel][X86] Remove no longer needed functions. llvm-svn: 213051	2014-07-15 06:35:53 +00:00
Juergen Ributzka	3566c08dd9	[FastISel][X86] Implement the FastLowerIntrinsicCall hook. Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook. llvm-svn: 213050	2014-07-15 06:35:50 +00:00
Juergen Ributzka	23d43318c7	[FastISel][X86] Implement the FastLowerCall hook. This implements the FastLowerCall hook, which is based on the DoSelectCall function. The implementation is very similar, but the target-independent call lowering part has been factored out. This should also enable patchpoint intrinsic lowering for FastISel on X86. Related to <rdar://problem/17427052>. llvm-svn: 213049	2014-07-15 06:35:47 +00:00
Juergen Ributzka	5ee9d90248	Revert "[FastISel][X86] Remove no longer needed functions." Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook." Revert "[FastISel][X86] Implement the FastLowerCall hook." This reverts commit r213035, r213036, and r213037 to make the buildbots happy again. llvm-svn: 213048	2014-07-15 05:23:40 +00:00
David Majnemer	4e3ccc0505	CodeGen: Handle ConstantVector and undef in WinCOFF constant pools The constant pool entry code for WinCOFF assumed that vector constants would be formed using ConstantDataVector, it did not expect to see a ConstantVector. Furthermore, it did not expect undef as one of the elements of the vector. ConstantVectors should be handled like ConstantDataVectors, treat Undef as zero. llvm-svn: 213038	2014-07-15 02:34:12 +00:00
Juergen Ributzka	9fbf33d70f	[FastISel][X86] Remove no longer needed functions. llvm-svn: 213037	2014-07-15 02:22:56 +00:00
Juergen Ributzka	170f9354bb	[FastISel][X86] Implement the FastLowerIntrinsicCall hook. Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook. llvm-svn: 213036	2014-07-15 02:22:53 +00:00
Juergen Ributzka	a9cced8a94	[FastISel][X86] Implement the FastLowerCall hook. This implements the FastLowerCall hook, which is based on the DoSelectCall function. The implementation is very similar, but the target-independent call lowering part has been factored out. This should also enable patchpoint intrinsic lowering for FastISel on X86. Related to <rdar://problem/17427052>. llvm-svn: 213035	2014-07-15 02:22:49 +00:00
Matt Arsenault	ca3976f7ae	R600: Add dag combine for copy of an illegal type. This helps avoid redundant instructions to unpack, and repack the vectors. Ideally we could recognize that pattern and eliminate it. Currently v4i8 and other small element type vectors are scalarized, so this has the added bonus of avoiding that. llvm-svn: 213031	2014-07-15 02:06:31 +00:00
Matt Arsenault	f171cf23b8	R600: Add denormal handling subtarget features. llvm-svn: 213018	2014-07-14 23:40:49 +00:00
Matt Arsenault	c6ae7b4763	R600/SI: Default to no single precision denormals. llvm-svn: 213017	2014-07-14 23:40:43 +00:00
Adam Nemet	cf7c905cfb	[X86] Specify all TSFlags bit-offsets symbolically No functional change. The offsets for the other bitfields are specified symbolically. I need to increase the size for one of the earlier fields which is easier after this cleanup. Why these bits are relative to VEXShift is a bit strange but that is for another cleanup. I made sure that the values for the enums are unchanged after this change. llvm-svn: 213011	2014-07-14 23:18:39 +00:00
David Majnemer	8bce66b093	CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF COFF lacks a feature that other object file formats support: mergeable sections. To work around this, MSVC sticks constant pool entries in special COMDAT sections so that each constant is in its own section. This permits unused constants to be dropped and it also allows duplicate constants in different translation units to get merged together. This fixes PR20262. Differential Revision: http://reviews.llvm.org/D4482 llvm-svn: 213006	2014-07-14 22:57:27 +00:00
Saleem Abdulrasool	b51d464f1e	X86: correct 64-bit atomics on 32-bit We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a 32-bit target. They were added at different times and would result in the border condition being mishandled. This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic operations on x86 at the cost of restoring a long-standing bug in the codegen. We emit a cmpxchg8b on all x86 targets even where the CPU does not support this instruction (pre-Pentium CPUs). Although this bug should be fixed, this was present prior to SVN r212119 and this change, so this is not really introducing a regression. llvm-svn: 212956	2014-07-14 16:28:13 +00:00
Tim Northover	6c647eae8b	X86: remove temporary atomicrmw used during lowering. We construct a temporary "atomicrmw xchg" instruction when lowering atomic stores for widths that aren't supported natively. This isn't on the top-level worklist though, so it won't be removed automatically and we have to do it ourselves once that itself has been lowered. Thanks Saleem for pointing this out! llvm-svn: 212948	2014-07-14 15:31:13 +00:00
Daniel Sanders	41ffa5d1ba	Re-commit: [mips] Correct section alignments and EntrySizes for .bss, .text, .data, .reginfo, .MIPS.options, and .MIPS.abiflags The lld tests will temporarily fail again but Simon Atanasyan will commit a fix for those shortly. llvm-svn: 212946	2014-07-14 15:05:51 +00:00
Daniel Sanders	cb0d36e592	Revert: [mips] Correct section alignments and EntrySizes for .bss, .text, .data, .reginfo, .MIPS.options, and .MIPS.abiflags This commit causes multiple lld tests to fail. Reverting while I investigate the issue. llvm-svn: 212945	2014-07-14 14:43:45 +00:00
Daniel Sanders	8e254166e1	[mips] Correct section alignments and EntrySizes for .bss, .text, .data, .reginfo, .MIPS.options, and .MIPS.abiflags Summary: .bss, .text, and .data are at least 16-byte aligned. .reginfo is 4-byte aligned and has a 24-byte EntrySize. .MIPS.abiflags has a 24-byte EntrySize. .MIPS.options is 8-byte aligned and has 1-byte EntrySize. Using a 1-byte EntrySize for .MIPS.options seems strange because the records are neither 1-byte long nor fixed-length but this matches the value that GAS emits. Differential Revision: http://reviews.llvm.org/D4487 llvm-svn: 212939	2014-07-14 14:02:14 +00:00
Daniel Sanders	7ddb0ab85f	[mips] For the FP64A ABI, odd-numbered double-precision moves must not use mtc1/mfc1. Summary: This is because with FP64A the hardware will redirect 32-bit reads/writes from/to odd-numbered registers to the upper 32-bits of the corresponding even register. In effect, simulating FR=0 mode when FR=0 mode is not available. Unfortunately, we have to make the decision to avoid mfc1/mtc1 before register allocation so we currently do this for even registers too. FPXX has a similar requirement on 32-bit architectures that lack mfhc1/mthc1 so this patch also handles the affected moves from the FPU for FPXX too. Moves to the FPU were supported by an earlier commit. Differential Revision: http://reviews.llvm.org/D4484 llvm-svn: 212938	2014-07-14 13:08:14 +00:00
Daniel Sanders	24e08fd5c0	[mips] Use MFHC1 when it is available (MIPS32r2 and later) for both FP32 and FP64 moves Summary: This is similar to r210771 which did the same thing for MTHC1. Also corrected MTHC1_D32 and MTHC1_D64 which used AFGR64 and FGR64 on the wrong definitions. Differential Revision: http://reviews.llvm.org/D4483 llvm-svn: 212936	2014-07-14 12:41:31 +00:00
Tim Northover	3cb24110b1	AArch64: remove unnecessary pseudo-instruction. Sufficiently twisted use of TableGen lets us write patterns directly for f16 (as an i16 promoted to i32) -> f32 conversion. llvm-svn: 212933	2014-07-14 11:16:02 +00:00
Daniel Sanders	9ee2aee859	[mips] Correct the AFL_FLAGS1_ODDSPREG flag in .MIPS.abiflags when no '.module oddspreg' is used Differential Revision: http://reviews.llvm.org/D4486 llvm-svn: 212932	2014-07-14 10:26:15 +00:00
Sasa Stankovic	b976fee83c	[mips] Expand BuildPairF64 to a spill and reload when the O32 FPXX ABI is enabled and mthc1 and dmtc1 are not available (e.g. on MIPS32r1) This prevents the upper 32-bits of a double precision value from being moved to the FPU with mtc1 to an odd-numbered FPU register. This is necessary to ensure that the code generated executes correctly regardless of the current FPU mode. MIPS32r2 and above continues to use mtc1/mthc1, while MIPS-IV and above continue to use dmtc1. Differential Revision: http://reviews.llvm.org/D4465 llvm-svn: 212930	2014-07-14 09:40:29 +00:00
NAKAMURA Takumi	c3b3897e8a	NVPTX/LLVMBuild.txt: Add "Scalar" to required_libraries. It is really referenced. llvm-svn: 212918	2014-07-14 02:52:19 +00:00
Saleem Abdulrasool	3f3cefd392	MC: make DWARF and Windows unwinding handling more similar Rename member variables and functions for the MCStreamer for DWARF-like unwinding management. Rename the Windows ones as well and make the naming and handling similar across the two. No functional change intended. llvm-svn: 212912	2014-07-13 19:03:36 +00:00
Matt Arsenault	c3f6a7e44e	Remove unused include llvm-svn: 212898	2014-07-13 03:08:59 +00:00
Matt Arsenault	d32dbb6a10	R600: Use range for and fix missing consts. llvm-svn: 212897	2014-07-13 03:06:43 +00:00
Matt Arsenault	762af96f46	R600: Make ShaderType private llvm-svn: 212896	2014-07-13 03:06:39 +00:00
Matt Arsenault	d9a23ab20d	R600: Add option to disable promote alloca This can make writing some tests harder, so add a flag to disable it. llvm-svn: 212893	2014-07-13 02:08:26 +00:00
Saleem Abdulrasool	f74d48a011	AArch64: add support for llvm.aarch64.hint intrinsic This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to support the various hint intrinsic functions in the ACLE. Add an optional pattern field that permits the subclass to specify the pattern that matches the selection. The intrinsic pattern is set as mayLoad, mayStore, so overload the value for the definition of the hint instruction. llvm-svn: 212883	2014-07-12 21:20:49 +00:00
Juergen Ributzka	d755e9f730	Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook." This reverts commit r212851, because it broke the memset lowering. llvm-svn: 212855	2014-07-11 23:10:08 +00:00
Juergen Ributzka	04b444913b	[FastISel][X86] Implement the FastLowerIntrinsicCall hook. Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook. llvm-svn: 212851	2014-07-11 22:37:43 +00:00
Ulrich Weigand	ea147a9d43	[PowerPC] Fix invalid displacement created by LocalStackAlloc This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that could result in the LocalStackAlloc pass creating an MI instruction with an out-of-range displacement: %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32) %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31 (In final assembler output the top bits are stripped off, resulting in a negative offset loading from below the stack pointer.) Common code expects the isFrameOffsetLegal routine to verify whether adding a given offset to the offset already present in the instruction results in a valid displacement. However, on PowerPC the routine did not take the already present instruction offset into account. This commit fixes isFrameOffsetLegal to add the instruction offset, and updates a local caller (needsFrameBaseReg) to no longer add the instruction offset itself before calling isFrameOffsetLegal. Reviewed by Hal Finkel. llvm-svn: 212832	2014-07-11 17:19:31 +00:00
Marek Olsak	eac5062cc0	R600/SI: Use i32 vectors for resources and samplers This affects new intrinsics only. What surprises me is that v32i8 still works. llvm-svn: 212831	2014-07-11 17:11:52 +00:00
Marek Olsak	d8ecaeec02	R600/SI: add sample and image intrinsics exposing all instruction fields We need the intrinsics with offsets, so why not just add them all. The R128 parameter will also be useful for reducing SGPR usage. GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent", so Mesa will probably translate those to slc, glc, etc. When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics. llvm-svn: 212830	2014-07-11 17:11:46 +00:00
Marek Olsak	ba77c3e4ed	R600/SI: fix shadow mapping for 1D and 2D array textures It was conflicting with def TEX_SHADOW_ARRAY, which also handles them. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 212829	2014-07-11 17:11:39 +00:00
Oliver Stannard	6eda6ffc0c	ARM: Allow __fp16 as a function arg or return type for AArch64 ACLE 2.0 allows __fp16 to be used as a function argument or return type. This enables this for AArch64. llvm-svn: 212812	2014-07-11 13:33:46 +00:00
Quentin Colombet	0f179c4d8a	[X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI. Also add a few comments. <rdar://problem/17581756> llvm-svn: 212808	2014-07-11 12:08:23 +00:00
Adam Nemet	26f817497c	[X86] AVX512: Improve readability of isCDisp8 No functional change. As I was trying to understand this function, I found that variables were reused with confusing names and the broadcast case was a bit too implicit. Hopefully, this is an improvement. llvm-svn: 212795	2014-07-11 05:23:25 +00:00
Adam Nemet	e311c3c836	[X86] AVX512: Simplify logic in isCDisp8 It was computing the VL/n case as: MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize ElemByteSize not only falls out but VectorByteSize/Divider now actually matches the definition of VL/n. Also some formatting fixes. llvm-svn: 212794	2014-07-11 05:23:12 +00:00
Jan Vesely	2cb62ce2a0	R600: Implement float to long/ulong Use alg. from LegalizeDAG.cpp Move Expand setting to SIISellowering v2: Extend existing tests instead of creating new ones v3: use separate LowerFPTOSINT function v4: use TargetLowering::expandFP_TO_SINT add comment about using FP_TO_SINT for uints Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 212773	2014-07-10 22:40:21 +00:00
Brad Smith	733cb6437d	Use the integrated assembler by default on OpenBSD. llvm-svn: 212771	2014-07-10 22:37:28 +00:00
Zoran Jovanovic	f34b454219	[mips] Emit two CFI offset directives per double precision SDC1/LDC1 instead of just one for FR=1 registers Differential Revision: http://reviews.llvm.org/D4310 llvm-svn: 212769	2014-07-10 22:23:30 +00:00
Akira Hatanaka	7cc27649a6	[X86] Mark pseudo instruction TEST8ri_NOEREX as hasSideEffects=0. Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable macro-fusion. <rdar://problem/15680770> llvm-svn: 212747	2014-07-10 18:00:53 +00:00
Eric Christopher	22405e4bbf	Make it possible for the Subtarget to change between function passes in the mips back end. This, unfortunately, required a bit of churn in the various predicates to use a pointer rather than a reference. llvm-svn: 212744	2014-07-10 17:26:51 +00:00
David Majnemer	99ef236542	Mips: Silence a -Wcovered-switch-default Remove a default label which covered no enumerators, replace it with a llvm_unreachable. No functionality changed. llvm-svn: 212729	2014-07-10 16:04:04 +00:00
Zoran Jovanovic	255d00dc23	[mips] Added FPXX modeless calling convention. Differential Revision: http://reviews.llvm.org/D4293 llvm-svn: 212726	2014-07-10 15:36:12 +00:00
Arnaud A. de Grandmaison	f643231163	[AArch64] Add logical alias instructions to MC AsmParser This patch teaches the AsmParser to accept some logical+immediate instructions and convert them as shown: bic Rd, Rn, #imm -> and Rd, Rn, #~imm bics Rd, Rn, #imm -> ands Rd, Rn, #~imm orn Rd, Rn, #imm -> orr Rd, Rn, #~imm eon Rd, Rn, #imm -> eor Rd, Rn, #~imm Those instructions are an alternate syntax available to assembly coders, and are needed in order to support code already compiling with some other assemblers. For example, the bic construct is used by the linux kernel. llvm-svn: 212722	2014-07-10 15:12:26 +00:00
Tim Northover	fee2adefba	AArch64: correctly fast-isel i8 & i16 multiplies We were asking for a register for type i8 or i16 which caused an assert. rdar://problem/17620015 llvm-svn: 212718	2014-07-10 14:18:46 +00:00
Daniel Sanders	7e527423f5	[mips] Add support for -modd-spreg/-mno-odd-spreg Summary: When -mno-odd-spreg is in effect, 32-bit floating point values are not permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit floating point comparison results from being written to odd registers. This option has three purposes: * It allows support for certain MIPS implementations such as loongson-3a that do not allow the use of odd registers for single precision arithmetic. * When using -mfpxx, -mno-odd-spreg is the default and this allows us to statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1 instructions to/from odd registers are guaranteed not to appear for any reason. Once this has been established, the user can then re-enable -modd-spreg to regain the use of all 32 single-precision registers. * When using -mfp64 and -mno-odd-spreg together, an O32 extension named O32 FP64A is used as the ABI. This is intended to provide almost all functionality of an FR=1 processor but can also be executed on a FR=0 core with the assistance of a hardware compatibility mode which emulates FR=0 behaviour on an FR=1 processor. * Added '.module oddspreg' and '.module nooddspreg' each of which update the .MIPS.abiflags section appropriately * Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller doesn't have to remember to do it. * MipsABIFlags now calculates the flags1 and flags2 member on demand rather than trying to maintain them in the same format they will be emitted in. There is one portion of the -mfp64 and -mno-odd-spreg combination that is not implemented yet. Moves to/from odd-numbered double-precision registers must not use mtc1. I will fix this in a follow-up. Differential Revision: http://reviews.llvm.org/D4383 llvm-svn: 212717	2014-07-10 13:38:23 +00:00
Zinovy Nis	cad431c122	[x32] Add AsmBackend for X32 which uses ELF32 with x86_64 (the author is Pavel Chupin). This is minimal change for backend required to have "hello world" compiled and working on x32 target (x86_64-linux-gnux32). More patches for x32 will follow. Differential Revision: http://reviews.llvm.org/D4181 llvm-svn: 212716	2014-07-10 13:03:26 +00:00
Richard Sandiford	02bb0ec368	[SystemZ] Use SystemZCallingConv.td to define callee-saved registers Just a clean-up. No behavioral change intended. llvm-svn: 212711	2014-07-10 11:44:37 +00:00
Richard Sandiford	909aa3ad21	[SystemZ] Tweak instruction format classifications There's no real need to have Shift as a separate format type from Binary. The comments for other format types were too specific and in some cases no longer accurate. Just a clean-up, no behavioral change intended. llvm-svn: 212707	2014-07-10 11:29:23 +00:00
Chandler Carruth	df8d0caab7	[x86] Add another combine that is particularly useful for the new vector shuffle lowering: match shuffle patterns equivalent to an unpcklwd or unpckhwd instruction. This allows us to use generic lowering code for v8i16 shuffles and match the unpack pattern late. llvm-svn: 212705	2014-07-10 11:09:29 +00:00
Richard Sandiford	e66e8c8b66	[SystemZ] Add MC support for LEDBRA, LEXBRA and LDXBRA These instructions aren't used for codegen since the original L*DB instructions are suitable for fround. llvm-svn: 212703	2014-07-10 11:00:55 +00:00
Richard Sandiford	ca44614ac0	[SystemZ] Avoid using i8 constants for immediate fields Immediate fields that have no natural MVT type tended to use i8 if the field was small enough. This was a bit confusing since i8 isn't a legal type for the target. Fields for short immediates in a 32-bit or 64-bit operation use i32 or i64 instead, so it would be better to do the same for all fields. No behavioral change intended. llvm-svn: 212702	2014-07-10 10:52:51 +00:00
Richard Sandiford	ac1dba0fdf	[SystemZ] Fix FPR dwarf numbering The dwarf FPR numbers are supposed to have the order F0, F2, F4, F6, F1, F3, F5, F7, F8, etc., which matches the pairing of registers for long doubles. E.g. a long double stored in F0 is paired with F2. llvm-svn: 212701	2014-07-10 10:45:11 +00:00
Daniel Sanders	cbd44c591d	Make it possible for ints/floats to return different values from getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 llvm-svn: 212697	2014-07-10 10:18:12 +00:00
Chandler Carruth	853fa0ac8d	[x86] Expand the target DAG combining for PSHUFD nodes to be able to combine into half-shuffles through unpack instructions that expand the half to a whole vector without messing with the dword lanes. This fixes some redundant instructions in splat-like lowerings for v16i8, which are now getting to be really nice. llvm-svn: 212695	2014-07-10 09:57:36 +00:00
Chandler Carruth	a34a8e230d	[x86] Tweak the v16i8 single input special case lowering for shuffles that splat i8s into i16s. Previously, we would try much too hard to arrange a sequence of i8s in one half of the input such that we could unpack them into i16s and shuffle those into place. This isn't always going to be a cheaper i8 shuffle than our other strategies. The case where it is always going to be cheaper is when we can arrange all the necessary inputs into one half using just i16 shuffles. It happens that viewing the problem this way also makes it much easier to produce an efficient set of shuffles to move the inputs into one half and then unpack them. With this, our splat code gets one step closer to being not terrible with the new experimental lowering strategy. It also exposes two combines missing which I will add next. llvm-svn: 212692	2014-07-10 09:16:40 +00:00
Chandler Carruth	7d2ffb5492	[x86] Initial improvements to the new shuffle lowering for v16i8 shuffles specifically for cases where a small subset of the elements in the input vector are actually used. This is specifically targeted at improving the shuffles generated for trunc operations, but also helps out splat-like operations. There is still some really low-hanging fruit here that I want to address but this is a huge step in the right direction. llvm-svn: 212680	2014-07-10 04:34:06 +00:00
Matt Arsenault	b0df92577d	R600/SI: Add support for llvm.convert.{to|from}.fp16 llvm-svn: 212676	2014-07-10 03:22:20 +00:00
Chandler Carruth	b3840a55ae	[x86] Refactor some of the new code for lowering v16i8 shuffles to remove duplication and make it easier to select different strategies. No functionality changed. llvm-svn: 212674	2014-07-10 02:24:26 +00:00
Chandler Carruth	d3561f6fec	[SDAG] Make the new zext-vector-inreg node default to expand so targets don't need to set it manually. This is based on feedback from Tom who pointed out that if every target needs to handle this we need to reach out to those maintainers. In fact, it doesn't make sense to duplicate everything when anything other than expand seems unlikely at this stage. llvm-svn: 212661	2014-07-09 22:53:04 +00:00
Jim Grosbach	34cc92b475	AArch64: Better codegen for storing to __fp16. Storing will generally be immediately preceded by rounding from an f32 or f64, so make sure to match those patterns directly to convert into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-from-f64 path which was first converting to f32 and then to f16 from there. rdar://17594379 llvm-svn: 212638	2014-07-09 18:55:52 +00:00
Benjamin Kramer	c560a6cadc	TargetRegisterInfo: Remove function that fell out of use years ago. llvm-svn: 212636	2014-07-09 18:53:57 +00:00
Adam Nemet	2820a5b9e9	[X86] AVX512: Enable it in the Loop Vectorizer This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(. llvm-svn: 212634	2014-07-09 18:22:33 +00:00
Louis Gerbarg	1ce0c37bf0	Make AArch64FastISel::EmitIntExt explicitly check its source and destination types This is a follow up to r212492. There should be no functional difference, but this patch makes it clear that SrcVT must be an i1/i8/i16/i32 and DestVT must be an i8/i16/i32/i64. rdar://17516686 llvm-svn: 212633	2014-07-09 17:54:32 +00:00
Benjamin Kramer	d6f1733add	X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2. Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out as we have to blend two vectors. While there remove unnecessary cross-lane moves from the shuffles so the backend can lower it to palignr instead of vperm. Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2. llvm-svn: 212611	2014-07-09 11:12:39 +00:00
Chandler Carruth	afe4b2507e	[x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610	2014-07-09 10:58:18 +00:00
Daniel Sanders	e31155fd1a	[mips][mips64r6] Correct select patterns that have the condition or true/false values backwards Summary: This bug caused SingleSource/Regression/C/uint64_to_float and SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others). Differential Revision: http://reviews.llvm.org/D4388 llvm-svn: 212608	2014-07-09 10:47:26 +00:00
Daniel Sanders	dc06718e0b	[mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructions Summary: It seems we accidentally read the wrong column of the table in the MIPS64r6 spec and used the names for c.cond.fmt instead of cmp.cond.fmt. Differential Revision: http://reviews.llvm.org/D4387 llvm-svn: 212607	2014-07-09 10:40:20 +00:00
Chandler Carruth	ef5dcf571e	[x86] Initialize a pointer to null to fix a bug in r212602. This should restore GCC hosts (which happen to put the bad stuff into the pointer) and MSan, etc. llvm-svn: 212606	2014-07-09 10:36:42 +00:00
Daniel Sanders	f5a5fbd3f4	[mips][mips64r6] Use JALR for indirect branches instead of JR (which is not available on MIPS32r6/MIPS64r6) Summary: This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6. Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4269 llvm-svn: 212605	2014-07-09 10:21:59 +00:00
Daniel Sanders	338513b3fa	[mips][mips64r6] Use JALR for returns instead of JR (which is not available on MIPS32r6/MIPS64r6) Summary: RET, and RET_MM have been replaced by a pseudo named PseudoReturn. In addition a version with a 64-bit GPR named PseudoReturn64 has been added. Instruction selection for a return matches RetRA, which is expanded post register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter, this PseudoReturn/PseudoReturn64 are emitted as: - (JALR64 $zero, $rs) on MIPS64r6 - (JALR $zero, $rs) on MIPS32r6 - (JR_MM $rs) on microMIPS - (JR $rs) otherwise On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid development and review (specifically, to ensure all cases of jr are updated), these aliases are temporarily named 'r6.jr' instead of 'jr'. A follow up patch will change them back to the correct mnemonic. Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect jump, and removed it from its definition of a call. Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it doesn't appear to account for any MIPS64-specifics. The return instruction created as part of eh_return expansion is now expanded using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6 ('jalr $zero, $rs'). Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in expandEhReturn(). Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4268 llvm-svn: 212604	2014-07-09 10:16:07 +00:00
Chandler Carruth	2ebc942683	[x86] Re-apply a variant of the x86 side of r212324 now that the rest has settled without incident, removing the x86-specific and overly strict 'isVectorSplat' routine in favor of generic and more powerful splat detection. The primary motivation and result of this is that the x86 backend can now see through splats which contain undef elements. This is essential if we are using a widening form of legalization and I've updated a test case to also run in that mode as before this change the generated code for the test case was completely scalarized. This version of the patch much more carefully handles the undef lanes. - We aren't overly conservative about them in the shift lowering (where we will never use the splat itself). - One place where the splat would have been re-used by the existing code now explicitly constructs a new constant splat that will be safe. - The broadcast lowering is much more reasonable with undefs by doing a correct check of whether the splat is the only user of a loaded value, checking that the splat actually crosses multiple lanes before using a broadcast, and handling broadcasts of non-constant splats. As a consequence of the last bullet, the weird usage of vpshufd instead of vbroadcast is gone, and we actually can lower an AVX splat with vbroadcastss where before we emitted a really strange pattern of a vector load and a manual splat across the vector. llvm-svn: 212602	2014-07-09 10:06:58 +00:00
NAKAMURA Takumi	843c4cb401	MipsTargetStreamer.h: Avoid "using" to appease msc17. llvm-svn: 212577	2014-07-08 23:48:22 +00:00
Jim Grosbach	04691a530d	AArch64: Better codegen for loading from __fp16. Loading will generally extend to an f32 or an f64, so make sure to match those patterns directly to load into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-to-f64 path which was first converting to f32 and then to f64 from there. rdar://17594379 llvm-svn: 212573	2014-07-08 23:28:48 +00:00
Ulrich Weigand	862d8b8d06	[PowerPC] Implement atomic NAND operations as actual NAND This changes the implementation of atomic NAND operations from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)" (compatible with GCC >= 4.4). This is in line with the common-code and ARM back-end change implemented in r212433. llvm-svn: 212547	2014-07-08 16:16:02 +00:00
Daniel Sanders	324ad956e0	[mips] Fixed struct/class mismatch introduced in r212522. Clang emits a warning about this. llvm-svn: 212528	2014-07-08 13:13:42 +00:00
Daniel Sanders	7201a3e3bb	Fix r212522 - [mips] Improve encapsulation of the .MIPS.abiflags implementation and limit scope of related enums Added two lines that should have been in r212522. llvm-svn: 212523	2014-07-08 10:35:52 +00:00
Daniel Sanders	c7dbc630e5	[mips] Improve encapsulation of the .MIPS.abiflags implementation and limit scope of related enums Summary: Follow on to r212519 to improve the encapsulation and limit the scope of the enums. Also merged two very similar parser functions, fixed a bug where ASE's were not being reported, and marked CPR1's as being 128-bit when MSA is enabled. Differential Revision: http://reviews.llvm.org/D4384 llvm-svn: 212522	2014-07-08 10:11:38 +00:00
Renato Golin	b8a86c43c0	Revert "Refactor ARM subarchitecture parsing" This reverts commit 7b4a6882467e7fef4516a0cbc418cbfce0fc6f6d. llvm-svn: 212521	2014-07-08 10:06:16 +00:00
Arnaud A. de Grandmaison	d7827606de	Truncate the immediate in logical operation to the register width And continue to produce an error if the 32 most significant bits are not all ones or zeros. llvm-svn: 212520	2014-07-08 09:53:04 +00:00
Vladimir Medic	fb8a2a95cd	Mips.abiflags is a new implicitly generated section that will be present on all new modules. The section contains a versioned data structure which represents essentially information to allow a program loader to determine the requirements of the application. This patch implements mips.abiflags section and provides test cases for it. llvm-svn: 212519	2014-07-08 08:59:22 +00:00
Chandler Carruth	142e966261	[x86,SDAG] Sink the logic for folding shuffles of splats more aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code. This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed. First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could make N2 undef, then failed to ever re-consider the precomputed state. Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements. Third, it didn't handle all-zero bit casts nicely the way my code in the x86 side of things did, which is essential to getting good zext-shuffle lowerings. But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise but it's still pretty unclear what the value of the test is in this day and age. llvm-svn: 212517	2014-07-08 08:45:38 +00:00
Adam Nemet	79580db918	[X86] AVX512: Only allow k1-k7 as predicates to vpcmp* As destination k0 is allowed but not as predicate/writemask. I also modified the test to allow checking of error messages by the assembler. I applied a similar approach to the test ret.s in the same directory. llvm-svn: 212504	2014-07-08 00:22:32 +00:00
Andrea Di Biagio	2620b877b6	[x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types. When combining a sequence of two PSHUFD dag nodes into a single PSHUFD, make sure that we assign the correct type to the resulting PSHUFD. X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32. Before this change, an assertion failure was triggered in method 'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test below into a single PSHUFD. define <4 x float> @test1(<4 x float> %V) { %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1> %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1> ret <4 x float> %2 } llvm-svn: 212498	2014-07-07 23:25:23 +00:00
Juergen Ributzka	665ea71fcd	[FastISel][X86] Fix smul.with.overflow.i8 lowering. Add custom lowering code for signed multiply instruction selection, because the default FastISel instruction selection for ISD::MUL will use unsigned multiply for the i8 type and signed multiply for all other types. This would set the incorrect flags for the overflow check. This fixes <rdar://problem/17549300> llvm-svn: 212493	2014-07-07 21:52:21 +00:00
Louis Gerbarg	4c5b4054b2	Allow AArch64FastISel to degrade gracefully in the presence of an MVT::i128 Currently AArch64FastISel crashes if it tries to extend an integer into an MVT::i128. This can happen by creating 128 bit integers like so: typedef unsigned int uint128_t __attribute__((mode(TI))); typedef int sint128_t __attribute__((mode(TI))); This patch makes EmitIntExt check for their presence and then falls back to SelectionDAG. Tests included. rdar://17516686 llvm-svn: 212492	2014-07-07 21:37:51 +00:00
Renato Golin	1e9c282cd1	Refactor ARM subarchitecture parsing According to a FIXME in ARMMCTargetDesc.cpp the ARM version parsing should be in the Triple helper class. Patch by: Gabor Ballabas llvm-svn: 212479	2014-07-07 20:01:11 +00:00
Ulrich Weigand	de8641bfde	[PowerPC] Fix no-assert build r212476 caused a compile failure (unused variable) in a non-assertion build ... llvm-svn: 212477	2014-07-07 19:39:44 +00:00
Ulrich Weigand	ec2bf93895	[PowerPC] Fix "byval align" arguments Arguments passed as "byval align" should get the specified alignment in the parameter save area. There was some code in PPCISelLowering.cpp that attempted to implement this, but this didn't work correctly: while code did update the ArgOffset value, it neglected to update the PtrOff value (which was already computed from the old ArgOffset), and it also neglected to update GPR_idx -- fields skipped due to alignment in the save area must likewise be skipped in GPRs. This patch fixes and simplifies this logic by: - handling argument offset alignment right at the beginning of argument processing, using a new helper routine CalculateStackSlotAlignment (this avoids having to update PtrOff and other derived values later on) - not tracking GPR_idx separately, but always computing the correct GPR_idx for each argument from its ArgOffset - removing some redundant computation in LowerFormalArguments: MinReservedArea must equal ArgOffset after argument processing, so there's no use in computing it twice. [This doesn't change the behavior of the current clang front-end, since that never creates "byval align" arguments at the moment. This will change with a follow-on patch, however.] llvm-svn: 212476	2014-07-07 19:26:41 +00:00
Chandler Carruth	beeacac0b3	[x86] Revert r212324 which was too aggressive w.r.t. allowing undef lanes in vector splats. The core problem here is that undef lanes can't unilaterally be considered to contribute to splats. Their handling needs to be more cautious. There is also a reported failure of the nightly testers (thanks Tobias!) that may well stem from the same core issue. I'm going to fix this theoretical issue, factor the APIs a bit better, and then verify that I don't see anything bad with Tobias's reduction from the test suite before recommitting. Original commit message for r212324: [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is dramatically improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212475	2014-07-07 19:03:32 +00:00
Matt Arsenault	d2c9e08b63	R600: Fix mishandling of load / store chains. Fixes various bugs with reordering loads and stores. Scalarized vector loads weren't collecting the chains at all. llvm-svn: 212473	2014-07-07 18:34:45 +00:00
Matt Arsenault	fda9dad17f	Fix typo, weird indentation llvm-svn: 212472	2014-07-07 18:34:42 +00:00
Tim Northover	3705283b24	X86: revert unintentional change to X86FastISel. This crept in with r212443. llvm-svn: 212459	2014-07-07 14:06:42 +00:00
Evgeniy Stepanov	6fa6c677cc	[asan] Generate asm instrumentation in MC. Generate entire ASan asm instrumentation in MC without relying on runtime helper functions. Patch by Yuri Gorshenin. llvm-svn: 212455	2014-07-07 13:57:37 +00:00
Chandler Carruth	0dcb366268	[x86] Teach the new vector shuffle lowering code to handle what is essentially a DAG combine that never gets a chance to run. We might typically expect DAG combining to remove shuffles-of-splats and other similar patterns, but we don't get a chance to run the DAG combiner when we recursively form sub-shuffles during the lowering of a shuffle. So instead hand-roll a really important combine directly into the lowering code to detect shuffles-of-splats, especially shuffles of an all-zero splat which needn't even have the same element width, etc. This lets the new vector shuffle lowering handle shuffles which implement things like zero-extension really nicely. This will become even more important when I wire the legalization of zero-extension to vector shuffles with the new widening legalization strategy. llvm-svn: 212444	2014-07-07 09:06:58 +00:00
Tim Northover	55beb64bd0	CodeGen: it turns out that NAND is not the same thing as BIC. At all. We've been performing the wrong operation on ARM for "atomicrmw nand" for years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b". This bled over into the generic expansion pass. So I assume no-one has ever actually tried to do an atomic nand in the real world. Oh well. llvm-svn: 212443	2014-07-07 09:06:35 +00:00
Saleem Abdulrasool	763f9a50a5	ARM: properly lower dllimport'ed global values This completes the handling for DLL import storage symbols when lowering instructions. A DLL import storage symbol must have an additional load performed prior to use. This is applicable to variables and functions. This is particularly important for non-function symbols as it is possible to handle function references by emitting a thunk which performs the translation from the unprefixed __imp_ symbol to the proper symbol (although, this is a non-optimal lowering). For a variable symbol, no such thunk can be accommodated. llvm-svn: 212431	2014-07-07 05:18:35 +00:00
Saleem Abdulrasool	220a044888	ARM: correctly mangle dllimport symbols Add support for tracking DLLImport storage class information on a per symbol basis in the ARM instruction selection. Use that information to correctly mangle the symbol (dllimport symbols are referenced via *__imp_<name>). llvm-svn: 212430	2014-07-07 05:18:30 +00:00
Saleem Abdulrasool	1eb4a28b44	ARM: unify symbol name retrieval Ensure that all paths that retrieve the symbol name go through GetARMGVSymbol rather than getSymbol. This is desirable so that any global symbol mangling can be centralised to this function. The motivation for this is handling of symbols that are marked as having dll import dll storage. Such a symbol requires an extra load that is currently handled in the backend and a __imp_ prefix on the symbol name. llvm-svn: 212429	2014-07-07 05:18:22 +00:00
Kevin Qin	4473c1943f	[AArch64] Normalize all constants to build a vector. The value of constant operands will be truncated to fit element width. llvm-svn: 212428	2014-07-07 02:45:40 +00:00
Saleem Abdulrasool	97255a017b	AArch64: whitespace cleanup llvm-svn: 212420	2014-07-06 22:13:26 +00:00
Matt Arsenault	4261973548	Use cast<> instead of dyn_cast + assert llvm-svn: 212380	2014-07-05 21:16:43 +00:00
Matt Arsenault	258c6e7cd9	Fix grammar llvm-svn: 212379	2014-07-05 21:16:40 +00:00
Ehsan Akhgari	4103da6bfb	Add support for parsing the not operator in Microsoft inline assembly This fixes http://llvm.org/PR20202 llvm-svn: 212352	2014-07-04 19:13:05 +00:00
Daniel Sanders	950f48d3c7	[mips][mips64r6] Set ELF e_flags for MIPS32r6/MIPS64r6. Also do MIPS-I to MIPS-V Differential Revision: http://reviews.llvm.org/D4386 llvm-svn: 212346	2014-07-04 15:21:53 +00:00
Tim Northover	1bc367a41b	ARM: when falling back to scattered relocs, keep the type. The linker relies on relocation type info (e.g. is it a branch?) to perform the correct actions, so we should keep that even when we end up using a scattered relocation for whatever reason. rdar://problem/17553104 llvm-svn: 212333	2014-07-04 10:58:05 +00:00
Daniel Sanders	2e03d66453	[mips][mips64r6] Correct the encoding of dmuh, dmuhu, dmul, and dmulu. We have detected a documentation bug in the encoding tables of the released MIPS64r6 specification that has resulted in the wrong encodings being used for these instructions in LLVM. This commit corrects them. llvm-svn: 212330	2014-07-04 10:08:27 +00:00
Chandler Carruth	5d79bb5d32	[x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for any constant, constant FP, or undef splat and to tolerate any undef lanes in a splat, then replace all uses of isSplatVector in X86's lowering with it. This fixes issues where undef lanes in an otherwise splat vector would prevent the splat logic from firing. It is a touch more awkward to use this interface, but it is much more accurate. Suggestions for better interface structuring welcome. With this fix, the code generated with the widening legalization strategy for widen_cast-4.ll is dramatically improved as the special lowering strategies for a v16i8 SRA kick in even though the high lanes are undef. We also get a slightly different choice for broadcasting an aligned memory location, and use vpshufd instead of vbroadcastss. This looks like a minor win for pipelining and domain crossing, but a minor loss for the number of micro-ops. I suspect it's a wash, but folks can easily tweak the lowering if they want. llvm-svn: 212324	2014-07-04 08:11:49 +00:00
Alexey Volkov	302309f39f	[X86] Limit maximum nop length on Silvermont Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes. Also in Silvermont instructions with more than 3 prefixes will cause 3 cycle penalty. Maximum nop length is limited to 7 bytes when used for padding on Silvermont. For other x86 processors max nop length remains unchanged 15 bytes. Differential Revision: http://reviews.llvm.org/D4374 llvm-svn: 212321	2014-07-04 07:14:56 +00:00
Robert Lytton	37d3fa7e36	XCore target: remove incorrect DebugLoc entries from prologue Summary: This was causing the prologue_end to be incorrectly positioned. Differential Revision: http://reviews.llvm.org/D4122 llvm-svn: 212318	2014-07-04 06:38:22 +00:00
Eric Christopher	c1058df66f	Move function dependent resetting of a subtarget variable out of the subtarget. This involved having the movt predicate take the current function - since we care about size in instruction selection for whether or not to use movw/movt take the function so we can check the attributes. This required adding the current MachineFunction to FastISel and propagating through. llvm-svn: 212309	2014-07-04 01:55:26 +00:00
Chandler Carruth	19cff8205e	[x86] Clarify that this lowering only applies to vectors and is only used when we have SSE2. llvm-svn: 212300	2014-07-03 22:57:44 +00:00
Eric Christopher	2f991c9ee1	Remove caching of the target machine and initialization of the subtarget from ARMISelDAGtoDAG. The former is unnecessary and the latter is initialized on each runOnMachineFunction. llvm-svn: 212297	2014-07-03 22:24:49 +00:00
Andrea Di Biagio	c8e8bda58f	[CostModel][x86] Improved cost model for alternate shuffles. This patch: 1) Improves the cost model for x86 alternate shuffles (originally added at revision 211339); 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles. Alternate shuffles are a special kind of blend; on x86, we can often easily lower alternate shuffles into a single blend instruction (depending on the subtarget features). The existing cost model didn't take into account subtarget features. Also, it had a couple of "dead" entries for vector types that are never legal (example: on x86 types v2i32 and v2f32 are not legal; those are always either promoted or widened to 128-bit vector types). The new x86 cost model takes into account what target features we have before returning the shuffle cost (i.e. the number of instructions after the blend is lowered/expanded). This patch also teaches the Cost Model Analysis how to identify and analyze alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions): - added function 'isAlternateVectorMask'; - added some logic to check if an instruction is an alternate shuffle and, in case, call the target specific TTI to get the corresponding shuffle cost; - added a test to verify the cost model analysis on alternate shuffles. llvm-svn: 212296	2014-07-03 22:24:18 +00:00
Andrea Di Biagio	a37a2fc81f	[X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes. This patch adds tablegen patterns to select F16C float-to-half-float conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes. If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32' are expanded into library calls. llvm-svn: 212293	2014-07-03 21:51:06 +00:00
Yi Kong	93e52da641	[ARM] Implement ISB memory barrier intrinsic Adds support for __builtin_arm_isb. Also corrects DSB and ISB instructions modelling by adding has-side-effects property. llvm-svn: 212276	2014-07-03 16:00:41 +00:00
Chandler Carruth	739b6ada99	[x86] Fix crashes in lowering bitcast instructions with the widening mode. This also runs the test in that mode which would reproduce the crash. What I love is that every single FIXME in the test is addressed by switching to widening. llvm-svn: 212254	2014-07-03 03:43:47 +00:00
Chandler Carruth	49a8b10d82	[x86] Based on a long conversation between myself, Jim Grosbach, Hal Finkel, Eric Christopher, and a bunch of other people I'm probably forgetting (sorry), add an option to the x86 backend to widen vectors during type legalization rather than promote them. This still would promote vNi1 vectors to get the masks right, but would widen other vectors. A lot of experiments are piling up right now showing that widening should probably be the default legalization strategy outside of vNi1 cases, but it is very hard to test the ramifications of that and fix bugs in widening-based legalization without an option that enables it. I'll be checking in tests shortly that use this option to exercise cases where widening doesn't work well and hopefully we'll be able to switch fully to this soon. llvm-svn: 212249	2014-07-03 02:11:29 +00:00
Eric Christopher	f204208e4f	Make these preprocessor directives match all of the others in the port. llvm-svn: 212245	2014-07-03 00:44:31 +00:00
Eric Christopher	ad4de684ea	Remove dead code. llvm-svn: 212244	2014-07-03 00:44:28 +00:00
Chandler Carruth	9d010fffe1	[codegen,aarch64] Add a target hook to the code generator to control vector type legalization strategies in a more fine grained manner, and change the legalization of several v1iN types and v1f32 to be widening rather than scalarization on AArch64. This fixes an assertion failure caused by scalarizing nodes like "v1i32 trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32. This also provides a foundation for other targets to have more granular control over how vector types are legalized. Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow some work to start taking place on top of this patch as it adds some really important hooks to the backend that I'd like to immediately start using. =] http://reviews.llvm.org/D4322 llvm-svn: 212242	2014-07-03 00:23:43 +00:00
Eric Christopher	daa9dbbbd5	Move subtarget dependent features into the subtarget from the target machine. Includes a fix for a subtarget initialization for hard floating point on mips16. llvm-svn: 212240	2014-07-03 00:10:24 +00:00
Eric Christopher	4cdb3f9b6a	So that we can include frame lowering in the subtarget, remove include circular dependency with the subtarget by inlining accessor methods and outlining a routine. llvm-svn: 212236	2014-07-02 23:29:55 +00:00
Eric Christopher	bf33a3cf70	So that we can include target lowering in the subtarget, remove include circular dependency with the subtarget by inlining accessor methods and outlining a routine. llvm-svn: 212234	2014-07-02 23:18:40 +00:00
Eric Christopher	0eaa541ea5	Fix typos. llvm-svn: 212228	2014-07-02 22:05:40 +00:00
Eric Christopher	5f9fd210b3	Move the data layout and selection dag info from the mips target machine down to the subtarget. llvm-svn: 212224	2014-07-02 21:29:23 +00:00
Adam Nemet	11dd5cf9f1	[X86] AVX512: Allow writemask argument in vpermt* intrinsics llvm-svn: 212223	2014-07-02 21:26:01 +00:00
Adam Nemet	efe9c98a16	[X86] AVX512: Generate Pat<>'s for the vpermt2* intrinsics via multiclass This new multiclass, avx512_perm_table_3src derives from the current one and provides the Pat<>. The next patch will add another Pat<> that uses the writemask. Note that I dropped the type annotation from the intrinsic call, i.e.: (v16f32 VR512:$src1) -> R512:$src1. I think that this should be fine (at least many intrinsic calls don't provide them) and it greatly reduces the number of template arguments. llvm-svn: 212222	2014-07-02 21:25:58 +00:00
Adam Nemet	2415a497b5	[X86] AVX512: Add writemask variants for vperm2 This includes assembler and codegen support (see the new tests in avx512-encodings.s and avx512-shuffle.ll). <rdar://problem/17492620> llvm-svn: 212221	2014-07-02 21:25:54 +00:00
Tom Stellard	e9219e0026	R600: Add a comment that llvm.AMDGPU.trunc is a legacy intrinsic llvm-svn: 212218	2014-07-02 20:53:57 +00:00
Tom Stellard	7c1838d797	R600/SI: Use a ComplexPattern for ADDR64 addressing of MUBUF loads llvm-svn: 212217	2014-07-02 20:53:56 +00:00

29246 Commits