llvm-project

Commit Graph

Author	SHA1	Message	Date
Adam Nemet	11dd5cf9f1	[X86] AVX512: Allow writemask argument in vpermt* intrinsics llvm-svn: 212223	2014-07-02 21:26:01 +00:00
Adam Nemet	efe9c98a16	[X86] AVX512: Generate Pat<>'s for the vpermt2* intrinsics via multiclass This new multiclass, avx512_perm_table_3src, derives from the current one and provides the Pat<>. The next patch will add another Pat<> that uses the writemask. Note that I dropped the type annotation from the intrinsic call, i.e.: (v16f32 VR512:$src1) -> VR512:$src1. I think that this should be fine (at least many intrinsic calls don't provide them) and it greatly reduces the number of template arguments. llvm-svn: 212222	2014-07-02 21:25:58 +00:00
Adam Nemet	2415a497b5	[X86] AVX512: Add writemask variants for vperm2 This includes assembler and codegen support (see the new tests in avx512-encodings.s and avx512-shuffle.ll). <rdar://problem/17492620> llvm-svn: 212221	2014-07-02 21:25:54 +00:00
Benjamin Kramer	e739cf3eb5	X86: When combining shuffles just remove shuffles that are completely redundant. CombineTo doesn't allow replacing a node with itself so this would crash if the combined shuffle is the same as the input shuffle. llvm-svn: 212181	2014-07-02 15:09:44 +00:00
Elena Demikhovsky	678bd5ba4a	AVX-512: dec/inc instructions are slow on KNL Following Alexey Volkov's change, I'm adding the same property for KNL, which prefers ADD/SUB over INC/DEC. Added a test. llvm-svn: 212178	2014-07-02 14:11:05 +00:00
Tim Northover	334d8eebe5	X86: remove atomic instructions after we've iterated through them. Otherwise they get freed and the implicit "isa<XYZ>" tests following turn out badly (at least under sanitizers). Also corrects the ordering of unordered atomic stores. llvm-svn: 212136	2014-07-01 22:10:30 +00:00
Juergen Ributzka	3bd03c7099	[DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. The argument list vector is never used after it has been passed to the CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it. llvm-svn: 212135	2014-07-01 22:01:54 +00:00
Tim Northover	df58625e3c	X86: delegate expanding atomic libcalls to generic code. On targets without cmpxchg16b or cmpxchg8b, the borderline atomic operations were slipping through the gaps. X86AtomicExpand.cpp was delegating to ISelLowering. Generic ISelLowering was delegating to X86ISelLowering and X86ISelLowering was asserting. The correct behaviour is to expand to a libcall, preferably in generic ISelLowering. This can be achieved by X86ISelLowering deciding it doesn't want the faff after all. llvm-svn: 212134	2014-07-01 21:44:59 +00:00
Tim Northover	277066ab43	X86: expand atomics in IR instead of as MachineInstrs. The logic for expanding atomics that aren't natively supported in terms of cmpxchg loops is much simpler to express at the IR level. It also allows the normal optimisations and CodeGen improvements to help out with atomics, instead of using a limited set of possible instructions. rdar://problem/13496295 llvm-svn: 212119	2014-07-01 18:53:31 +00:00
Adam Nemet	16de2486cb	[X86] AVX512: Allow writemasks with vpcmp For now I only updated the _alt variants. The main variants are used by codegen and that will need a bit more work to trigger. <rdar://problem/17492620> llvm-svn: 212114	2014-07-01 18:03:45 +00:00
Adam Nemet	1efcb90fcd	[X86] AVX512: Factor generating the AsmString into avx512_icmp_cc Adding a writemask variant would require a third asm string to be passed to the template. Generate the AsmString in the template instead. No change in X86.td.expanded. llvm-svn: 212113	2014-07-01 18:03:43 +00:00
Reid Kleckner	b5dd9452b4	Fix .seh_stackalloc 0 seh_stackalloc 0 is not representable in Win64 SEH info, so emitting it is a bug. Reviewers: rnk Differential Revision: http://reviews.llvm.org/D4334 Patch by Vadim Chugunov! llvm-svn: 212081	2014-07-01 00:42:47 +00:00
Andrea Di Biagio	53b6830069	[X86] Add support for builtin to read performance monitoring counters. This patch adds support for a new builtin instruction called __builtin_ia32_rdpmc. Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can be used to read performance monitoring counters. It takes as input the index of the performance counter to read, and returns the value of the specified performance counter as a 64-bit number. Calls to this new builtin will map to instruction RDPMC. The index passed to the builtin is moved to register %ECX. The result of the builtin call is the value of the specified performance counter (RDPMC would return that quantity in registers RDX:RAX). This patch: - Adds builtin int_x86_rdpmc as a GCCBuiltin; - Adds a new x86 DAG node called 'RDPMC_DAG'; - Teaches the backend how to lower this new builtin; - Adds an ISel pattern to select instruction RDPMC; - Fixes the definition of instruction RDPMC adding %RAX and %RDX as implicit definitions, and adding %ECX as implicit use; - Adds a LLVM test to verify that the new builtin is correctly selected. llvm-svn: 212049	2014-06-30 17:14:21 +00:00
Saleem Abdulrasool	e3c3fe53eb	X86: fix comment Fix a comment typo `DbgLocLImport` instead of `DLLImport`. llvm-svn: 212012	2014-06-30 03:11:18 +00:00
Saleem Abdulrasool	67b548154e	CodeGen: rename Win64 ExceptionHandling to WinEH This exception format is not specific to Windows x64. A similar approach is taken on nearly all architectures. Generalise the name to reflect reality. This will eventually be used for Windows on ARM data emission as well. Switch the enum and namespace into an enum class. llvm-svn: 212000	2014-06-29 21:43:47 +00:00
Saleem Abdulrasool	7206a52522	MC: rename EmitWin64EH routines Rename the routines to reflect the reality that they are more related to call frame information than to Win64 EH. Although EH is implemented in an intertwined manner by augmenting with an exception handler and an associated parameter, the majority of these routines emit information required to unwind the frames. This also helps identify that these routines are generic for most Windows platforms (they apply equally to nearly all architectures except x86) although the encoding of the information is architecture dependent. Unwinding data is emitted via EmitWinCFI* and exception handling information via EmitWinEH*. llvm-svn: 211994	2014-06-29 01:52:01 +00:00
Chandler Carruth	bd0717d7cc	[x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like lowering for v16i8. ASan and some bots caught this bug with existing test cases. Fixing it even fixed a miscompile with one of the test cases. I'm still a bit suspicious of this test case as I've not taken a proper amount of time to think about it, but the fix here is strict goodness. llvm-svn: 211976	2014-06-28 05:46:28 +00:00
Chandler Carruth	887c2c3482	[x86] Add handling for splat-like widenings of v16i8 shuffles. These show up really frequently, not the least with actual splats. =] We lowered these quite badly before. The new code path tries to widen i8 shuffles to i16 shuffles in a splat-like way. There are still some inefficiencies in our i16 splat logic though, so we aren't really done here. Also, for certain patterns (bit of a gather-and-splat) we still generate pretty silly code, and I've left a fixme for addressing it. However, I'm not actually worried about this code pattern as much. The old shuffle lowering generates a 29 instruction monstrosity for it that should execute much more slowly. llvm-svn: 211974	2014-06-28 05:16:40 +00:00
Chandler Carruth	a94ef908d9	[x86] Fix another bug hit when bootstrapping with the new shuffle lowering. For maximum irony, I had already discovered this bug, diagnosed it, and left FIXMEs about it in the test cases. =[ I just failed to go back over those until after I had reduced a bootstrap miscompile down to a single TU, stared at the assembly for an hour, and figured out the bug. Again. Oh well. llvm-svn: 211955	2014-06-27 20:07:40 +00:00
Chandler Carruth	dd6470a9dd	[x86] Fix a miscompile in the new shuffle lowering uncovered by a bootstrap. I managed to mis-remember how PACKUS worked on x86, and was using undef for the high bytes instead of zero. The fix is fairly obvious. llvm-svn: 211922	2014-06-27 18:25:23 +00:00
Juergen Ributzka	345589e257	[FastISel][X86] Fix typos. llvm-svn: 211911	2014-06-27 17:16:34 +00:00
Alexander Kornienko	b673b4b187	Clean up unused variable warning in release build. llvm-svn: 211902	2014-06-27 15:30:55 +00:00
Chandler Carruth	ed4a0bc734	[x86] Clean up some unused variables, especially in release builds. llvm-svn: 211894	2014-06-27 12:04:18 +00:00
Chandler Carruth	688001f042	[x86] Teach the target combine step to aggressively fold pshufd instructions. Summary: This allows it to fold pshufd instructions across intervening half-shuffles and other noise. This pattern actually shows up in the generic lowering tests, but I've also added direct tests using intrinsics to make sure that the specific desired functionality is working even if the lowering stuff changes in the future. Differential Revision: http://reviews.llvm.org/D4292 llvm-svn: 211892	2014-06-27 11:40:13 +00:00
Chandler Carruth	0d6d1f2b17	[x86] Teach the target-specific combining how to aggressively fold half-shuffles, even looking through intervening instructions in a chain. Summary: This doesn't happen to show up with any test cases I've found for the current shuffle lowering, but previous attempts would benefit from this and it seems generally useful. I've tested it directly using intrinsics, which also shows that it will work with hand vectorized code as well. Note that even though pshufd isn't directly used in these tests, it gets exercised because we combine some of the half shuffles into a pshufd first, and then merge them. Differential Revision: http://reviews.llvm.org/D4291 llvm-svn: 211890	2014-06-27 11:34:40 +00:00
Chandler Carruth	97ebc2362c	[x86] Teach the X86 backend to DAG-combine SSE2 shuffles that are trivially redundant. This fixes several cases in the new vector shuffle lowering algorithm which would generate redundant shuffle instructions for the sake of simplicity. I'm also deleting a testcase which was somewhat ridiculous. It was checking for a bug in 2007 about incorrectly transforming shuffles by looking for the string "-86" in the output of a pretty substantial function. This test case doesn't seem to have any value at this point. Differential Revision: http://reviews.llvm.org/D4240 llvm-svn: 211889	2014-06-27 11:27:52 +00:00
Chandler Carruth	83860cfcfa	[x86] Begin a significant overhaul of how vector lowering is done in the x86 backend. This sketches out a new code path for vector lowering, hidden behind an off-by-default flag while it is under development. The fundamental idea behind the new code path is to aggressively break down the problem space in ways that ease selecting the odd set of instructions available on x86, and carefully avoid scalarizing code even when forced to use older ISAs. Notably, this starts off restricting itself to SSE2 and implements the complete vector shuffle and blend space for 128-bit vectors in SSE2 without scalarizing. The plan is to layer on top of this ISA extensions where we can bail out of the complex SSE2 lowering and opt for a cheaper, specialized instruction (or set of instructions). It also needs to be generalized to AVX and AVX512 vector widths. Currently, this does a decent but not perfect job for SSE2. There are some specific shortcomings that I plan to address: - We need a peephole combine to fold together shuffles where possible. There are cases where a previous shuffle could be modified slightly to arrange for elements to be in the correct position and a later shuffle eliminated. Doing this eagerly added quite a bit of complexity, and so my plan is to combine away these redundancies afterward. - There are a lot more clever ways to use unpck and pack that need to be added. This is essential for real world shuffles as it turns out... Once SSE2 is polished a bit I should be able to get interesting numbers on performance improvements on benchmarks conducive to vectorization. All of this will be off by default until it is functionally equivalent of course. Differential Revision: http://reviews.llvm.org/D4225 llvm-svn: 211888	2014-06-27 11:23:44 +00:00
Craig Topper	9f62d8006a	Rename getX86ConditonCode -> getX86ConditionCode llvm-svn: 211869	2014-06-27 05:18:21 +00:00
Adam Nemet	73f72e15ac	[X86] AVX512: Add vbroadcasti* For now I used a separate template for these sub-vector/tuple broadcasts rather than sharing the mem variants with avx512_int_broadcast_rm. <rdar://problem/17402869> llvm-svn: 211828	2014-06-27 00:43:38 +00:00
Alp Toker	e69170a110	Revert "Introduce a string_ostream string builder facility" Temporarily back out commits r211749, r211752 and r211754. llvm-svn: 211814	2014-06-26 22:52:05 +00:00
Eric Christopher	83e0723457	Remove extraneous includes from the target machines. llvm-svn: 211800	2014-06-26 19:30:05 +00:00
Andrea Di Biagio	1ee38843ac	Silence a warning due to a comparison between signed and unsigned. No functional change intended. llvm-svn: 211782	2014-06-26 13:41:10 +00:00
Andrea Di Biagio	7fb85256bc	[X86] Improve the selection of SSE3/AVX addsub instructions. This patch teaches the backend how to canonicalize shuffle vectors according to the rule: - (shuffle (FADD A, B), (FSUB A, B), Mask) -> (shuffle (FSUB A, -B), (FADD A, -B), Mask) Where 'Mask' is: <0,5,2,7> ;; for v4f32 and v4f64 shuffles. <0,3> ;; for v2f64 shuffles. <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles. In general, ISel only knows how to pattern-match a canonical 'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction. This new rule makes it possible to convert a non-canonical dag sequence into a canonical one that will be matched by a single ADDSUB at the ISel stage. The idea of converting a non-canonical ADDSUB into a canonical one by swapping the first two operands of the shuffle, and then negating the second operand of the FADD and FSUB, was originally proposed by Hal Finkel. llvm-svn: 211771	2014-06-26 10:45:21 +00:00
Adam Nemet	905832bf87	[X86] AVX512: Fix asm syntax for packed vcmp The *_alt defs for vcmp are used by the InstParser (the asm string in the main def is used by the InstPrinter). The former was accepting vector registers as destination rather than mask registers. llvm-svn: 211750	2014-06-26 00:21:12 +00:00
Alp Toker	614717388c	Introduce a string_ostream string builder facility string_ostream is a safe and efficient string builder that combines opaque stack storage with a built-in ostream interface. small_string_ostream<bytes> additionally permits an explicit stack storage size other than the default 128 bytes to be provided. Beyond that, storage is transferred to the heap. This convenient class can be used in most places an std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair would previously have been used, in order to guarantee consistent access without byte truncation. The patch also converts much of LLVM to use the new facility. These changes include several probable bug fixes for truncated output, a programming error that's no longer possible with the new interface. llvm-svn: 211749	2014-06-26 00:00:48 +00:00
Juergen Ributzka	a13d7d6ede	[FastISel][X86] More refactoring of select lowering and XALU folding. NFC. llvm-svn: 211740	2014-06-25 22:50:59 +00:00
Juergen Ributzka	c010ddb73d	[FastISel][X86] Refactor XALU folding. NFC. llvm-svn: 211735	2014-06-25 22:17:23 +00:00
Juergen Ributzka	296833cde9	[FastISel][X86] Only fold the cmp into the select when both instructions are in the same basic block. If the cmp is in a different basic block, then it is possible that not all operands of that compare have defined registers. This can happen when one of the operands to the cmp is a load and the load gets folded into the cmp. In this case FastISel will skip the load instruction and the vreg is never defined. llvm-svn: 211730	2014-06-25 20:06:12 +00:00
Andrea Di Biagio	07cdffc324	[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128). This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to the check for 'isBlendMask'; the idea is that, when possible, we should first check whether a shuffle performs a blend and, if so, try to lower it into a BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128. In general: - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128; - BLENDPS/D instructions tend to have better 'reciprocal throughput' than the equivalent SHUFPS/D; - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of m-ops; however, an m-op obtained from a BLENDPS/D can be scheduled to more than one execution port. This patch: - Moves the check for 'isBlendMask' immediately before the check for 'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE'; - Updates existing tests for sse/avx shuffle/blend instructions to verify that we select (v)blendps/d when possible (instead of (v)shufps/d or vperm2f128). llvm-svn: 211720	2014-06-25 17:41:58 +00:00
Juergen Ributzka	9029bda8a3	Fix indentation. llvm-svn: 211717	2014-06-25 16:49:37 +00:00
Chandler Carruth	e5724d7532	[x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions. llvm-svn: 211694	2014-06-25 13:12:54 +00:00
NAKAMURA Takumi	1db5995d14	Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter. -- This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211691	2014-06-25 12:41:52 +00:00
NAKAMURA Takumi	c403be1991	Reformat. llvm-svn: 211689	2014-06-25 12:40:56 +00:00
Andrea Di Biagio	6d9b9e125d	[X86] Add target combine rule to select ADDSUB instructions from a build_vector This patch teaches the backend how to combine a build_vector that implements an 'addsub' between packed float vectors into a sequence of vector add and vector sub followed by a VSELECT. The new VSELECT is expected to be lowered into a BLENDI. At ISel stage, the sequence 'vector add + vector sub + BLENDI' is pattern-matched against ISel patterns added at r211427 to select 'addsub' instructions. Added three more ISel patterns for ADDSUB. Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub' instructions. llvm-svn: 211679	2014-06-25 10:02:21 +00:00
Juergen Ributzka	2bce27e5a0	[FastISel][X86] Fold XALU condition into branch and compare. Optimize the codegen of select and branch instructions to directly use the EFLAGS from the {s|u}{add|sub|mul}.with.overflow intrinsics. llvm-svn: 211645	2014-06-24 23:51:21 +00:00
Robert Khasanov	21c836823f	vpblend intrinsics were combined as shift intrinsics due to a missing return statement between them Fix PR20088 Differential Revision: http://reviews.llvm.org/D4277 llvm-svn: 211617	2014-06-24 18:08:04 +00:00
Adam Nemet	8ae70506ea	[Disasm][AVX512] Implement decoding of top bit for non-destructive reg fields V' bit in the P2 byte of the EVEX prefix provides the top bit of the NDD and NDS register fields. This was simply not used in the decoder until now. Fixes <rdar://problem/17402661> llvm-svn: 211565	2014-06-24 01:42:32 +00:00
Juergen Ributzka	aed5c96684	[FastISel][X86] Lower unsupported selects to control-flow. This extends the select lowering coverage by emitting pseudo cmov instructions. These instructions will later be lowered to control-flow to simulate the select. llvm-svn: 211545	2014-06-23 21:55:44 +00:00
Juergen Ributzka	21d560843f	[FastISel][X86] Add support for floating-point select. This extends the select lowering to support floating-point selects. The lowering depends on SSE instructions and requires that the condition comes from a floating-point compare. Under these conditions it is possible to emit an optimized instruction sequence that doesn't require any branches to simulate the select. llvm-svn: 211544	2014-06-23 21:55:40 +00:00
Juergen Ributzka	6ef06f9159	[FastISel][X86] Optimize selects when the condition comes from a compare. Optimize the select instructions sequence to use the EFLAGS directly from a compare when possible. llvm-svn: 211543	2014-06-23 21:55:36 +00:00
NAKAMURA Takumi	d77cefe633	Revert r211399, "Generate native unwind info on Win64" It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64. llvm-svn: 211480	2014-06-22 22:00:56 +00:00
Filipe Cabecinhas	1af2dfd274	Fix PR20087 by using the source index when changing the vector load llvm-svn: 211472	2014-06-22 17:21:37 +00:00
Andrea Di Biagio	e5015d8aba	[X86] Add ISel patterns to select SSE3/AVX ADDSUB instructions. This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions from a sequence of "vadd + vsub + blend". Example: /// typedef float float4 __attribute__((ext_vector_type(4))); float4 foo(float4 A, float4 B) { float4 X = A - B; float4 Y = A + B; return (float4){X[0], Y[1], X[2], Y[3]}; } /// Before this patch, (with flag -mcpu=corei7) llc produced the following assembly sequence: movaps %xmm0, %xmm2 addps %xmm1, %xmm2 subps %xmm1, %xmm0 blendps $10, %xmm2, %xmm0 With this patch, we now get a single addsubps %xmm1, %xmm0 llvm-svn: 211427	2014-06-21 01:31:15 +00:00
Rafael Espindola	df100c337c	Delete dead code. The compact unwind info is only used by code that knows it is supported. llvm-svn: 211412	2014-06-20 22:30:31 +00:00
Rafael Espindola	b4357fc293	Don't produce eh_frame relocations when targeting the IOS simulator. First step for fixing pr19185. llvm-svn: 211404	2014-06-20 21:15:27 +00:00
Reid Kleckner	4a01230db4	Generate native unwind info on Win64 This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211399	2014-06-20 20:35:47 +00:00
Karthik Bhat	e03a25da70	Add Support to Recognize and Vectorize NON SIMD instructions in SLPVectorizer. This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and vectorizes them as vector shuffles if they are profitable. These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86. Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015 llvm-svn: 211339	2014-06-20 04:32:48 +00:00
Chandler Carruth	8366cebeb5	[x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDW instructions available as synthetic SDNodes PACKSS and PACKUS that will select to the correct instruction variants based on the return type. This allows us to use these rather important instructions when lowering vector shuffles. Also moves the relevant instruction definitions to be split out from the fully generic multiclasses to allow them to match these new SDNodes in the same way that the UNPCK instructions do. No functionality should actually be changed here. llvm-svn: 211332	2014-06-20 01:05:28 +00:00
Alp Toker	1d099d9339	Fix typos llvm-svn: 211304	2014-06-19 19:41:26 +00:00
Andrea Di Biagio	54b0949af9	[X86] Teach how to combine horizontal binop even in the presence of undefs. Before this change, the backend was unable to fold a build_vector dag node with UNDEF operands into a single horizontal add/sub. This patch teaches how to combine a build_vector with UNDEF operands into a horizontal add/sub when possible. The algorithm conservatively avoids combining a build_vector with only a single non-UNDEF operand. Added test haddsub-undef.ll to verify that we correctly fold horizontal binop even in the presence of UNDEFs. llvm-svn: 211265	2014-06-19 10:29:41 +00:00
David Majnemer	6a5b812c7b	MS asm: Properly handle quoted symbol names We would get confused by '@' characters in symbol names: we would mistake the text following them for the variant kind. When an identifier is a string, the variant kind will never show up inside of it. Instead, check to see if there is a variant following the string. This fixes PR19965. llvm-svn: 211249	2014-06-19 01:25:43 +00:00
Adam Nemet	efd0785d82	[X86] AVX512: Add non-temporal stores Note that I followed the AVX2 convention here and didn't add LLVM intrinsics for stores. These can be generated with the nontemporal hint on LLVM IR stores (see new test). The GCC builtins are lowered directly into nontemporal stores. <rdar://problem/17082571> llvm-svn: 211176	2014-06-18 16:51:10 +00:00
Adam Nemet	ded81a810c	[X86] AVX512: Specify compressed displacement for vmovntdqa Use the max 64-bit element size with EVEX_CD8. This should work since element size is ignored for a full-vector access (FVM). llvm-svn: 211175	2014-06-18 16:51:07 +00:00
Cameron McInally	f10a7c963b	Add pattern for unsigned v4i32->v4f64 convert on AVX512. llvm-svn: 211164	2014-06-18 14:04:37 +00:00
Louis Gerbarg	343f5cdfad	Allow X86FastIsel to cope with 64 bit absolute relocations This patch is a follow-up to r211040 & r211052. Rather than bailing out of fast isel, this patch will generate an alternate instruction (movabsq) instead of the leaq. While this will always have enough room to handle the 64 bit displacement, it is generally overkill for internal symbols (most displacements will be within 32 bits), but we have no way of communicating the code model to the assembler in order to avoid flagging an absolute leal/leaq as illegal when using a symbolic displacement. llvm-svn: 211130	2014-06-17 23:22:41 +00:00
Juergen Ributzka	aa60209311	[FastISel][X86] Optimize predicates and fold CMP instructions. This optimizes predicates for certain compares, such as fcmp oeq %x, %x to fcmp ord %x, %x. The latter one is more efficient to generate. The same optimization is applied to conditional branches. llvm-svn: 211126	2014-06-17 21:55:43 +00:00
Juergen Ributzka	e35705675f	[FastISel][X86] Fix previous refactoring commit (r211077) Overlooked that fcmp_une uses an "or" instead of an "and" for combining the flags. llvm-svn: 211104	2014-06-17 14:47:45 +00:00
Juergen Ributzka	2da1bbc113	[FastISel][X86] Refactor the code to get the X86 condition from a helper function. NFC. Make use of helper functions to simplify the branch and compare instruction selection in FastISel. Also add test cases for compare and conditional branch. llvm-svn: 211077	2014-06-16 23:58:24 +00:00
Louis Gerbarg	dcf00251ea	Improve comments for r211040 Added comment to clarify why r211040 chooses to bail out of fast isel instead of generating a more complicated relocation, and fix mislabelled register in the comments of the asan test case. llvm-svn: 211052	2014-06-16 20:31:50 +00:00
Louis Gerbarg	a5360c4cd8	Fix illegal relocations in X86FastISel On x86_64 the lea instruction can only use a 32 bit immediate value. When the code is compiled statically the RIP register is not used, meaning the immediate is all that can be used for the relocation, which is not sufficient in the case of targets more than +/- 2GB away. This patch bails out of fast isel in those cases and reverts to DAG which does the right thing. Test case included. llvm-svn: 211040	2014-06-16 17:35:40 +00:00
Cameron McInally	0d0489cea6	Hook up vector int_ctlz for AVX512. llvm-svn: 211024	2014-06-16 14:12:28 +00:00
Tim Northover	51472bc600	X86: lower ATOMIC_CMP_SWAP_WITH_SUCCESS directly Lowering this new node allows us to fold the almost universal comparison for success before it's even formed. Instead we can create a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg" instructions set the zero-flag to the correct value. rdar://problem/13201607 llvm-svn: 210923	2014-06-13 17:29:39 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Cameron McInally	c43c8f9458	Add HasCDI predicate to AVX512 VPBROADCASTM*. llvm-svn: 210892	2014-06-13 11:40:31 +00:00
Juergen Ributzka	3453bcf64d	[FastISel][X86] Add support for cvttss2si/cvttsd2si intrinsics. This adds support for the cvttss2si/cvttsd2si intrinsics. Preceding insertelement instructions are folded into the conversion instruction (if possible). llvm-svn: 210870	2014-06-13 02:21:58 +00:00
Juergen Ributzka	454d374e37	[FastISel][X86] - Add branch weights Add branch weights to branch instructions, so that the following passes can optimize based on them (e.g. basic block ordering). llvm-svn: 210863	2014-06-13 00:45:11 +00:00
Juergen Ributzka	349777d3ea	[FastISel][X86] Add MachineMemOperand to load/store instructions. This commit adds MachineMemOperands to load and store instructions. This allows the peephole optimizer to fold load instructions. Unfortunately the peephole optimizer currently doesn't run at -O0. llvm-svn: 210858	2014-06-12 23:27:57 +00:00
Juergen Ributzka	a13cab5b74	[FastIsel][X86] Add support for lowering the first 8 floating-point arguments. Recommit with fixed argument attribute checking code, which is required to bail out of all the cases we don't handle yet. llvm-svn: 210815	2014-06-12 20:12:34 +00:00
Juergen Ributzka	5ad463f55e	Revert "[FastIsel][X86] Add support for lowering the first 8 floating-point arguments." Reverting it because it breaks several tests. llvm-svn: 210810	2014-06-12 19:21:43 +00:00
Saleem Abdulrasool	3c890c4ad6	X86: stifle GCC warning lib/Target/X86/X86TargetTransformInfo.cpp: In member function ‘virtual unsigned int {anonymous}::X86TTI::getIntImmCost(unsigned int, unsigned int, const llvm::APInt&, llvm::Type*) const’: lib/Target/X86/X86TargetTransformInfo.cpp:920:60: warning: enumeral and non-enumeral type in conditional expression [enabled by default] This seems like an unhelpful warning, but there doesn't seem to be a controlling flag, so add an explicit cast to silence the warning. llvm-svn: 210806	2014-06-12 17:56:18 +00:00
Andrea Di Biagio	2dd3b3b674	[X86] Teach how to dump the name of target node RDTSCP_DAG. When I originally added node RDTSCP_DAG (r207127) I forgot to add a string name for it in method 'getTargetNodeName'. No functional change intended. llvm-svn: 210769	2014-06-12 11:37:24 +00:00
Andrea Di Biagio	972ff97f8c	[X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors. This patch adds target combine rules to match: - [AVX] Horizontal add/sub of packed single/double precision floating point values from 256-bit vectors; - [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors. llvm-svn: 210761	2014-06-12 10:53:48 +00:00
Juergen Ributzka	272b570a80	[FastISel][X86] Add support for the sqrt intrinsic. llvm-svn: 210720	2014-06-11 23:11:02 +00:00
Juergen Ributzka	fbaa3db909	[FastIsel][X86] Add support for lowering the first 8 floating-point arguments. llvm-svn: 210719	2014-06-11 23:10:58 +00:00
Juergen Ributzka	4dc958777c	[FastISel][X86] Add support for the frameaddress intrinsic. llvm-svn: 210709	2014-06-11 21:44:44 +00:00
Tim Northover	4dc9eaa6ba	X86: add stringy name for X86ISD::LCMPXCHG16_DAG I don't know what "target specific node #383" is, and I don't want to have to. llvm-svn: 210663	2014-06-11 17:04:08 +00:00
Cameron McInally	5d1b7b94e4	Add AVX512 masked leadz intrinsic support. llvm-svn: 210652	2014-06-11 12:54:45 +00:00
Andrea Di Biagio	c7af75f9a7	[X86] Refactor the logic to select horizontal adds/subs to a helper function. This patch moves part of the logic implemented by the target specific combine rules added at r210477 to a separate helper function. This should make it easier to add more rules for matching AVX/AVX2 horizontal adds/subs. This patch also fixes a problem caused by a wrong check performed on indices of extract_vector_elt dag nodes in input to the scalar adds/subs. New tests have been added to verify that we correctly check indices of extract_vector_elt dag nodes when selecting a horizontal operation. llvm-svn: 210644	2014-06-11 07:57:50 +00:00
Eric Christopher	1a2120312b	Move to a private function to initialize the subtarget dependencies so that we can use initializer lists for the X86Subtarget. llvm-svn: 210614	2014-06-11 00:25:19 +00:00
Juergen Ributzka	2dace6e54b	[FastISel][X86] Extend support for {s|u}{add|sub|mul}.with.overflow intrinsics. llvm-svn: 210610	2014-06-10 23:52:44 +00:00
Eric Christopher	cd996edec5	Use unique_ptr for X86Subtarget pointer members. llvm-svn: 210606	2014-06-10 23:26:47 +00:00
Eric Christopher	6c786a1dd1	Remove the use of TargetMachine from X86InstrInfo. llvm-svn: 210596	2014-06-10 22:34:31 +00:00
Eric Christopher	1f8ad4f4a7	Move X86RegisterInfo away from using the TargetMachine and only using the subtarget. llvm-svn: 210595	2014-06-10 22:34:28 +00:00
Eric Christopher	68d7559e97	Use the TargetMachine on the DAG or the MachineFunction instead of using the cached TargetMachine. llvm-svn: 210589	2014-06-10 21:25:13 +00:00
Eric Christopher	19b1d73e88	Add a FIXME. llvm-svn: 210559	2014-06-10 18:31:18 +00:00
Andrea Di Biagio	fa508af0fe	[X86] Improved target combine rules for selecting horizontal add/sub. This patch slightly changes the algorithm introduced at revision 210477 to fix a problem where the algorithm was producing incorrect code for the VEX.256 encoded versions of horizontal add/sub. For these cases, we now try to split the two 256-bit vectors into 128-bit chunks before emitting horizontal add/sub dag nodes. Added a new test case into haddsub-2.ll. llvm-svn: 210545	2014-06-10 16:42:57 +00:00
Adam Nemet	7f62b23e92	[X86] AVX512: Add vmovntdqa Along with the corresponding intrinsic and tests. llvm-svn: 210543	2014-06-10 16:39:53 +00:00
Tom Stellard	3787b12255	SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CC The SelectionDAG had a special case for ISD::SELECT_CC, where it would allow targets to specify: setOperationAction(ISD::SELECT_CC, MVT::Other, Expand); to indicate that they wanted to expand ISD::SELECT_CC for all types. This wasn't applied correctly everywhere, and it makes writing new DAG patterns with ISD::SELECT_CC difficult. llvm-svn: 210541	2014-06-10 16:01:29 +00:00
Tim Northover	7b9f86da5d	Revert "X86: elide comparisons after cmpxchg instructions." This reverts commit r210523. It was committed prematurely without waiting for review. llvm-svn: 210524	2014-06-10 10:50:11 +00:00
Tim Northover	84ad29ca1f	X86: elide comparisons after cmpxchg instructions. The C++ and C semantics of the compare_and_swap operations actually require us to return a boolean "success" value. In LLVM terms this means a second comparison of the output of "cmpxchg" against the input desired value. However, x86's "cmpxchg" instruction sets all flags for the comparison formed, so we can skip any secondary comparison. (N.b. this isn't true for cmpxchg8b/16b, which only set ZF). rdar://problem/13201607 llvm-svn: 210523	2014-06-10 10:49:07 +00:00
Eric Christopher	0fb16ab204	Delete X86JITInfo in the subtarget destructor. llvm-svn: 210516	2014-06-10 08:03:42 +00:00
Juergen Ributzka	b2e4edb5c8	[ConstantHoisting][X86] Improve the cost model for small constants with large types (i64 and above). This improves the X86 cost model for small constants with large types. Before this commit we would even hoist trivial constants such as i96 2. This is related to <rdar://problem/17070936> llvm-svn: 210504	2014-06-10 00:32:29 +00:00
Eric Christopher	a08f30bd40	Move all of the x86 subtarget initialized variables down into the x86 subtarget from the x86 target machine. Should be no functional change. llvm-svn: 210479	2014-06-09 17:08:19 +00:00
Andrea Di Biagio	f99dd64f0a	[X86] Add target combine rules for horizontal add/sub. This patch adds new target specific combine rules to identify horizontal add/sub idioms from BUILD_VECTOR dag nodes. This patch also teaches the DAGCombiner how to canonicalize sequences of insert_vector_elt dag nodes according to the following rule: (insert_vector_elt (insert_vector_elt A, I0), I1) -> (insert_vector_elt (insert_vector_elt A, I1), I0) This new canonicalization rule only triggers if the inner insert_vector_elt dag node has exactly one use; also, both indices must be known constants, and I1 < I0. This last rule made it possible to write a simpler algorithm to identify horizontal add/sub patterns because now we don't have to worry about the ordering of insert_vector_elt dag nodes. llvm-svn: 210477	2014-06-09 16:54:41 +00:00
Andrea Di Biagio	dfbdc71ea1	[X86] Avoid emitting unnecessary test instructions. This patch teaches the backend how to check for the 'NoSignedWrap' flag on binary operations to improve the emission of 'test' instructions. If the result of a binary operation is known not to overflow we know that resetting the Overflow flag is unnecessary and so we can avoid emitting the test instruction. Patch by Marcello Maggioni. llvm-svn: 210468	2014-06-09 12:34:50 +00:00
Alexey Volkov	5260dba323	[X86] Use ADD/SUB instead of INC/DEC for Silvermont According to Intel Software Optimization Manual on Silvermont INC or DEC instructions require an additional uop to merge the flags. As a result, a branch instruction depending on an INC or a DEC instruction incurs a 1 cycle penalty. Differential Revision: http://reviews.llvm.org/D3990 llvm-svn: 210466	2014-06-09 11:40:41 +00:00
Craig Topper	66f09ad041	[C++11] Use 'nullptr'. llvm-svn: 210442	2014-06-08 22:29:17 +00:00
Saleem Abdulrasool	4acde1d4dc	X86: simplify data layout calculation X86Subtarget::isTargetCygMing || X86Subtarget::isTargetKnownWindowsMSVC is equivalent to all Windows environments. Simplify the check to isOSWindows. NFC. llvm-svn: 210431	2014-06-08 19:08:36 +00:00
David Blaikie	960ea3f018	AsmMatchers: Use unique_ptr to manage ownership of MCParsedAsmOperand I saw at least a memory leak or two from inspection (on probably untested error paths) and r206991, which was the original inspiration for this change. I ran this idea by Jim Grosbach a few weeks ago & he was OK with it. Since it's a basically mechanical patch that seemed sufficient - usual post-commit review, revert, etc, as needed. llvm-svn: 210427	2014-06-08 16:18:35 +00:00
Eric Christopher	28783da044	Replace the use of TargetMachine with a tiny bool variable. llvm-svn: 210386	2014-06-06 23:26:48 +00:00
Eric Christopher	e5add682ce	Remove all local variables from X86SelectionDAGInfo, the DAG has all of the ones we were stashing away on startup. llvm-svn: 210385	2014-06-06 23:26:43 +00:00
Benjamin Kramer	d0700b2919	X86: Don't turn shifts into ands if there's another use that may not check for equality. Fixes PR19964. llvm-svn: 210371	2014-06-06 21:08:55 +00:00
Eric Christopher	0dd8d486b3	Have TargetSelectionDAGInfo take a DataLayout initializer rather than a TargetMachine since the only thing it wants is DataLayout. llvm-svn: 210366	2014-06-06 19:04:48 +00:00
Filipe Cabecinhas	5181255696	Fixed a bug in lowering shuffle_vectors to insertps Summary: We were being too strict and not accounting for undefs. Added a test case and fixed another one where we improved codegen. Reviewers: grosbach, nadav, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4039 llvm-svn: 210361	2014-06-06 18:07:06 +00:00
Eric Christopher	66f676e9e5	Remove X86Subtarget from the X86FrameLowering constructor since we can just pass in the values we already know and we're not caching the subtarget anymore. llvm-svn: 210292	2014-06-05 22:10:58 +00:00
Eric Christopher	f438164d30	Remove caching of the subtarget for X86FrameLowering. llvm-svn: 210290	2014-06-05 22:00:31 +00:00
Eric Christopher	c22a04c063	Remove duplicate copy of InstrItineraryData from the TargetMachine, it's already on the subtarget. llvm-svn: 210289	2014-06-05 21:42:54 +00:00
Tom Roeder	44cb65fff1	Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute. It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables. This also adds backend support for generating the jump-instruction tables on ARM and X86. Note that since the jumptable attribute creates a second function pointer for a function, any function marked with jumptable must also be marked with unnamed_addr. llvm-svn: 210280	2014-06-05 19:29:43 +00:00
Eric Christopher	21a5e5c1c7	We've got a getSlotSize call already that we use everywhere else, use it here too. llvm-svn: 210227	2014-06-05 00:22:13 +00:00
Eric Christopher	52fa6599e8	80-columns. llvm-svn: 210224	2014-06-05 00:09:08 +00:00
Eric Christopher	11b05cccfa	Remove uses of the TargetMachine from X86FrameLowering. llvm-svn: 210223	2014-06-05 00:09:05 +00:00
Yaron Keren	2207190cd5	Two small enhancements for the JIT. When JITting a large project such as Boost it's quite hard to figure out the problematic inline asm without debug location. This patch provides debug location printout before the JIT aborts due to inline asm. printDebugLoc() was exposed from MachineInstr.cpp and reused here. If the JIT runs with debug info, ignore DBG_VALUE instructions instead of bombing on them. http://reviews.llvm.org/D3416 llvm-svn: 210201	2014-06-04 17:35:28 +00:00
Nick Lewycky	0a9a866ce1	Fix a use of uninitialized value. OldCC is set when IsCmpZero || IsSwapped and read when ShouldUpdateCC || IsSwapped, and ShouldUpdateCC is independent. Fixes PR19932, but no test since I wasn't able to get any symptoms to appear, not even with valgrind and the testcase from the PR. It's clear what happened from inspection of the code. llvm-svn: 210168	2014-06-04 07:45:54 +00:00
Eric Christopher	dd240fd79c	Revert r209381 as it isn't a local variable. Add a testcase so that we know next time this happens. llvm-svn: 210127	2014-06-03 21:01:39 +00:00
Eric Christopher	31b81ce5ee	Fixup formatting in the pass. llvm-svn: 210126	2014-06-03 21:01:35 +00:00
Andrea Di Biagio	4760813831	[X86] Fix checked arithmetic for i8 on X86. When lowering an ISD::BRCOND into a test+branch, make sure that we always use the correct condition code to emit the test operation. This fixes PR19858: "i8 checked mul is wrong on x86". Patch by Keno Fisher! llvm-svn: 210032	2014-06-02 16:00:27 +00:00
Eric Christopher	8995833a34	Have the TLOF creation take a Triple rather than needing a subtarget. llvm-svn: 209937	2014-05-31 00:07:32 +00:00
Andrea Di Biagio	446a527905	[X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type. This patch teaches the backend how to simplify/canonicalize dag node sequences normally introduced by the backend when promoting certain dag nodes with illegal vector type. This patch adds two new combine rules: 1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) -> (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>) 2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) -> (shuffle (BINOP A, B), Undef, <Mask>). Both rules are only triggered on the type-legalized DAG. In particular, rule 1. is a target specific combine rule that attempts to sink a bitconvert into the operands of a binary operation. Rule 2. is a target independent rule that attempts to move a shuffle immediately after a binary operation. llvm-svn: 209930	2014-05-30 23:17:53 +00:00
Filipe Cabecinhas	d3aebaf875	Separate the check for blend shuffle_vector masks Summary: Separate the check for blend shuffle_vector masks into isBlendMask. This function will also be used to check if a vector shuffle is legal. No change in functionality was intended, but we ended up improving codegen on two tests, which were being (more) optimized only if the resulting shuffle was legal. Reviewers: nadav, delena, andreadb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3964 llvm-svn: 209923	2014-05-30 21:31:21 +00:00
Adam Nemet	35b80eaef1	[X86] Remove AVX1 vbroadcast intrinsics The corresponding CFE patch replaces these intrinsics with vector initializers in avxintrin.h. This patch removes the LLVM intrinsics from the backend. We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering that further to the intrinsics. The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue to use intrinsics. As explained in the CFE patch, the reason is that we currently don't generate as good code for them without the intrinsics. CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It checks that for a series of insertelements we generate the appropriate vbroadcast instruction. Also verified that there was no assembly change in the test-suite before and after this patch. llvm-svn: 209864	2014-05-29 23:35:36 +00:00
Rafael Espindola	59f7eba2b5	[pr19844] Add thread local mode to aliases. This matches gcc's behavior. It also seems natural given that aliases contain other properties that govern how it is accessed (linkage, visibility, dll storage). Clang still has to be updated to expose this feature to C. llvm-svn: 209759	2014-05-28 18:15:43 +00:00
Rafael Espindola	4a04c4b69c	Emit data or code export directives based on the type. Currently we look at the Aliasee to decide what type of export directive to use. It seems better to use the type of the alias directly. This is similar to how we handle the alias having the same address but other attributes (linkage, visibility) from the aliasee. With this patch it is now possible to do things like target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-windows-msvc" @foo = global [6 x i8] c"\B8\00\00\00\C3", section ".text", align 16 @f = dllexport alias i32 (), [6 x i8] @foo !llvm.module.flags = !{!0} !0 = metadata !{i32 6, metadata !"Linker Options", metadata !1} !1 = metadata !{metadata !2, metadata !3} !2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"} !3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"} llvm-svn: 209600	2014-05-25 12:49:07 +00:00
Rafael Espindola	a31f3e50dc	Delete dead code. GV is never used past this point. This was probably a copy and paste error. llvm-svn: 209518	2014-05-23 15:07:51 +00:00
Andrea Di Biagio	c8dd1ad85b	[X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8. This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to MVT::v8i8 (and vice versa). This patch extends the logic from revision 208107 to also handle MVT::v4i16 and MVT::v8i8. Also, this patch correctly propagates Undef values when performing the widening of a vector (example: when widening from v2i32 to v4i32, the upper 64bits of the resulting vector are 'undef'). llvm-svn: 209451	2014-05-22 16:21:39 +00:00
Tim Northover	f9e798ba6a	Segmented stacks: omit __morestack call when there's no frame. Patch by Florian Zeitz llvm-svn: 209436	2014-05-22 13:03:43 +00:00
Eric Christopher	4f09c59243	Override runOnMachineFunction for X86ISelDAGToDAG so that we can reset the subtarget on each function. llvm-svn: 209384	2014-05-22 01:53:26 +00:00
Eric Christopher	0d5c99eb08	Avoid using subtarget features when adding X86 specific passes to the pass pipeline. llvm-svn: 209382	2014-05-22 01:46:02 +00:00
Eric Christopher	e0bd2fa927	Remove extra local variable. llvm-svn: 209381	2014-05-22 01:45:59 +00:00
Eric Christopher	463b84b48b	Rename createGlobalBaseRegPass -> createX86GlobalBaseRegPass to make it obvious that it's a target specific pass. llvm-svn: 209380	2014-05-22 01:45:57 +00:00
Eric Christopher	89f18805f4	Fix typo. llvm-svn: 209377	2014-05-22 01:21:44 +00:00
Eric Christopher	3470bbbd54	Fix compilation issues. llvm-svn: 209342	2014-05-21 23:51:57 +00:00
Eric Christopher	6b0fcfee36	Make early if conversion dependent upon the subtarget and add a subtarget hook to enable. Unconditionally add to the pass pipeline for targets that might want to use it. No functional change. llvm-svn: 209340	2014-05-21 23:40:26 +00:00
Quentin Colombet	b4d53f1afa	[X86] Fix a bug in the lowering of BLENDI introduced in r209043. ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the second argument. On the other hand, BLENDI uses 0 to identify the first argument and 1 to identify the second argument. Fix the generation of the blend mask to account for this difference. The bug did not show up with r209043, because we were not checking for the actual arguments of the blend instruction! This commit also fixes the test cases. Note: The same mask works for the BLENDr variant because the arguments are swapped during instruction selection (see the BLENDXXrr patterns). <rdar://problem/16975435> llvm-svn: 209324	2014-05-21 22:00:39 +00:00
Evgeniy Stepanov	fc9c78a6b6	[asan] Fix x86-32 asm instrumentation to preserve flags. Patch by Yuri Gorshenin. llvm-svn: 209280	2014-05-21 08:14:24 +00:00
Simon Atanasyan	e7fa2314af	Add parentheses to suppress the gcc warning '-Wparentheses'. No functional changes. llvm-svn: 209203	2014-05-20 10:23:04 +00:00
Alexey Volkov	6226de6721	[X86] Tune LEA usage for Silvermont According to Intel Software Optimization Manual on Silvermont in some cases LEA is better to be replaced with ADD instructions: "The rule of thumb for ADDs and LEAs is that it is justified to use LEA with a valid index and/or displacement for non-destructive destination purposes (especially useful for stack offset cases), or to use a SCALE. Otherwise, ADD(s) are preferable." Differential Revision: http://reviews.llvm.org/D3826 llvm-svn: 209198	2014-05-20 08:55:50 +00:00
Juergen Ributzka	431761771c	[ConstantHoisting][X86] Change the cost model to never hoist constants for types larger than i128. Currently the X86 backend doesn't support types larger than i128 very well. For example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted. This fix changes the cost model to never hoist constants for types larger than i128. Once the codegen issues have been resolved, the cost model can be updated to allow also larger types. This is related to <rdar://problem/16954938> llvm-svn: 209162	2014-05-19 21:00:53 +00:00
Andrea Di Biagio	7a85cadfd6	[X86] Add ISel patterns to improve the selection of TZCNT and LZCNT. Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT) always provide the operand size as output if the input operand is zero. We can take advantage of this knowledge during the instruction selection stage in order to simplify a few corner cases. llvm-svn: 209159	2014-05-19 20:38:59 +00:00
Filipe Cabecinhas	dc92102766	Added more insertps optimizations Summary: When inserting an element that's coming from a vector load or a broadcast of a vector (or scalar) load, combine the load into the insertps instruction. Added PerformINSERTPSCombine for the case where we need to fix the load (load of a vector + insertps with a non-zero CountS). Added patterns for the broadcasts. Also added tests for SSE4.1, AVX, and AVX2. Reviewers: delena, nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3581 llvm-svn: 209156	2014-05-19 19:45:57 +00:00
Benjamin Kramer	f3ad23551d	SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123	2014-05-19 13:12:38 +00:00

10438 Commits