llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrea Di Biagio	54b0949af9	[X86] Teach how to combine horizontal binop even in the presence of undefs. Before this change, the backend was unable to fold a build_vector dag node with UNDEF operands into a single horizontal add/sub. This patch teaches how to combine a build_vector with UNDEF operands into a horizontal add/sub when possible. The algorithm conservatively avoids combining a build_vector with only a single non-UNDEF operand. Added test haddsub-undef.ll to verify that we correctly fold horizontal binop even in the presence of UNDEFs. llvm-svn: 211265	2014-06-19 10:29:41 +00:00
Adam Nemet	efd0785d82	[X86] AVX512: Add non-temporal stores Note that I followed the AVX2 convention here and didn't add LLVM intrinsics for stores. These can be generated with the nontemporal hint on LLVM IR stores (see new test). The GCC builtins are lowered directly into nontemporal stores. <rdar://problem/17082571> llvm-svn: 211176	2014-06-18 16:51:10 +00:00
Cameron McInally	f10a7c963b	Add pattern for unsigned v4i32->v4f64 convert on AVX512. llvm-svn: 211164	2014-06-18 14:04:37 +00:00
Tim Northover	d82ed2e581	DAG: move sret demotion into most basic LowerCallTo implementation. It looks like there are two versions of LowerCallTo here: the SelectionDAGBuilder one is designed to operate on LLVM IR, and the TargetLowering one in the case where everything is at DAG level. Previously, only the SelectionDAGBuilder variant could handle demoting an impossible return to sret semantics (before delegating to the TargetLowering version), but this functionality is also useful for certain libcalls (e.g. 128-bit operations on 32-bit x86). So this commit moves the sret handling down a level. rdar://problem/17242889 llvm-svn: 211155	2014-06-18 11:52:44 +00:00
Louis Gerbarg	343f5cdfad	Allow X86FastIsel to cope with 64 bit absolute relocations This patch is a follow up to r211040 & r211052. Rather than bailing out of fast isel this patch will generate an alternate instruction (movabsq) instead of the leaq. While this will always have enough room to handle the 64 bit displacement it is generally overkill for internal symbols (most displacements will be within 32 bits) but since we have no way of communicating the code model to the assembler in order to avoid flagging an absolute leal/leaq as illegal when using a symbolic displacement. llvm-svn: 211130	2014-06-17 23:22:41 +00:00
Juergen Ributzka	aa60209311	[FastISel][X86] Optimize predicates and fold CMP instructions. This optimizes predicates for certain compares, such as fcmp oeq %x, %x to fcmp ord %x, %x. The latter one is more efficient to generate. The same optimization is applied to conditional branches. llvm-svn: 211126	2014-06-17 21:55:43 +00:00
Juergen Ributzka	e35705675f	[FastISel][X86] Fix previous refactoring commit (r211077) Overlooked that fcmp_une uses an "or" instead of an "and" for combining the flags. llvm-svn: 211104	2014-06-17 14:47:45 +00:00
Juergen Ributzka	2da1bbc113	[FastISel][X86] Refactor the code to get the X86 condition from a helper function. NFC. Make use of helper functions to simplify the branch and compare instruction selection in FastISel. Also add test cases for compare and conditional branch. llvm-svn: 211077	2014-06-16 23:58:24 +00:00
Louis Gerbarg	a5360c4cd8	Fix illegal relocations in X86FastISel On x86_64 the lea instruction can only use a 32 bit immediate value. When the code is compiled statically the RIP register is not used, meaning the immediate is all that can be used for the relocation, which is not sufficient in the case of targets more than +/- 2GB away. This patch bails out of fast isel in those cases and reverts to DAG which does the right thing. Test case included. llvm-svn: 211040	2014-06-16 17:35:40 +00:00
Cameron McInally	0d0489cea6	Hook up vector int_ctlz for AVX512. llvm-svn: 211024	2014-06-16 14:12:28 +00:00
David Blaikie	eb1a27239c	DebugInfo: Following up to r209677, refactor local variable emission to delay the choice between emitting the definition attributes or using DW_AT_abstract_definition This doesn't fix the abstract variable handling yet, but it introduces a similar delay mechanism as was added for subprograms, causing DW_AT_location to be reordered to the beginning of the attribute list for local variables, and fixes all the test fallout for that. A subsequent commit will remove the abstract variable handling in DbgVariable and just do the abstract variable lookup at module end to ensure that abstract variables introduced after their concrete counterparts are appropriately referenced by the concrete variable. llvm-svn: 210943	2014-06-13 22:18:23 +00:00
Tim Northover	51472bc600	X86: lower ATOMIC_CMP_SWAP_WITH_SUCCESS directly Lowering this new node allows us to fold the almost universal comparison for success before it's even formed. Instead we can create a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg" instructions set the zero-flag to the correct value. rdar://problem/13201607 llvm-svn: 210923	2014-06-13 17:29:39 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Cameron McInally	ed5f645bf3	Fix bad copy-and-paste from r210652. AVX512 masked leading zero intrinsics. llvm-svn: 210901	2014-06-13 13:20:01 +00:00
NAKAMURA Takumi	b8ed66de4d	llvm/test/CodeGen/X86/fast-isel-args-fail2.ll: Don't expect to fail with -Asserts. It might or might not crash. llvm-svn: 210894	2014-06-13 12:05:06 +00:00
Juergen Ributzka	3453bcf64d	[FastISel][X86] Add support for cvttss2si/cvttsd2si intrinsics. This adds support for the cvttss2si/cvttsd2si intrinsics. Preceding insertelement instructions are folded into the conversion instruction (if possible). llvm-svn: 210870	2014-06-13 02:21:58 +00:00
Juergen Ributzka	454d374e37	[FastISel][X86] - Add branch weights Add branch weights to branch instructions, so that the following passes can optimize based on it (i.e. basic block ordering). llvm-svn: 210863	2014-06-13 00:45:11 +00:00
Juergen Ributzka	349777d3ea	[FastISel][X86] Add MachineMemOperand to load/store instructions. This commit adds MachineMemOperands to load and store instructions. This allows the peephole optimizer to fold load instructions. Unfortunately, the peephole optimizer currently doesn't run at -O0. llvm-svn: 210858	2014-06-12 23:27:57 +00:00
Juergen Ributzka	14832b0ff7	Update test case to use "not" instead of "XFAIL". llvm-svn: 210829	2014-06-12 21:17:40 +00:00
Juergen Ributzka	8ce8f21c3c	[FastISel][X86] Argument lowering test case This test case is supposed to xfail, because we do not handle structs or byval arguments. llvm-svn: 210816	2014-06-12 20:34:09 +00:00
Juergen Ributzka	a13cab5b74	[FastIsel][X86] Add support for lowering the first 8 floating-point arguments. Recommit with fixed argument attribute checking code, which is required to bail out of all the cases we don't handle yet. llvm-svn: 210815	2014-06-12 20:12:34 +00:00
Juergen Ributzka	5ad463f55e	Revert "[FastIsel][X86] Add support for lowering the first 8 floating-point arguments." Reverting it because it breaks several tests. llvm-svn: 210810	2014-06-12 19:21:43 +00:00
Tom Stellard	7783b0adf4	Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors" This reverts commit r210540, adds a testcase for the regression it caused, and marks the R600 test it was supposed to fix as XFAIL. llvm-svn: 210792	2014-06-12 16:04:47 +00:00
Andrea Di Biagio	972ff97f8c	[X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors. This patch adds target combine rules to match: - [AVX] Horizontal add/sub of packed single/double precision floating point values from 256-bit vectors; - [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors. llvm-svn: 210761	2014-06-12 10:53:48 +00:00
Juergen Ributzka	b43a559514	[FastISel][x86] Add testcase for r210719. llvm-svn: 210746	2014-06-12 03:54:05 +00:00
Juergen Ributzka	7eac929609	[x86] Improve frameaddress test from r210709. llvm-svn: 210743	2014-06-12 03:29:29 +00:00
Juergen Ributzka	04558dc77a	[FastISel] Add support for the stackmap intrinsic. This implements target-independent FastISel lowering for the stackmap intrinsic. llvm-svn: 210742	2014-06-12 03:29:26 +00:00
Juergen Ributzka	272b570a80	[FastISel][X86] Add support for the sqrt intrinsic. llvm-svn: 210720	2014-06-11 23:11:02 +00:00
Juergen Ributzka	4dc958777c	[FastISel][X86] Add support for the frameaddress intrinsic. llvm-svn: 210709	2014-06-11 21:44:44 +00:00
Cameron McInally	5d1b7b94e4	Add AVX512 masked leadz intrinsic support. llvm-svn: 210652	2014-06-11 12:54:45 +00:00
Andrea Di Biagio	c7af75f9a7	[X86] Refactor the logic to select horizontal adds/subs to a helper function. This patch moves part of the logic implemented by the target specific combine rules added at r210477 to a separate helper function. This should make it easier to add more rules for matching AVX/AVX2 horizontal adds/subs. This patch also fixes a problem caused by a wrong check performed on indices of extract_vector_elt dag nodes in input to the scalar adds/subs. New tests have been added to verify that we correctly check indices of extract_vector_elt dag nodes when selecting a horizontal operation. llvm-svn: 210644	2014-06-11 07:57:50 +00:00
Juergen Ributzka	2dace6e54b	[FastISel][X86] Extend support for {s|u}{add|sub|mul}.with.overflow intrinsics. llvm-svn: 210610	2014-06-10 23:52:44 +00:00
Andrea Di Biagio	fa508af0fe	[X86] Improved target combine rules for selecting horizontal add/sub. This patch slightly changes the algorithm introduced at revision 210477 to fix a problem where the algorithm was producing incorrect code for the VEX.256 encoded versions of horizontal add/sub. For these cases, we now try to split the two 256-bit vectors into 128-bit chunks before emitting horizontal add/sub dag nodes. Added a new test case into haddsub-2.ll. llvm-svn: 210545	2014-06-10 16:42:57 +00:00
Adam Nemet	7f62b23e92	[X86] AVX512: Add vmovntdqa Along with the corresponding intrinsic and tests. llvm-svn: 210543	2014-06-10 16:39:53 +00:00
Tim Northover	7b9f86da5d	Revert "X86: elide comparisons after cmpxchg instructions." This reverts commit r210523. It was committed prematurely without waiting for review. llvm-svn: 210524	2014-06-10 10:50:11 +00:00
Tim Northover	84ad29ca1f	X86: elide comparisons after cmpxchg instructions. The C++ and C semantics of the compare_and_swap operations actually require us to return a boolean "success" value. In LLVM terms this means a second comparison of the output of "cmpxchg" against the input desired value. However, x86's "cmpxchg" instruction sets all flags for the comparison formed, so we can skip any secondary comparison. (N.b. this isn't true for cmpxchg8b/16b, which only set ZF). rdar://problem/13201607 llvm-svn: 210523	2014-06-10 10:49:07 +00:00
Alp Toker	d3d017cf00	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Andrea Di Biagio	f99dd64f0a	[X86] Add target combine rules for horizontal add/sub. This patch adds new target specific combine rules to identify horizontal add/sub idioms from BUILD_VECTOR dag nodes. This patch also teaches the DAGCombiner how to canonicalize sequences of insert_vector_elt dag nodes according to the following rule: (insert_vector_elt (insert_vector_elt A, I0), I1) -> (insert_vector_elt (insert_vector_elt A, I1), I0) This new canonicalization rule only triggers if the inner insert_vector dag node has exactly one use; also, both indices must be known constants, and I1 < I0. This last rule made it possible to write a simpler algorithm to identify horizontal add/sub patterns because now we don't have to worry about the ordering of insert_vector_elt dag nodes. llvm-svn: 210477	2014-06-09 16:54:41 +00:00
NAKAMURA Takumi	0ecb6e77f3	llvm/test/CodeGen/X86/2014-05-29-factorial.ll: Relax an expression to match Win32 x64. llvm-svn: 210471	2014-06-09 14:20:23 +00:00
Andrea Di Biagio	dfbdc71ea1	[X86] Avoid emitting unnecessary test instructions. This patch teaches the backend how to check for the 'NoSignedWrap' flag on binary operations to improve the emission of 'test' instructions. If the result of a binary operation is known not to overflow we know that resetting the Overflow flag is unnecessary and so we can avoid emitting the test instruction. Patch by Marcello Maggioni. llvm-svn: 210468	2014-06-09 12:34:50 +00:00
Andrea Di Biagio	4db1abea15	[DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG. This patch modifies SelectionDAGBuilder to construct SDNodes with associated NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator instructions. Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing nsw/nuw/exact flags during codegen. Patch by Marcello Maggioni. llvm-svn: 210467	2014-06-09 12:32:53 +00:00
Alp Toker	5c53639492	Fix typos llvm-svn: 210401	2014-06-07 21:23:09 +00:00
Benjamin Kramer	d0700b2919	X86: Don't turn shifts into ands if there's another use that may not check for equality. Fixes PR19964. llvm-svn: 210371	2014-06-06 21:08:55 +00:00
Filipe Cabecinhas	5181255696	Fixed a bug in lowering shuffle_vectors to insertps Summary: We were being too strict and not accounting for undefs. Added a test case and fixed another one where we improved codegen. Reviewers: grosbach, nadav, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4039 llvm-svn: 210361	2014-06-06 18:07:06 +00:00
Tom Roeder	44cb65fff1	Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute. It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables. This also adds backend support for generating the jump-instruction tables on ARM and X86. Note that since the jumptable attribute creates a second function pointer for a function, any function marked with jumptable must also be marked with unnamed_addr. llvm-svn: 210280	2014-06-05 19:29:43 +00:00
Eric Christopher	dd240fd79c	Revert r209381 as it isn't a local variable. Add a testcase so that we know next time this happens. llvm-svn: 210127	2014-06-03 21:01:39 +00:00
Rafael Espindola	64c1e18033	Allow alias to point to an arbitrary ConstantExpr. This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is up to MC (or the system assembler) to decide if that expression is valid or not. This reduces our ability to diagnose invalid uses and how early we can spot them, but it also lets us do things like @test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32), i32 ptrtoint (i32* @bar to i32)) to i32) An important implication of this patch is that the notion of aliased global doesn't exist any more. The alias has to encode the information needed to access it in its metadata (linkage, visibility, type, etc). Another consequence to notice is that getSection has to return a "const char *". It could return a NullTerminatedStringRef if there was such a thing, but when that was proposed the decision was to just use "const char*" for that. llvm-svn: 210062	2014-06-03 02:41:57 +00:00
Andrea Di Biagio	4760813831	[X86] Fix checked arithmetic for i8 on X86. When lowering a ISD::BRCOND into a test+branch, make sure that we always use the correct condition code to emit the test operation. This fixes PR19858: "i8 checked mul is wrong on x86". Patch by Keno Fisher! llvm-svn: 210032	2014-06-02 16:00:27 +00:00
Filipe Cabecinhas	83f4192a47	Make blend tests more specific Following the lead set by r209324, I'm making these tests match the whole instruction, so we can be sure we're lowering them correctly. llvm-svn: 209947	2014-05-31 00:52:23 +00:00
Andrea Di Biagio	446a527905	[X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type. This patch teaches the backend how to simplify/canonicalize dag node sequences normally introduced by the backend when promoting certain dag nodes with illegal vector type. This patch adds two new combine rules: 1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) -> (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>) 2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) -> (shuffle (BINOP A, B), Undef, <Mask>). Both rules are only triggered on the type-legalized DAG. In particular, rule 1. is a target specific combine rule that attempts to sink a bitconvert into the operands of a binary operation. Rule 2. is a target independent rule that attempts to move a shuffle immediately after a binary operation. llvm-svn: 209930	2014-05-30 23:17:53 +00:00

4755 Commits