llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	fa82efb50a	[X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables. llvm-svn: 312449	2017-09-03 17:52:23 +00:00
Craig Topper	bb6506d251	[X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0). In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern. llvm-svn: 312448	2017-09-03 17:52:19 +00:00
Ayman Musa	2927ea0b19	[X86] Add -mtriple option to LIT tests added in https://reviews.llvm.org/rL312442 llvm-svn: 312443	2017-09-03 15:06:26 +00:00
Ayman Musa	ef8f61bce6	[X86][AVX512] Add simple tests for all AVX512 shuffle instructions. Throughout an effort to strongly check the behavior of CodeGen with the IR shufflevector instruction we generated many tests while predicting the best X86 sequence that may be generated. This is a subset of the generated tests that we think may add value to our X86 set of tests. Some of the checks are not optimal and will be changed after fixing: 1. PR34394 2. PR34382 3. PR34380 4. PR34359 Differential Revision: https://reviews.llvm.org/D37329 llvm-svn: 312442	2017-09-03 13:53:44 +00:00
Ayman Musa	ac12849d32	[X86] Add RUN line for LIT test committed in "rL312438: [X86] Fix crash on assert of non-simple type after type-legalization.". llvm-svn: 312439	2017-09-03 10:44:18 +00:00
Ayman Musa	44cde94935	[X86] Fix crash on assert of non-simple type after type-legalization The function combineShuffleToVectorExtend in DAGCombine might generate an illegal typed node after "legalize types" phase, causing assertion on non-simple type to fail afterwards. Adding a type check in case the combine is running after the type legalize pass. Differential Revision: https://reviews.llvm.org/D37330 llvm-svn: 312438	2017-09-03 09:09:16 +00:00
Craig Topper	619b759a57	[X86] Teach fastisel to handle zext/sext i8->i16 and sext i1->i8/i16/i32/i64 Summary: ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort. We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code. Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one. The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed. Reviewers: spatel, zvi, igorb, guyblank, RKSimon Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37320 llvm-svn: 312422	2017-09-02 18:53:46 +00:00
Sanjay Patel	f4425e9a66	[x86] eliminate redundant shuffle of horizontal math ops when both inputs are the same This is limited to a set of patterns based on the example in PR34111: https://bugs.llvm.org/show_bug.cgi?id=34111 ...but as I was investigating this, I see that horizontal patterns can go wrong in many, many other ways that would not be handled by this patch. Each data type may even go different in the DAG after starting with the same basic IR pattern, so even proper IR canonicalization won't fix it all. Differential Revision: https://reviews.llvm.org/D37357 llvm-svn: 312379	2017-09-01 21:09:04 +00:00
Craig Topper	2a75b6f26b	[X86] Add test case I forgot to commit with r312285. llvm-svn: 312335	2017-09-01 16:40:24 +00:00
Geoff Berry	65528f2991	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312328	2017-09-01 14:27:20 +00:00
Craig Topper	70e581cdd6	[X86] Add isel patterns for memory forms of FMA3 intrinsic instructions llvm-svn: 312309	2017-09-01 07:58:13 +00:00
Sanjay Patel	841acbbca0	[x86] add more tests for horizontal ops; NFC llvm-svn: 312279	2017-08-31 20:59:25 +00:00
Daniel Jasper	c0a976d417	Revert r311525: "[XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text" Breaks builds internally. Will forward repo instructions to author. llvm-svn: 312243	2017-08-31 15:17:17 +00:00
Yael Tsafrir	185c81725e	[X86] Added run line to intrinsics upgrade test. NFC. llvm-svn: 312241	2017-08-31 13:56:22 +00:00
Ashutosh Nema	bfcac0b480	AMD family 17h (znver1) scheduler model update. Summary: This patch enables the following: 1) Regex based Instruction itineraries for integer instructions. 2) The instructions are grouped as per the nature of the instructions (move, arithmetic, logic, Misc, Control Transfer). 3) FP instructions and their itineraries are added which includes values for SSE4A, BMI, BMI2 and SHA instructions. Patch by Ganesh Gopalasubramanian Reviewers: RKSimon, craig.topper Subscribers: vprasad, shivaram, ddibyend, andreadb, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36617 llvm-svn: 312237	2017-08-31 12:38:35 +00:00
Hans Wennborg	24775a0a6c	Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. llvm-svn: 312178	2017-08-30 22:11:37 +00:00
Geoff Berry	feffb0c8af	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312154	2017-08-30 18:41:07 +00:00
Adrian Prantl	05782218ab	Canonicalize the representation of empty an expression in DIGlobalVariableExpression This change simplifies code that has to deal with DIGlobalVariableExpression and mirrors how we treat DIExpressions in debug info intrinsics. Before this change there were two ways of representing empty expressions on globals, a nullptr and an empty !DIExpression(). If someone needs to upgrade out-of-tree testcases: perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll> will catch 95%. llvm-svn: 312144	2017-08-30 18:06:51 +00:00
Craig Topper	afce0baacd	[AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does. Since there's no functional difference, just remove the extra patterns and save some isel table size. Differential Revision: https://reviews.llvm.org/D36854 llvm-svn: 312138	2017-08-30 16:38:33 +00:00
Igor Breger	36d447d8a8	[GlobalISel][X86] Support variadic function call. Summary: Support variadic function call. Port the implementation from X86FastISel. Reviewers: zvi, guyblank, oren_ben_simhon Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D37261 llvm-svn: 312130	2017-08-30 15:10:15 +00:00
Balaram Makam	42adadfca0	Re-land MachineInstr: Reason locally about some memory objects before going to AA. Summary: Reverts r311008 to reinstate r310825 with a fix. Refine alias checking for pseudo vs value to be conservative. This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs. Reviewers: hfinkel, nemanjai, efriedma Reviewed By: efriedma Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36900 llvm-svn: 312126	2017-08-30 14:57:12 +00:00
Gadi Haber	767d98bad8	[X86][Skylake] Fixing duplicated prefixes in the run command of Code Gen regression tests NFC. Replaced duplicated HASWELL prefixes in run commands in the X86 Code Gen regression tests by the SKYLAKE prefix when the -mcpu is set to skylake. The fix is needed in preparation of an upcoming patch containing the Skylake scheduling info. Reviewers: zvi, RKSimon, aymanmus, igorb Differential Revision: https://reviews.llvm.org/D37258 llvm-svn: 312103	2017-08-30 08:08:50 +00:00
Craig Topper	17854ecf24	[AVX512] Correct isel patterns to support selecting masked vbroadcastf32x2/vbroadcasti32x2 Summary: This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern. Fixes PR34357 We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch. Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one. Reviewers: aymanmus, zvi, igorb Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37286 llvm-svn: 312101	2017-08-30 07:48:39 +00:00
Craig Topper	48a7917079	[AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register This enables the use of a smaller encoding by using a VEX instruction when possible. Differential Revision: https://reviews.llvm.org/D37092 llvm-svn: 312100	2017-08-30 07:26:12 +00:00
Craig Topper	ef1f71669e	[X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as well Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge. Differential Revision: https://reviews.llvm.org/D37250 llvm-svn: 312099	2017-08-30 05:00:35 +00:00
Craig Topper	641e2af9e8	[X86] Provide a separate feature bit for macro fusion support instead of basing it on the AVX flag Summary: Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge". This is really strange as now AMD supports AVX. It also means if user explicitly disables AVX we disable macro fusion. This patch adds an explicit macro fusion feature. I've also enabled for the generic 64-bit CPU (which doesn't have AVX) This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature. Reviewers: spatel, chandlerc, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37280 llvm-svn: 312097	2017-08-30 04:34:48 +00:00
Reid Kleckner	a058736c9c	[dwarfdump] Pretty print location expressions and location lists Summary: Based on Fred's patch here: https://reviews.llvm.org/D6771 I can't seem to commandeer the old review, so I'm creating a new one. With that change the locations exrpessions are pretty printed inline in the DIE tree. The output looks like this for debug_loc entries: DW_AT_location [DW_FORM_data4] (0x00000000 0x0000000000000001 - 0x000000000000000b: DW_OP_consts +3 0x000000000000000b - 0x0000000000000012: DW_OP_consts +7 0x0000000000000012 - 0x000000000000001b: DW_OP_reg0 RAX, DW_OP_piece 0x4 0x000000000000001b - 0x0000000000000024: DW_OP_breg5 RDI+0) And like this for debug_loc.dwo entries: DW_AT_location [DW_FORM_sec_offset] (0x00000000 Addr idx 2 (w/ length 190): DW_OP_consts +0, DW_OP_stack_value Addr idx 3 (w/ length 23): DW_OP_reg0 RAX, DW_OP_piece 0x4) Simple locations without ranges are printed inline: DW_AT_location [DW_FORM_block1] (DW_OP_reg4 RSI, DW_OP_piece 0x4, DW_OP_bit_piece 0x20 0x0) The debug_loc(.dwo) dumping in changed accordingly to factor the code. Reviewers: dblaikie, aprantl, friss Subscribers: mgorny, javed.absar, hiraditya, llvm-commits, JDevlieghere Differential Revision: https://reviews.llvm.org/D37123 llvm-svn: 312042	2017-08-29 21:41:21 +00:00
Guy Blank	9203afcf0d	[X86] Add a test cases to demonstrate selecting GPR instructions when using mask based ones are more appropriate. llvm-svn: 311996	2017-08-29 11:58:03 +00:00
Jatin Bhateja	1f41a505d6	[X86] Adding a test to demonstrate aggressive folding for LEA facotrization. Differential Revision: https://reviews.llvm.org/D37257 llvm-svn: 311994	2017-08-29 10:49:33 +00:00
Craig Topper	62c47a2aa5	Mark Knights Landing as having slow two memory operand instructions Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly. Patch by David Zarzycki. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37224 llvm-svn: 311979	2017-08-29 05:14:27 +00:00
Craig Topper	029a21dfdc	[DAGCombiner] Teach visitEXTRACT_SUBVECTOR to turn extracts of BUILD_VECTOR into smaller BUILD_VECTORs Only do this before operations are legalized of BUILD_VECTOR is Legal for the target. Differential Revision: https://reviews.llvm.org/D37186 llvm-svn: 311892	2017-08-28 15:28:33 +00:00
Gadi Haber	d76f7b824e	[X86][Haswell] Updating HSW instruction scheduling information This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target. We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling. The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792. Information includes latency, number of micro-Ops and used ports by each HSW instruction. Please expect some performance fluctuations due to code alignment effects. Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud Differential Revision: https://reviews.llvm.org/D36663 llvm-svn: 311879	2017-08-28 10:04:16 +00:00
Craig Topper	80075a5fb7	[AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract. llvm-svn: 311856	2017-08-27 19:03:36 +00:00
Sanjay Patel	a7a61d9768	[DAGCombiner] allow undef shuffle operands when eliminating bitcasts (PR34111) As noted in the FIXME, this could be improved more, but this is the smallest fix that helps: https://bugs.llvm.org/show_bug.cgi?id=34111 llvm-svn: 311853	2017-08-27 17:29:30 +00:00
Sanjay Patel	4e4ba615b2	[x86] add haddps test for PR34111; NFC llvm-svn: 311852	2017-08-27 17:15:49 +00:00
Jatin Bhateja	23eaf52d7d	[X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vectors types llvm-svn: 311847	2017-08-27 12:43:25 +00:00
Craig Topper	36bd247f64	[X86] Add a target-specific DAG combine to combine extract_subvector from all zero/one build_vectors. llvm-svn: 311841	2017-08-27 05:39:57 +00:00
Craig Topper	a088362e88	[AVX512] Add patterns to match masked extract_subvector with bitcasts between the vselect and the extract_subvector. Remove the late DAG combine. We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvector from the same node where one had a bitcast and the other didn't. Add some more test cases to ensure we've also got most of the zero masking covered too. llvm-svn: 311837	2017-08-26 22:24:57 +00:00
Jatin Bhateja	c2f41b9f0b	[X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32. Differential Revision: https://reviews.llvm.org/D37183 llvm-svn: 311834	2017-08-26 19:02:49 +00:00
Jatin Bhateja	e4ca95d6aa	[DAGCombiner] Extending pattern detection for vector shuffle. Summary: If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index. This will also fix PR 33784 Reviewers: zvi, delena, RKSimon, thakis Reviewed By: RKSimon Subscribers: chandlerc, eladcohen, llvm-commits Differential Revision: https://reviews.llvm.org/D35788 llvm-svn: 311833	2017-08-26 19:02:36 +00:00
Jatin Bhateja	b60cfbefac	Revert rL311247 : To rectify commit message. Summary: This reverts commit rL311247. Differential Revision: https://reviews.llvm.org/D36927 llvm-svn: 311832	2017-08-26 19:02:17 +00:00
Craig Topper	d27386a9ed	[AVX512] Add patterns to use masked moves to implement masked extract_subvector of the lowest subvector. This only supports 32 and 64 bit element sizes for now. But we could probably do 16 and 8-bit elements with BWI. llvm-svn: 311821	2017-08-25 23:34:59 +00:00
Craig Topper	b89dbf0220	[AVX512] Add additional test cases for masked extract subvector. This includes tests for extracting 128-bits from a 256-bit vector and zero masking. llvm-svn: 311820	2017-08-25 23:34:57 +00:00
Craig Topper	e81de105a5	[X86] Add patterns to show more failures to use TBM instructions when we're trying to check flags. We can probably add patterns to fix some of them. But the ones that use 'and' as their root node emit a X86ISD::CMP node in front of the 'and' and then pattern matching that to 'test' instruction. We can't use a tablegen pattern to fix that because we can't remap the cmp result to the flag output of a TBM instruction. llvm-svn: 311819	2017-08-25 23:34:55 +00:00
Chandler Carruth	4b611a896d	[x86] Teach the backend to fold more read-modify-write memory operands to instructions. These can't be reasonably matched in tablegen due to the handling of flags, so we have to do this in C++ code. We only did it for `inc` and `dec` historically, this starts fleshing that out to more interesting instructions. Notably, this handles transfering operands to `add` and `sub`. Currently this forces them into a register. The next patch will add support for keeping immediate operands as immediates. Then I'll extend this beyond just `add` and `sub`. I'm not super thrilled by the repeated switches in the code but everything else I tried was really ugly or problematic. Many thanks to Craig Topper for the suggestions about where to even begin here and how to make this stuff work. Differential Revision: https://reviews.llvm.org/D37130 llvm-svn: 311806	2017-08-25 22:50:52 +00:00
Sanjay Patel	50a446ef10	[x86] regenerate checks; NFC llvm-svn: 311793	2017-08-25 19:25:03 +00:00
Chandler Carruth	46259260c7	[x86] NFC - normalize test case formatting of IR and generate CHECK lines with the script rather than using manually written checks. llvm-svn: 311753	2017-08-25 02:32:51 +00:00
Craig Topper	355d8cff49	[X86] Add TBM instructions to X86InstrInfo::isDefConvertible. This allows us to remove "test" instructions and use the flags from the TBM instructions directly. llvm-svn: 311747	2017-08-25 01:59:06 +00:00
Chandler Carruth	5b491808f5	[x86] Back out one aspect of r311318: don't generically set FeatureSlowUAMem32. The idea was to mark things that are slow on widely available processors as slow in the generic CPU so that the code generated for that CPU would be fast across those processors. However, for this feature that doesn't work out very well at all. The problem here is that you can very easily enable AVX or AVX2 on top of this generic CPU. For example, this can happen just by using AVX2 intrinsics from Clang within a region of code guarded by a dynamic CPU feature test. When you do that, the generated code with SlowUAMem32 set is ... amazingly slower. The problem is that there really aren't very good alternatives to the unaligned loads, and so our vector codegen regresses significantly. The other issue is that there are plenty of AMD CPUs with AVX1 that don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2 instead of this special feature. =/ It would be nice to have the target attriute logic be able to enable/disable more than just one feature at a time and control this in a more fine grained and useful way, but that doesn't seem easy. Given that it is only Sandybridge and Ivybridge that set this feature, for now I'm just backing it out of the generic CPU. That has the additional advantage of going back to the previous state that people seemed vaguely happy with. llvm-svn: 311740	2017-08-25 00:56:05 +00:00
Chandler Carruth	8ac488b161	[x86] Fix an amazing goof in the handling of sub, or, and xor lowering. The comment for this code indicated that it should work similar to our handling of add lowering above: if we see uses of an instruction other than flag usage and store usage, it tries to avoid the specialized X86ISD::* nodes that are designed for flag+op modeling and emits an explicit test. Problem is, only the add case actually did this. In all the other cases, the logic was incomplete and inverted. Any time the value was used by a store, we bailed on the specialized X86ISD node. All of this appears to have been historical where we had different logic here. =/ Turns out, we have quite a few patterns designed around these nodes. We should actually form them. I fixed the code to match what we do for add, and it has quite a positive effect just within some of our test cases. The only thing close to a regression I see is using: notl %r testl %r, %r instead of: xorl -1, %r But we can add a pattern or something to fold that back out. The improvements seem more than worth this. I've also worked with Craig to update the comments to no longer be actively contradicted by the code. =[ Some of this still remains a mystery to both Craig and myself, but this seems like a large step in the direction of consistency and slightly more accurate comments. Many thanks to Craig for help figuring out this nasty stuff. Differential Revision: https://reviews.llvm.org/D37096 llvm-svn: 311737	2017-08-25 00:34:07 +00:00

1 2 3 4 5 ...

10127 Commits