llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Atanasyan	32d8d1bf04	[mips] Add a pattern for 64-bit GPR variant of the `rdhwr` instruction MIPS ISAs start to support third operand for the `rdhwr` instruction starting from Revision 6. But LLVM generates assembler code with three-operands version of this instruction on any MIPS64 ISA. The third operand is always zero, so in case of direct code generation we get correct code. This patch fixes the bug by adding an instruction alias. The same alias already exists for 32-bit ISA. Ideally, we also need to reject three-operands version of the `rdhwr` instruction in an assembler code if ISA revision is less than 6. That is a task for a separate patch. This fixes PR38861 (https://bugs.llvm.org/show_bug.cgi?id=38861) Differential revision: https://reviews.llvm.org/D51773 llvm-svn: 341919	2018-09-11 09:57:25 +00:00
Craig Topper	844f035e1e	[X86] In combineMOVMSK, look through int->fp bitcasts before calling SimplifyDemandedBits. MOVMSKPS and MOVMSKPD both take FP types, but likely the operations before it are on integer types with just a int->fp bitcast between them. If the bitcast isn't used by anything else and doesn't change the element width we can look through it to simplify the integer ops. llvm-svn: 341915	2018-09-11 08:20:02 +00:00
Craig Topper	85210311ba	[X86] Add test cases inspired by PR38840. These are test cases inspired by sequences like below for extracting the same bit from every vector element and checking for all zeros/ones. define i1 @and256_x8(<8 x i32>) { %a = trunc <8 x i32> %0 to <8 x i1> %b = bitcast <8 x i1> %a to i8 %d = icmp eq i8 %b, -1 ret i1 %d } This is what the above looks like after InstCombine. define i1 @and256_x8_opt(<8 x i32>) { %2 = and <8 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %a = icmp ne <8 x i32> %2, zeroinitializer %b = bitcast <8 x i1> %a to i8 %d = icmp eq i8 %b, -1 ret i1 %d } llvm-svn: 341908	2018-09-11 07:23:29 +00:00
Matt Arsenault	d0cf1b26d4	AMDGPU: Fix r600 test llvm-svn: 341898	2018-09-11 04:39:16 +00:00
Matt Arsenault	99c780159d	AMDGPU: Don't error on out of bounds address spaces We should never abort on valid IR. The most reasonable interpretation of an arbitrary address space pointer is probably some kind of special subset of global memory. llvm-svn: 341894	2018-09-11 04:00:41 +00:00
Craig Topper	07889079fa	[X86] Explicitly enable aes in aes-schedule.ll to fix failures after r341861. llvm-svn: 341868	2018-09-10 21:49:01 +00:00
Sanjay Patel	7feb3ed78c	[x86] test codegen for unsigned saturated add; NFC All of the ISA holes are going to make this difficult, but we can't canonicalize the IR and try to solve PR14613 until we have backend support to get this right. https://bugs.llvm.org/show_bug.cgi?id=14613 https://rise4fun.com/Alive/Guv https://rise4fun.com/Alive/AADG llvm-svn: 341845	2018-09-10 17:40:15 +00:00
Alexander Timofeev	20cbe6f319	[AMDGPU] Preliminary patch for divergence driven instruction selection. Inline immediate move to V_MADAK_F32. Differential revision: https://reviews.llvm.org/D51586 Reviewer: rampitec llvm-svn: 341843	2018-09-10 16:42:49 +00:00
Petar Jovanovic	ce4dd0ae38	[MIPS GlobalISel] Select icmp Select 32bit integer compare instructions for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D51489 llvm-svn: 341840	2018-09-10 15:56:52 +00:00
Matt Arsenault	7f6dc597d3	AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit This will require something to cast. Before this would eliminate the cast, which would result in copies of $noreg. llvm-svn: 341803	2018-09-10 11:59:27 +00:00
Matt Arsenault	57b5966dad	DAG: Handle odd vector sizes in calling conv splitting This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces. Fixes not splitting v3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets. llvm-svn: 341801	2018-09-10 11:49:23 +00:00
Carl Ritson	f898edd117	[AMDGPU] Prevent sequences of non-instructions disrupting GCNHazardRecognizer wait state counting Summary: This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted. Reviewers: nhaehnle, arsenm Reviewed By: arsenm Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51726 Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b llvm-svn: 341798	2018-09-10 10:14:48 +00:00
Matt Arsenault	72d27f5525	AMDGPU: Fix tests using old number for constant address space llvm-svn: 341770	2018-09-10 02:54:25 +00:00
Matt Arsenault	d77fcc2a92	AMDGPU: Use GOT PSV since it has an address space now llvm-svn: 341768	2018-09-10 02:23:39 +00:00
Matt Arsenault	b998674610	AMDGPU: Don't abort on unknown addrspace argument llvm-svn: 341767	2018-09-10 02:23:30 +00:00
Craig Topper	3823516103	[X86] Custom type legalize (v2i32 (fp_to_uint v2f64)) without avx512vl by widening to v4i32 and v4f64 instead of v8i32 and v8f64. Make it aware of x86-experimental-vector-widening-legalization. We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that. If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case. llvm-svn: 341765	2018-09-09 20:36:36 +00:00
Sanjay Patel	6ebf218e4c	[SelectionDAG] enhance vector demanded elements to look at a vector select condition operand This is the DAG equivalent of D51433. If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition. The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit vectors because we don't need those anyway. I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed to be running there? Differential Revision: https://reviews.llvm.org/D51696 llvm-svn: 341762	2018-09-09 14:13:22 +00:00
Craig Topper	7af5e333e7	[X86] Create paddus/psubus from narrower vectors with i8/i16 element types. Summary: This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG. Another step towards fixing PR38691 Reviewers: RKSimon, spatel Reviewed By: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51818 llvm-svn: 341753	2018-09-08 19:32:58 +00:00
Craig Topper	a2c9694bc8	[X86] Mark the ADCX and ADOX instruction as commutable. llvm-svn: 341752	2018-09-08 18:47:56 +00:00
Craig Topper	4677110348	[X86] Add test cases for commuting ADCX/ADOX instruction to avoid copies. This is a MIR test so we can test ADOX which we have no isel patterns for. I also plan to remove ADCX isel patterns in the near future so this will help maintain coverage. llvm-svn: 341751	2018-09-08 18:47:54 +00:00
Craig Topper	c96305970d	[X86] Add commuted isel pattern for the load form of ADCX instructions. This prevents the legacy ADC instruction from being favored over ADCX when the load is in the operand 0. llvm-svn: 341745	2018-09-08 06:31:43 +00:00
Craig Topper	22a6f51646	[X86] Add load folding test cases for the addcarryx intrinsic. We are currently only able to fold a load in operand 1 to ADCX. A load in operand 0 will use the legacy ADC instruction. Ultimately I want to remove isel support for ADCX, but first I'm going to fix the shortcomings I know of so I can write proper MIR tests to maintain coverage later. llvm-svn: 341744	2018-09-08 06:31:41 +00:00
Craig Topper	761e88d1d4	[X86] Add stack folding MIR test for ADCX/ADOX. We currently have no way to isel ADOX and I plan to remove isel patterns for ADCX. This test will ensure we still have stack folding support for these instructions if we need them in the future. llvm-svn: 341743	2018-09-08 05:08:18 +00:00
Reid Kleckner	f803b23879	[COFF] Implement llvm.global_ctors priorities for MSVC COFF targets Summary: MSVC and LLD sort sections ASCII-betically, so we need to use section names that sort between .CRT$XCA (the start) and .CRT$XCU (the default priority). In the general case, use .CRT$XCT12345 as the section name, and let the linker sort the zero-padded digits. Users with low priorities typically want to initialize as early as possible, so use .CRT$XCA00199 for priorities less than 200. This number is arbitrary. Implements PR38552. Reviewers: majnemer, mstorsjo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D51820 llvm-svn: 341727	2018-09-07 23:07:55 +00:00
Thomas Lively	a0d25815a0	[WebAssembly] v8x16.shuffle Summary: Since the shuffle mask is not exposed as an operand in the native ISel DAG, create a new WebAssembly ISD node exposing the mask. The mask is lowered as sixteen immediate byte indices no matter what type the original vector shuffle was operating on. This CL depends on D51656 Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51659 llvm-svn: 341718	2018-09-07 21:54:46 +00:00
Craig Topper	fa535c027e	[X86] Add codegen tests for narrow PADDUS/PSUBUS patterns for PR38691. llvm-svn: 341711	2018-09-07 21:28:46 +00:00
Nick Desaulniers	287a3be379	[AArch64] Support reserving x1-7 registers. Summary: Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7. Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma Reviewed By: nickdesaulniers, efriedma Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48580 llvm-svn: 341706	2018-09-07 20:58:57 +00:00
Craig Topper	5cbce81c91	[X86] Don't create ZERO_EXTEND_INREG/SIGN_EXTEND_INREG for v1iX vectors. The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that. llvm-svn: 341705	2018-09-07 20:56:03 +00:00
Craig Topper	39f48fdcbc	[X86] Don't create X86ISD::AVG nodes from v1iX vectors. The type legalizer will try to scalarize this and fail. It looks like there's some other v1iX oddities out there too since we still generated some vector instructions. llvm-svn: 341704	2018-09-07 20:56:01 +00:00
Craig Topper	4863313b35	[X86] Modify the rdtscp intrinsic to return values instead of taking a pointer argument Similar to what was recently done for addcarry/subborrow and has been done for rdrand/rdseed for a while. It's better to use two results and an explicit store in IR when the store isn't part of the semantics of the instruction. This allows store->load forwarding to happen in the middle end. Or the store to be removed if it's never loaded. Differential Revision: https://reviews.llvm.org/D51803 llvm-svn: 341698	2018-09-07 19:14:15 +00:00
Craig Topper	72964ae99e	[X86] Change the addcarry and subborrow intrinsics to return 2 results and remove the pointer argument. We should represent the store directly in IR instead. This gives the middle end a chance to remove it if it can see a load from the same address. Differential Revision: https://reviews.llvm.org/D51769 llvm-svn: 341677	2018-09-07 16:58:39 +00:00
Craig Topper	51e11788a4	[X86] Use regular expressions to make test immune to register allocation changes. llvm-svn: 341676	2018-09-07 16:58:36 +00:00
Craig Topper	313d09af51	[X86] Teach X86DAGToDAGISel::foldLoadStoreIntoMemOperand to handle loads in operand 1 of commutable operations. Previously we only handled loads in operand 0, but nothing guarantees the load will be operand 0 for commutable operations. Differential Revision: https://reviews.llvm.org/D51768 llvm-svn: 341675	2018-09-07 16:27:55 +00:00
Sid Manning	9ad0f02749	Add support for getRegisterByName. Support required to build the Hexagon Linux kernel. Differential Revision: https://reviews.llvm.org/D51363 llvm-svn: 341658	2018-09-07 13:36:21 +00:00
Simon Pilgrim	04d0748417	[X86][SSE] Add additional fadd/fsub(x, bitcast_fneg(y)) tests with different integer bitwidths llvm-svn: 341657	2018-09-07 13:27:07 +00:00
Simon Pilgrim	96d6b9c2e2	[DAGCombiner] foldBitcastedFPLogic - Add basic vector support Add support for bitcasts from float type to an integer type of the same element bitwidth. There may be cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet. llvm-svn: 341652	2018-09-07 12:13:45 +00:00
Simon Pilgrim	a2aef22a72	[X86][SSE] Add fadd/fsub(x, bitcast_fneg(y)) tests Show missing vector support llvm-svn: 341650	2018-09-07 11:24:43 +00:00
Tim Northover	bb7d7b3d33	ARM: fix Thumb2 CodeGen for ldrex with folded frame-index. Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a proper addressing-mode and tells the rewriter about it so that encodable offsets are exploited and others are rejected. Should fix PR38828. llvm-svn: 341642	2018-09-07 09:21:25 +00:00
Alexander Timofeev	a805c96c65	[AMDGPU] Preliminary patch for divergence driven instruction selection. Fold immediate SMRD offset. Differential revision: https://reviews.llvm.org/D51610 Reviewer: rampitec llvm-svn: 341636	2018-09-07 09:05:34 +00:00
QingShan Zhang	abbb894ff5	[PowerPC] Combine ADD to ADDZE On the ppc64le platform, if ir has the following form, define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 { entry: %cmp = icmp ne i64 %z, CONSTANT (-32767 <= CONSTANT <= 32768) %conv1 = zext i1 %cmp to i64 %add = add nsw i64 %conv1, %x ret i64 %add } we can optimize it to the form below. when C == 0 --> addze X, (addic Z, -1)) / add X, (zext(setne Z, C))-- \ when -32768 <= -C <= 32767 && C != 0 --> addze X, (addic (addi Z, -C), -1) Patch By: HLJ2009 (Li Jia He) Differential Revision: https://reviews.llvm.org/D51403 Reviewed By: Nemanjai llvm-svn: 341634	2018-09-07 07:56:05 +00:00
Craig Topper	30e129f256	[X86] Add more test cases for missed opportunities for using RMW form of ADC. llvm-svn: 341630	2018-09-07 02:39:56 +00:00
Craig Topper	2c9dede9cb	[X86] Add RMW ADC patterns with load in operand 1. ADC is commutable and the load could be in either operand, but we were only checking operand 0. Ideally we'd mark X86adc_flag as commutable and tablegen would automatically do this, but the EFLAGS register mention is preventing it. llvm-svn: 341606	2018-09-06 23:55:36 +00:00
Craig Topper	37d68e4599	[X86] Add a test case showing failure to use the RMW form of ADC when the load is in operand 1 going into isel. The ADC instruction is commutable, but we only have RMW isel patterns with a load on the left hand side. Nothing will canonicalize loads to the LHS on these ops. So we need two patterns. llvm-svn: 341605	2018-09-06 23:55:34 +00:00
Eric Christopher	fe83270ee9	The initial .text section generated in object files was missing the SHF_ARM_PURECODE flag when being built with the -mexecute-only flag. All code sections of an ELF must have the flag set for the final .text section to be execute-only, otherwise the flag gets removed. A HasData flag is added to MCSection to aid in the determination that the section is empty. A virtual setTargetSectionFlags is added to MCELFObjectTargetWriter to allow subclasses to set target specific section flags to be added to sections which we then use in the ARM backend to set SHF_ARM_PURECODE. Patch by Ivan Lozano! Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D48792 llvm-svn: 341593	2018-09-06 22:09:31 +00:00
Scott Linder	834cbc645c	Revert r341413 Causes a regression in expensive checks. llvm-svn: 341589	2018-09-06 21:38:56 +00:00
Sanjay Patel	9e5c163154	[x86] add tests for pow --> cbrt; NFC llvm-svn: 341575	2018-09-06 18:42:55 +00:00
Michael Berg	1b34b01a8e	[NFC] - in preparation for adding nsw, nuw and exact as flags to MI llvm-svn: 341565	2018-09-06 17:07:29 +00:00
JF Bastien	2920061105	ARM64: improve non-zero memset isel by ~2x Summary: I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated where the generated code was bad. This patch fixes the majority of the issues by requesting that a 2xi64 vector be used for memset of 32 bytes and above. The patch leaves the former request for f128 unchanged, despite f128 materialization being suboptimal: doing otherwise runs into other asserts in isel and makes this patch too broad. This patch hides the issue that was present in bzero_40_stack and bzero_72_stack because the code now generates in a better order which doesn't have the store offset issue. I'm not aware of that issue appearing elsewhere at the moment. <rdar://problem/44157755> Reviewers: t.p.northover, MatzeB, javed.absar Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51706 llvm-svn: 341558	2018-09-06 16:03:32 +00:00
Craig Topper	5a53760f65	[X86][Assembler] Allow %eip as a register in 32-bit mode for .cfi directives. This basically reverts a change made in r336217, but improves the text of the error message for not allowing IP-relative addressing in 32-bit mode. Fixes PR38826. Patch by Iain Sandoe. llvm-svn: 341512	2018-09-06 02:03:14 +00:00
JF Bastien	ec812ce3d6	NFC: more memset inline arm64 coverage I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch further expands the tests (which I already expanded in r341406) to capture more of the current code generation when it comes to stack-based small non-zero memset on arm64. This patch annotates some potential fixes. llvm-svn: 341493	2018-09-05 20:35:06 +00:00
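
The arithmetic behind the [PowerPC] "Combine ADD to ADDZE" entry (abbb894ff5) above can be checked with a small model. This is only a sketch under stated assumptions, not the patch's code: registers are modelled as 64-bit unsigned integers, and the helper names (`addic`, `addze`, `combined`, `reference`) are hypothetical Python stand-ins for the instructions named in the commit message. The idea is that `addic t, -1` produces a carry exactly when `t` is non-zero, so the carry equals `zext(setne Z, C)` once `t = Z - C`.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers (assumption of this sketch)


def addic(a, imm):
    """Model of 'add immediate carrying': return (sum mod 2**64, carry-out)."""
    total = (a & MASK64) + (imm & MASK64)
    return total & MASK64, total >> 64


def addze(x, carry):
    """Model of 'add to zero extended': x + carry, wrapped to 64 bits."""
    return ((x & MASK64) + carry) & MASK64


def combined(x, z, c):
    """addze X, (addic (addi Z, -C), -1); the addi is skipped when C == 0."""
    t = (z - c) & MASK64 if c != 0 else z & MASK64
    _, carry = addic(t, -1)  # carry is 1 exactly when t != 0
    return addze(x, carry)


def reference(x, z, c):
    """add X, (zext (setne Z, C)), the form before the combine."""
    return ((x & MASK64) + int((z & MASK64) != (c & MASK64))) & MASK64


if __name__ == "__main__":
    # Compare both forms on a few values, including wrap-around cases.
    for x in (0, 5, MASK64):
        for z in (0, 1, 7, MASK64):
            for c in (0, 1, 7, -3, 32768, -32767):
                assert combined(x, z, c) == reference(x, z, c)
    print("combined form matches reference on all sampled inputs")
```

The model does not enforce the immediate range quoted in the commit message; that range only reflects what `addi` can encode, and the identity itself holds for any `C`.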

25796 Commits