llvm-project

Commit Graph

Author	SHA1	Message	Date
Dmitry Preobrazhensky	04bd1185ad	[AMDGPU][MC] Corrected checks for DS offset0 range See bug 40889: https://bugs.llvm.org/show_bug.cgi?id=40889 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59313 llvm-svn: 356576	2019-03-20 17:13:58 +00:00
Dmitry Preobrazhensky	137976fae2	[AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, private_base, private_limit, pops_exiting_wave_id See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D59290 llvm-svn: 356561	2019-03-20 15:40:52 +00:00
Simon Pilgrim	2acca37a2d	[X86] Use getConstantOperandAPInt to detect out-of-range shifts. llvm-svn: 356549	2019-03-20 11:41:52 +00:00
Andrea Di Biagio	624f5deff4	[X86] Remove X86 specific dag nodes for RDTSC/RDTSCP/RDPMC. NFCI This patch removes the following dag node opcodes from namespace X86ISD: RDTSC_DAG, RDTSCP_DAG, RDPMC_DAG The logic that expands RDTSC/RDPMC/XGETBV intrinsics is basically the same. The only differences are: RDTSC/RDTSCP don't implicitly read ECX. RDTSCP also implicitly writes ECX. I moved the common expansion logic into a helper function with the goal to get rid of code repetition. That helper is now used for the expansion of RDTSC/RDTSCP/RDPMC/XGETBV intrinsics. No functional change intended. Differential Revision: https://reviews.llvm.org/D59547 llvm-svn: 356546	2019-03-20 11:21:15 +00:00
David Stuttard	fc2a747345	[AMDGPU] Allow MIMG with no uses in adjustWritemask in isel Summary: If an MIMG instruction has managed to get through to adjustWritemask in isel but has no uses (and doesn't enable TFC) then prevent an assertion by not attempting to adjust the writemask. The instruction will be removed anyway. Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58964 llvm-svn: 356540	2019-03-20 09:29:55 +00:00
Craig Topper	97d104cbee	[X86] Re-disable cmpxchg16b for 32-bit mode assembly parsing. This was broken recently when I factored the 64 bit mode check into hasCmpxchg16 without thinking about the AssemblerPredicate. llvm-svn: 356531	2019-03-19 23:57:16 +00:00
Eli Friedman	2596e8b3e7	[ARM] Make sure to save/restore LR when we use tBfar. This change does two things. One, it ensures compilation will abort instead of miscompiling if ARMFrameLowering::determineCalleeSaves chooses not to save LR in a case where it's necessary. Two, it changes the way we estimate the size of a function to be more conservative in the presence of constant pool entries and jump tables. EstimateFunctionSizeInBytes probably still isn't really conservative enough, but I'm not sure how we can come up with a reliable estimate before constant islands runs. Differential Revision: https://reviews.llvm.org/D59439 llvm-svn: 356527	2019-03-19 21:48:08 +00:00
Amara Emerson	761ca2e53b	[AArch64][GlobalISel] Add an optimization to select vector DUP instructions. This adds pattern matching for the insert+shufflevector sequence so we can generate dup instructions instead of the current TBL sequence. Differential Revision: https://reviews.llvm.org/D59558 llvm-svn: 356526	2019-03-19 21:43:05 +00:00
Amara Emerson	18e2c5724a	[AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal. llvm-svn: 356525	2019-03-19 21:43:02 +00:00
Matt Arsenault	cf55a657f0	CodeGen: Refactor regallocator command line and target selection This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. llvm-svn: 356506	2019-03-19 19:33:12 +00:00
Simon Pilgrim	77482120da	Fix for ABS legalization on PPC buildbot. llvm-svn: 356498	2019-03-19 18:55:46 +00:00
Simon Pilgrim	e744f513c4	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - handle repeated shift amounts If a value with multiple uses is only ever used for SSE shift amounts then we know that only the bottom 64-bits are needed. llvm-svn: 356483	2019-03-19 17:23:25 +00:00
Simon Atanasyan	db4601e60a	[MIPS][microMIPS] Enable dynamic stack realignment Dynamic stack realignment was disabled on micromips by checking if target has standard encoding. We simply change the condition to skip Mips16 only. Patch by Mirko Brkusanin. Differential Revision: http://reviews.llvm.org/D59499 llvm-svn: 356478	2019-03-19 17:01:24 +00:00
Jordan Rupprecht	f74d45a775	[NFC] Fix unused variable in release builds This was introduced in rL356468. llvm-svn: 356477	2019-03-19 16:52:40 +00:00
Simon Pilgrim	7a8e5051f4	Fix unused variable warning. NFCI. llvm-svn: 356474	2019-03-19 16:49:59 +00:00
Simon Pilgrim	a56f2822d0	[SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect These changes are related to PR37743 and include: SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build an ABS node. Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner. Add promoting the integer ABS node in the LegalizeIntegerType. Expand-based legalization of integer result for the ABS nodes. Expand-based legalization of ABS vector operations. Add some integer abs testcases for different typesizes for Thumb arch. Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to: tmp = (SRA Hi, 31); Lo = (UADDO tmp, Lo); Hi = (XOR tmp, (ADDCARRY tmp, Hi, Lo:1)); Lo = (XOR tmp, Lo). The "detectZextAbsDiff" function is changed for the recognition of the pattern with the ABS node. Given an ABS node, detect the following pattern: (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))). Change integer abs testcases for codegen with the ABS node support for AArch64. Indicate that the ABS is legal for the i64 type when the NEON is supported. Change the integer abs testcases to show changing of codegen. Add combine and legalization of ABS nodes for Thumb arch. Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition. For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743 Patch by: @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D49837 llvm-svn: 356468	2019-03-19 16:24:55 +00:00
Ryan Taylor	00e063ab92	[AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59265 llvm-svn: 356465	2019-03-19 16:07:00 +00:00
Neil Henning	e85f6bd64f	[AMDGPU] Ban i8 min3 promotion. I found this really weird WWM-related case whereby, through the WWM transformations, our isel lowering was trying to promote 2 mins into a min3 for the i8 type, which our hardware doesn't support. Before the simple fix to our isel lowering to skip this for i8 MVTs, the new min3_i8.ll test case would spew the error: PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68 Differential Revision: https://reviews.llvm.org/D59543 llvm-svn: 356464	2019-03-19 15:50:24 +00:00
Simon Atanasyan	af40d4371d	[mips] Fix crash on recursive using of .set Switching to `MCParserUtils::parseAssignmentExpression` for parsing assignment expressions in the `.set` directive reduces code and makes it possible to print an error message instead of crashing on incorrect recursive use of `.set`. Fix for the bug https://bugs.llvm.org/show_bug.cgi?id=41053. Differential Revision: http://reviews.llvm.org/D59452 llvm-svn: 356461	2019-03-19 15:15:35 +00:00
Heejin Ahn	c60bc94afc	[WebAssembly] Small improvements in FixIrreducibleControlFlow (NFC) Summary: - Make some class member methods const - Delete unnecessary includes - Use a simpler form of `BuildMI` Reviewers: kripken Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59454 llvm-svn: 356440	2019-03-19 05:26:33 +00:00
Heejin Ahn	34dc1f2483	[WebAssembly] Rename methods according to instruction name changes (NFC) Reviewers: tlively, sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59469 llvm-svn: 356438	2019-03-19 05:07:33 +00:00
Thomas Lively	0200d62ec7	[WebAssembly] Lower SIMD nnan setcc nodes Summary: Adds patterns to lower all the remaining setcc modes: lt, gt, le, and ge. Fixes PR40912. Reviewers: aheejin, sbc100, dschuff Reviewed By: dschuff Subscribers: jgravelle-google, hiraditya, sunfish, jdoerfert, llvm-commits, srj Tags: #llvm Differential Revision: https://reviews.llvm.org/D59519 llvm-svn: 356431	2019-03-19 00:55:34 +00:00
Craig Topper	b24bdf626a	[X86] Disable CQTO and CLTQ instructions in the assembly parser outside 64-bit mode. llvm-svn: 356419	2019-03-18 22:06:14 +00:00
Craig Topper	e732bc6bea	[X86] Allow any 8-bit immediate to be used with BT/BTC/BTR/BTS not just sign extended 8-bit immediates. We need to allow [128,255] in addition to [-128, 127] to match gas. llvm-svn: 356413	2019-03-18 21:33:59 +00:00
Sam Clegg	b7708ec87f	[WebAssembly] Don't override default implementation of isOffsetFoldingLegal. NFC. The default implementation does what we want and is going to be more compatible with the planned dynamic linking (-fPIC) support. This is NFC because currently we only build wasm with `-relocation-model=static`, which in turn means that the default `isOffsetFoldingLegal` always returns true today. Differential Revision: https://reviews.llvm.org/D54661 llvm-svn: 356410	2019-03-18 21:21:12 +00:00
Craig Topper	f086e562f9	[X86] Use relocImm in the ROL8ri/ROL16ri/ROL32ri/ROL64ri patterns to be consistent with the ROR patterns. llvm-svn: 356407	2019-03-18 20:43:15 +00:00
Craig Topper	0b9c640fe0	[X86] Replace uses of i64immSExt32_su with i64relocImmSExt32_su. For the i8, i16, and i32 instructions we were using a relocImm. Presumably we should for i64 as well. llvm-svn: 356406	2019-03-18 20:43:09 +00:00
Michael Liao	efb4f9e568	[AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59501 llvm-svn: 356405	2019-03-18 20:40:09 +00:00
Tim Renouf	cfdfba996b	[AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399	2019-03-18 19:35:44 +00:00
Tim Renouf	2e94f6e584	[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398	2019-03-18 19:25:39 +00:00
Amara Emerson	8627178d46	Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy() After review comments, it was preferred to not teach MachineIRBuilder about non-generic instructions beyond using buildInstr(). For AArch64 I've changed the buildCopy() calls to buildInstr() + a separate addReg() call. This also relaxes the MachineIRBuilder's COPY checking more because it may not always have a SrcOp given to it. llvm-svn: 356396	2019-03-18 19:20:10 +00:00
Tim Renouf	8723a56551	[MsgPack][AMDGPU] Fix unflushed raw_string_ostream bugs on windows expensive checks bot This fixes a couple of unflushed raw_string_ostream bugs in recent commits that only show up on a bot building on windows with expensive checks. Differential Revision: https://reviews.llvm.org/D59396 Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf llvm-svn: 356394	2019-03-18 19:00:46 +00:00
Craig Topper	f07062a798	[X86] Rename imm8_su/imm16_su/imm32_su to relocImm8_su/relocImm16_su/relocImm32_su/ to accurately reflect what they are. llvm-svn: 356393	2019-03-18 18:54:06 +00:00
Adhemerval Zanella	270249de2b	[AArch64] Small fix for getIntImmCost It uses the generic AArch64_IMM::expandMOVImm to get the correct number of instructions used in immediate materialization. Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58461 llvm-svn: 356391	2019-03-18 18:50:58 +00:00
Adhemerval Zanella	a3cefa5d64	[AArch64] Optimize floating point materialization This patch follows some ideas from r352866 to optimize the floating point materialization even further. It changes isFPImmLegal to consider up to 2 mov instructions, or up to 5 if the subtarget has fused literals. The rationale is that the cost is the same for mov+fmov vs. adrp+ldr, but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same if you consider that movw+movk+fmov vs. adrp+ldr will be fused (although one instruction longer). Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58460 llvm-svn: 356390	2019-03-18 18:45:57 +00:00
Adhemerval Zanella	664c1ef528	[TargetLowering] Add code size information on isFPImmLegal. NFC This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389	2019-03-18 18:40:07 +00:00
Adhemerval Zanella	8a595b1d2e	[AArch64] Refactor floating point materialization. NFC It splits the logic of actual instruction emission away from the logic that figures out the appropriate sequence in AArch64ExpandPseudo::expandMOVImm. The new function AArch64_IMM::expandMOVImm, which returns the list of instructions to materialize the immediate constant, is implemented in a separate unit because it will be used in a subsequent patch to optimize floating point materialization. Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58915 llvm-svn: 356387	2019-03-18 18:23:23 +00:00
Craig Topper	c2b35ebc1d	[X86] Remove the _alt forms of (V)CMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more Similar to previous change done for VPCOM and VPCMP Differential Revision: https://reviews.llvm.org/D59468 llvm-svn: 356384	2019-03-18 17:59:59 +00:00
Neil Henning	523dab0788	[AMDGPU] Add an experimental buffer fat pointer address space. Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer representing the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 llvm-svn: 356373	2019-03-18 14:44:28 +00:00
Christof Douma	8cfd91dcc7	[AArch64] Fix bug 35094 atomicrmw on Armv8.1-A+lse Fixes https://bugs.llvm.org/show_bug.cgi?id=35094 The Dead register definition pass should leave the atomicrmw instructions alone on AArch64 (LSE extension). The reason is the following statement in the Arm ARM: "The ST<OP> instructions, and LD<OP> instructions where the destination register is WZR or XZR, are not regarded as doing a read for the purpose of a DMB LD barrier." A good example was given in the gcc thread by Will Deacon (linked in the bugzilla ticket 35094): P0 (atomic_int* y,atomic_int* x) { atomic_store_explicit(x,1,memory_order_relaxed); atomic_thread_fence(memory_order_release); atomic_store_explicit(y,1,memory_order_relaxed); } P1 (atomic_int* y,atomic_int* x) { atomic_fetch_add_explicit(y,1,memory_order_relaxed); /* STADD */ atomic_thread_fence(memory_order_acquire); int r0 = atomic_load_explicit(x,memory_order_relaxed); } P2 (atomic_int* y) { int r1 = atomic_load_explicit(y,memory_order_relaxed); } My understanding is that it is forbidden for r0 == 0 and r1 == 2 after this test has executed. However, if the relaxed add in P1 compiles to STADD and the subsequent acquire fence is compiled as DMB LD, then we don't have any ordering guarantees in P1 and the forbidden result could be observed. Change-Id: I419f9f9df947716932038e1100c18d10a96408d0 llvm-svn: 356360	2019-03-18 09:21:06 +00:00
Craig Topper	ba898da132	[X86] Hopefully fix a tautological compare warning in printVecCompareInstr. llvm-svn: 356359	2019-03-18 07:05:01 +00:00
Craig Topper	b4c49255aa	[X86] Make ADD*_DB post-RA pseudos and expand them in expandPostRAPseudo. These are used to help convert OR->LEA when needed to avoid a copy. They aren't needed after register allocation. This happens to remove an ugly goto from X86MCCodeEmitter.cpp. llvm-svn: 356356	2019-03-18 05:48:18 +00:00
Craig Topper	860a27208e	[X86] Add tab character to the custom printing of VPCMP and VPCOM instructions. All the other instructions are printed with a preceding tab. llvm-svn: 356355	2019-03-18 02:53:11 +00:00
Craig Topper	04cc28fe13	[X86] Merge printf32mem/printi32mem into a single printdwordmem. Do the same for all other printing functions. The only thing the print methods currently need to know is the string to print for the memory size in intel syntax. This patch merges the functions based on this string. If we ever need something else in the future, it's easy to split them back out. This reduces the number of cases in the assembly printers. It shrinks the intel printer to only use 7 bytes per instruction instead of 8. llvm-svn: 356352	2019-03-17 22:57:21 +00:00
David Green	baa94ef03b	[ARM] Check that CPSR does not have other uses Fix up rL356335 by checking that CPSR is not read between the compare and the branch. llvm-svn: 356349	2019-03-17 21:36:15 +00:00
Matt Arsenault	e0c1f9e76d	AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347	2019-03-17 21:31:35 +00:00
Craig Topper	affead9ad0	[X86] Remove the _alt forms of AVX512 VPCMP instructions. Use a combination of custom printing and custom parsing to achieve the same result and more Similar to the previous patch for VPCOM. Differential Revision: https://reviews.llvm.org/D59398 llvm-svn: 356344	2019-03-17 21:21:40 +00:00
Craig Topper	12509d87f3	[X86] Remove the _alt forms of XOP VPCOM instructions. Use a combination of custom printing and custom parsing to achieve the same result and more. Previously we had a regular form of the instruction used when the immediate was 0-7, and an _alt form that allowed the full 8-bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used; if the immediate was specified directly, the _alt form was used. The disassembler would prefer the 0-7 form when the immediate was in range and the _alt form otherwise, so that disassembly would print the most readable form when possible. The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces: a "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b". The _alt form, on the other hand, parsed and printed like any other instruction with no specialness. With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb", and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax. I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printers. Despite a lot of common checks between all of the aliases, at least when compiled with clang this commonality was not well optimized, nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd, which have 32 immediate values, 3 encoding flavors, 3 register sizes, etc., this didn't seem to scale well for clang binary size, so custom printing seemed a better trade-off. I also considered just using the InstAlias for the matching and not the printing, but that seemed like it would add a lot of extra rows to the matcher table, especially given that the 32 immediates for vcmpps have 46 strings associated with them. Differential Revision: https://reviews.llvm.org/D59398 llvm-svn: 356343	2019-03-17 21:21:37 +00:00
Tim Renouf	e30aa6a136	[AMDGPU] Prepare for introduction of v3 and v5 MVTs AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342	2019-03-17 21:04:16 +00:00
Tim Renouf	d1477e989c	[ARM] Fixed an assumption of power-of-2 vector MVT I am about to introduce some non-power-of-2 width vector MVTs. This commit fixes a power-of-2 assumption that my forthcoming change would otherwise break, as shown by test/CodeGen/ARM/vcvt_combine.ll and vdiv_combine.ll. Differential Revision: https://reviews.llvm.org/D58927 Change-Id: I56a282e365d3874ab0621e5bdef98a612f702317 llvm-svn: 356341	2019-03-17 20:48:54 +00:00


51295 Commits