llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	577b9fc543	AMDGPU/GlobalISel: Legalize f64 fadd/fmul llvm-svn: 349014	2018-12-13 08:27:48 +00:00
Matt Arsenault	f38f483bef	AMDGPU/GlobalISel: RegBankSelect some simple operations llvm-svn: 349012	2018-12-13 08:23:51 +00:00
Matt Arsenault	7acf89a21a	AMDGPU/GlobalISel: Test cleanups Remove IR and registers sections llvm-svn: 349011	2018-12-13 08:11:45 +00:00
Stanislav Mekhanoshin	6071e1aa58	[AMDGPU] Simplify negated condition Optimize sequence: %sel = V_CNDMASK_B32_e64 0, 1, %cc %cmp = V_CMP_NE_U32 1, %1 $vcc = S_AND_B64 $exec, %cmp S_CBRANCH_VCC[N]Z => $vcc = S_ANDN2_B64 $exec, %cc S_CBRANCH_VCC[N]Z It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the rebuildSetCC(). Differential Revision: https://reviews.llvm.org/D55402 llvm-svn: 349003	2018-12-13 03:17:40 +00:00
Craig Topper	d1c61861dd	[X86] Don't emit MULX by default with BMI2 MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer. Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it. Differential Revision: https://reviews.llvm.org/D55565 llvm-svn: 348975	2018-12-12 21:21:31 +00:00
Craig Topper	cd7d7ac0fd	[X86] Move stack folding test for MULX to a MIR test. Add a MULX32 case as well A future patch may stop using MULX by default so use MIR to ensure we're always testing MULX. Add the 32-bit case that we couldn't do in the 64-bit mode IR test due to it being promoted to a 64-bit mul. llvm-svn: 348972	2018-12-12 20:50:24 +00:00
Aakanksha Patil	729309cc89	[AMDGPU] Support for "uniform-work-group-size" attribute Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 llvm-svn: 348971	2018-12-12 20:49:17 +00:00
Simon Pilgrim	4a641efdc1	[X86] Added missing constant pool checks. NFCI. So the extra checks in D55600 don't look like a regression. llvm-svn: 348966	2018-12-12 19:56:38 +00:00
Scott Linder	f5b36e56fb	[AMDGPU] Emit MessagePack HSA Metadata for v3 code object Continue to present HSA metadata as YAML in ASM and when output by tools (e.g. llvm-readobj), but encode it in Messagepack in the code object. Differential Revision: https://reviews.llvm.org/D48179 llvm-svn: 348963	2018-12-12 19:39:27 +00:00
Craig Topper	4937adf75f	[X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input. I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that. I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful. Differential Revision: https://reviews.llvm.org/D55414 llvm-svn: 348959	2018-12-12 19:20:21 +00:00
Simon Pilgrim	5864ab2dc0	[X86] Added missing constant pool checks. NFCI. So the extra checks in D55600 don't look like a regression. llvm-svn: 348956	2018-12-12 18:53:12 +00:00
Artem Belevich	f802b9324a	[NVPTX] do not rely on cached subtarget info. If a module has function references, but no functions themselves, we may end up never calling runOnMachineFunction and therefore would never initialize nvptxSubtarget field which would eventually cause a crash. Instead of relying on nvptxSubtarget being initialized by one of the methods, retrieve subtarget info directly. Differential Revision: https://reviews.llvm.org/D55580 llvm-svn: 348952	2018-12-12 18:31:04 +00:00
Sanjay Patel	44eaa492b8	[x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640). Differential Revision: https://reviews.llvm.org/D55494 llvm-svn: 348946	2018-12-12 17:58:27 +00:00
Neil Henning	76504a4c5e	[AMDGPU] Extend the SI Load/Store optimizer to combine more things. I've extended the load/store optimizer to be able to produce dwordx3 loads and stores. This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision: https://reviews.llvm.org/D54042 llvm-svn: 348937	2018-12-12 16:15:21 +00:00
Simon Pilgrim	f6c898e12f	[TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef). Differential Revision: https://reviews.llvm.org/D55558 llvm-svn: 348926	2018-12-12 13:43:07 +00:00
Simon Pilgrim	125d9b0907	Regenerate knownbits test. NFCI. A future SimplifyDemandedBits patch will affect this code and I want to ensure the codegen diff is obvious. llvm-svn: 348925	2018-12-12 13:21:03 +00:00
Piotr Sobczak	3732b4ce25	[AMDGPU] Set metadata access for explicit section Summary: This patch provides a means to set Metadata section kind for a global variable, if its explicit section name is prefixed with ".AMDGPU.metadata." This could be useful to make the global variable go to an ELF section without any section flags set. Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye Reviewed By: dstuttard, kzhuravl Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye Differential Revision: https://reviews.llvm.org/D55267 llvm-svn: 348922	2018-12-12 11:20:04 +00:00
Diana Picus	59720b422a	[ARM GlobalISel] Select load/store for Thumb2 Unfortunately we can't use TableGen for this because it doesn't yet support predicates on the source pattern root. Therefore, add a bit of handwritten code to the instruction selector to handle the most basic cases. Also mark them as legal and extract their legalizer test cases to a new test file. llvm-svn: 348920	2018-12-12 10:32:15 +00:00
Leonard Chan	118e53fd63	[Intrinsic] Signed Fixed Point Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D54719 llvm-svn: 348912	2018-12-12 06:29:14 +00:00
Craig Topper	1fe466689b	[X86] Combine vpmovdw+vpacksswb into vpmovdb. This is similar to the combine we already have for vpmovdw+vpackuswb. llvm-svn: 348910	2018-12-12 05:56:01 +00:00
Craig Topper	5b69b5e20a	[X86] Add a few more fptosi test cases to demonstrate -x86-experimental-vector-widening legalization not combining vpacksswb+vpmovdw. We are able to combine vpackuswb+vpmovdw, but we didn't have packsswb+vpmovdw at the time that combine was added. llvm-svn: 348909	2018-12-12 05:55:59 +00:00
Craig Topper	b51283bfd7	Fix not correct imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare Summary: When doing X86CondBrFolding::analyzeCompare, it will meet the SUB32ri instruction as below to use the global address for its operand, %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags JNE_1 %bb.41, implicit $eflags so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct and change the assert to if make X86CondBrFolding::analyzeCompare return false as not finding the compare for this Patch by Jianping Chen Reviewers: smaslov, LuoYuanke, liutianle, Jianping Reviewed By: Jianping Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54250 llvm-svn: 348853	2018-12-11 15:32:14 +00:00
Clement Courbet	8b6434bbb9	Revert r348843 "[CodeGen] Allow mempcy/memset to generate small overlapping stores." Breaks ARM/memcpy-inline.ll llvm-svn: 348844	2018-12-11 13:38:43 +00:00
Clement Courbet	93b3445770	[CodeGen] Allow mempcy/memset to generate small overlapping stores. Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D55365 llvm-svn: 348843	2018-12-11 13:15:56 +00:00
Simon Pilgrim	f6371f5f23	[TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBits Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction. Part of PR39689. llvm-svn: 348839	2018-12-11 11:08:40 +00:00
Craig Topper	4bd93fa5bb	[X86] Switch the 64-bit mulx schedule test to use inline assembly. I'm not sure we should always prefer MULX over MUL. So making the MULX guaranteed with inline assembly. llvm-svn: 348833	2018-12-11 07:41:06 +00:00
Heejin Ahn	be5e5874f6	[WebAssembly] Add '.eventtype' directive support Summary: This patch supports `.eventtype` directive printing and parsing in the same syntax with `.functype`. Reviewers: aardappel, sbc100 Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55353 llvm-svn: 348818	2018-12-11 01:11:04 +00:00
Krzysztof Parzyszek	9f003f9262	[Hexagon] Couple of fixes in optimize addressing mode - Check if an operand is an immediate before calling getImm. Some operands that take constant values can actually have global symbols or other constant expressions. - When a load-constant instruction can be folded into users, make sure to only delete it when all users have been successfully converted. llvm-svn: 348802	2018-12-10 21:56:04 +00:00
David Green	bd72be0b44	[Targets] Fixup incorrect targets in codemodel tests llvm-svn: 348796	2018-12-10 20:55:34 +00:00
Krzysztof Parzyszek	c1b2d5905a	Revert "[Hexagon] Check if operand is an immediate before getImm" This reverts r348787. The patch wasn't quite correct. llvm-svn: 348792	2018-12-10 19:30:08 +00:00
Amara Emerson	5ec146046c	[GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes. This patch restricts the capability of G_MERGE_VALUES, and uses the new G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places. This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32> and <2 x s64> vectors. Differential Revisions: https://reviews.llvm.org/D53629 llvm-svn: 348788	2018-12-10 18:44:58 +00:00
Krzysztof Parzyszek	c6e9380a56	[Hexagon] Check if operand is an immediate before getImm llvm-svn: 348787	2018-12-10 18:39:47 +00:00
Simon Pilgrim	fc2c9af99c	[TargetLowering] Add UNDEF folding to SimplifyDemandedVectorElts If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node. Zero constant folding will be handled in a future patch - it's a little trickier as we often have bitcasted zero values. Differential Revision: https://reviews.llvm.org/D55511 llvm-svn: 348784	2018-12-10 18:29:46 +00:00
Neil Henning	e448351b77	[AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D. This commit changes which l1 flush instruction is used for AMDPAL and MESA3d workloads to flush the entire l1 cache instead of just the volatile lines. Differential Revision: https://reviews.llvm.org/D55367 llvm-svn: 348771	2018-12-10 16:35:53 +00:00
Sanjay Patel	45ae6b50d8	[x86] add tests for LowerVSETCC with min/max; NFC llvm-svn: 348769	2018-12-10 16:28:30 +00:00
Francis Visoiu Mistrih	0ad1af72cd	[DAGCombiner] Simplify test case from r348759 Thanks Simon for pointing that out. llvm-svn: 348765	2018-12-10 16:04:56 +00:00
Petr Pavlu	84e89ff06f	[GlobalISel] Set stack protector index when translating Intrinsic::stackprotector Record the stack protector index in MachineFrameInfo when translating Intrinsic::stackprotector similarly as is done by SelectionDAG when processing the same intrinsic. Setting this index allows the Prologue/Epilogue Insertion to recognize that the stack protection is enabled. The pass can then make sure that the stack protector comes before local variables on the stack and assigns potentially vulnerable objects first so they are close to the stack protector slot. Differential Revision: https://reviews.llvm.org/D55418 llvm-svn: 348761	2018-12-10 15:15:05 +00:00
Francis Visoiu Mistrih	753efe3584	[DAGCombiner] Use the result value type in visitCONCAT_VECTORS This triggers an assert when combining concat_vectors of a bitcast of merge_values. With asserts disabled, it fails to select: fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8 0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1 0x7ff19d000b50: f64 = Register %1 In function: d Differential Revision: https://reviews.llvm.org/D55507 llvm-svn: 348759	2018-12-10 14:31:34 +00:00
Tim Corringham	4c4d2fe280	[AMDGPU] Add new Mode Register pass A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. llvm-svn: 348754	2018-12-10 12:06:10 +00:00
Jeremy Morse	045c67769d	[DebugInfo] Emit undef DBG_VALUEs when SDNodes are optimised out This is a fix for PR39896, where dbg.value's of SDNodes that have been optimised out do not lead to "DBG_VALUE undef" instructions being created. Such undef instructions are necessary to terminate earlier variable ranges, otherwise variable values leak past the point where they're valid. The "invalidated" flag of SDDbgValue is currently being abused to mean two things: * The corresponding SDNode is now invalid * This SDDbgValue should not be emitted Of which there are several legitimate combinations of meaning: * The SDNode has been invalidated and we should emit "DBG_VALUE undef" * The SDNode has been invalidated but the debug data was salvaged, don't emit anything for this SDDbgValue * This SDDbgValue has been emitted This patch introduces distinct "Emitted" and "Invalidated" fields to the SDDbgValue class, updates users accordingly, and generates "undef" DBG_VALUEs for invalidated records. Awkwardly, there are circumstances where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll which I've preserved. Differential Revision: https://reviews.llvm.org/D55372 llvm-svn: 348751	2018-12-10 11:20:47 +00:00
Nikita Popov	e79477895e	[X86] Fix AvoidStoreForwardingBlocks pass for negative displacements Fixes https://bugs.llvm.org/show_bug.cgi?id=39926. The size of the first copy was computed as std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as I can see, this should just be LdDisp2 - LdDisp1. The case where LdDisp1 > LdDisp2 is already handled in the code above, in which case LdDisp2 is set to LdDisp1 and this subtraction will evaluate to Size1 = 0, which is the correct value to skip an overlapping copy. Differential Revision: https://reviews.llvm.org/D55485 llvm-svn: 348750	2018-12-10 10:16:50 +00:00
Craig Topper	02b614abc8	[X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic. Both intrinsics do the exact same thing so we really only need one. Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency. llvm-svn: 348737	2018-12-10 06:07:50 +00:00
Brian Gesiak	b963c5150d	[AMDGPU] Fix discarded result of addAttribute Summary: `llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods defined on these classes, such as `addAttribute`, return a new immutable object with the attribute added. In https://reviews.llvm.org/D55217 I attempted to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since calling these methods has no side-effects, and so ignoring the result that is returned is almost certainly a programmer error. However, committing the change resulted in new warnings in the AMDGPU target. The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436 attempts to add the readonly and nounwind attributes to simplified library functions, but instead calls the `addAttribute` methods and ignores the result. Modify the simplify libcalls pass to actually add the nounwind and readonly attributes. Also update the simplify libcalls test to assert that these attributes are actually being set. Reviewers: rampitec, vpykhtin, rnk Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55435 llvm-svn: 348732	2018-12-09 21:56:50 +00:00
Craig Topper	2b09d17d93	[X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB. Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother. This should go a long way towards fixing PR24545. llvm-svn: 348727	2018-12-09 18:02:37 +00:00
Sanjay Patel	099beb25e4	[x86] regenerate test checks; NFC llvm-svn: 348723	2018-12-09 14:47:53 +00:00
Sanjay Patel	19bc850220	[x86] don't try to convert add with undef operands to LEA The existing code tries to handle an undef operand while transforming an add to an LEA, but it's incomplete because we will crash on the i16 test with the debug output shown below. It's better to just give up instead. Really, GlobalIsel should have folded these before we could get into trouble. # Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected bb.0 (%ir-block.0): liveins: $edi %1:gr32 = COPY killed $edi %0:gr16 = COPY %1.sub_16bit:gr32 %5:gr64_nosp = IMPLICIT_DEF %5.sub_16bit:gr64_nosp = COPY %0:gr16 %6:gr64_nosp = IMPLICIT_DEF %6.sub_16bit:gr64_nosp = COPY %2:gr16 %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg %3:gr16 = COPY killed %4.sub_16bit:gr32 $ax = COPY killed %3:gr16 RET 0, implicit killed $ax # End machine code for function add_undef_i16. * Bad machine code: Reading virtual register without a def * - function: add_undef_i16 - basic block: %bb.0 (0x7fe6cd83d940) - instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16 - operand 1: %2:gr16 LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D54710 llvm-svn: 348722	2018-12-09 14:40:37 +00:00
Nikita Popov	3192449412	[X86] Add test for PR39926; NFC The test file shows a case where the avoid store forwarding block pass fails to copy a range (-1..1) when the load displacement changes sign. Baseline test for D55485. llvm-svn: 348712	2018-12-09 12:02:56 +00:00
Sanjay Patel	e767bf4468	[DAGCombiner] re-enable truncation of binops This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. llvm-svn: 348706	2018-12-08 16:07:38 +00:00
Sanjay Patel	04461ee821	[x86] add 32-bit RUN for tests and test with opaque constants; NFC The opaque constant test is reduced from a Chrome file that infinite-looped with rL347917. llvm-svn: 348705	2018-12-08 15:34:09 +00:00
Craig Topper	531103f622	[X86] Remove the XFAILed test added in r348620 It seems to be unexpectedly passing on some bots probably because it requires asserts to fail, but doesn't say that. But we already have a patch in review to make it not xfail so I'd rather just focus on getting it passing rather than trying to figure out an unexpected pass. llvm-svn: 348661	2018-12-07 22:16:40 +00:00
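
The arithmetic behind the AvoidStoreForwardingBlocks fix (e79477895e) can be sketched in isolation. The two expressions below are taken from that commit message; the function names, the int64_t type, and the sample displacements are assumptions made for illustration and are not the pass's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Size of the first copy as the commit message says it was computed
// before the fix. (Hypothetical wrapper around the quoted expression.)
int64_t sizeBeforeFix(int64_t LdDisp1, int64_t LdDisp2) {
  return std::abs(std::abs(LdDisp2) - std::abs(LdDisp1));
}

// Size of the first copy as the commit message proposes it should be.
// (Hypothetical wrapper around the quoted expression.)
int64_t sizeAfterFix(int64_t LdDisp1, int64_t LdDisp2) {
  return LdDisp2 - LdDisp1;
}

// Illustrative displacements only; they do not come from the LLVM test.
void checkExamples() {
  // Same sign: both expressions agree.
  assert(sizeBeforeFix(2, 6) == 4 && sizeAfterFix(2, 6) == 4);
  assert(sizeBeforeFix(-6, -2) == 4 && sizeAfterFix(-6, -2) == 4);
  // Opposite signs: the old expression cancels the magnitudes and
  // reports 0, so the bytes between -1 and 1 would be skipped.
  assert(sizeBeforeFix(-1, 1) == 0);
  assert(sizeAfterFix(-1, 1) == 2);
}
```

When both displacements share a sign and LdDisp1 <= LdDisp2, the two expressions agree, so the difference only shows up when the displacement changes sign, which matches the range (-1..1) mentioned in the baseline test commit 3192449412.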

26833 Commits