llvm-project

Commit Graph

Author	SHA1	Message	Date
serge_sans_paille	e67cbac812	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This a recommit of `39f50da2a3` with proper LiveIn declaration, better option handling and more portable testing. Differential Revision: https://reviews.llvm.org/D68720	2020-02-09 10:42:45 +01:00
serge-sans-paille	4546211600	Revert "Support -fstack-clash-protection for x86" This reverts commit `0fd51a4554`. Failures: http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/4354	2020-02-09 10:06:31 +01:00
serge_sans_paille	0fd51a4554	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This a recommit of `39f50da2a3` with proper LiveIn declaration, better option handling and more portable testing. Differential Revision: https://reviews.llvm.org/D68720	2020-02-09 09:35:42 +01:00
Simon Pilgrim	835c81923e	Fix test name typo	2020-02-08 21:28:46 +00:00
Simon Pilgrim	f9c28dc9a5	[X86][SSE] Add test cases from PR44379	2020-02-08 21:03:03 +00:00
Simon Pilgrim	4b4fbae24a	[X86] Test showing inability to combine ROTLI/ROTRI rotations into shuffles One of many things necessary to fix PR44379 (lowering shuffles to rotations)	2020-02-08 21:03:02 +00:00
Simon Pilgrim	4aa7b9cc96	[X86] X86InstComments - add FMA4 comments These typically match the FMA3 equivalents, although the multiply operands sometimes get flipped due to the FMA3 permute variants.	2020-02-08 17:02:00 +00:00
Simon Pilgrim	10417ad2e4	[X86] Standardize BROADCAST enum names (PR31079) Tweak EVEX implementation names so it matches the other variants by adding the 'r' prefix. Oddly some of the subvec broadcast ops already matched.	2020-02-08 16:55:00 +00:00
Simon Pilgrim	c8bc89a933	Regenerate FMA tests	2020-02-08 15:23:40 +00:00
Simon Pilgrim	2398752f37	Add missing encoding comments from fma scalar folded intrinsics tests	2020-02-08 15:23:39 +00:00
Simon Pilgrim	0ed79e9b8f	[X86] Standardize VPSLLDQ/VPSRLDQ enum names (PR31079) Tweak EVEX implementation names so it matches the other variants	2020-02-08 14:54:44 +00:00
serge-sans-paille	658495e6ec	Revert "Support -fstack-clash-protection for x86" This reverts commit `e229017732`. Failures: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/2604 http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/4308	2020-02-08 14:26:22 +01:00
Victor Campos	af2a384581	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `60e0120c91`.	2020-02-08 13:18:45 +00:00
serge_sans_paille	e229017732	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This a recommit of `39f50da2a3` with better option handling and more portable testing Differential Revision: https://reviews.llvm.org/D68720	2020-02-08 13:31:52 +01:00
Simon Pilgrim	ed92ac73af	Add missing encoding comments from fma4 folded intrinsics tests	2020-02-08 11:24:22 +00:00
Simon Pilgrim	7f5b3fa73c	[X86][SSE] Add X86ISD::FRCP handling to isNegatibleForFree Peek through X86ISD::FRCP nodes to see if there is a negatible input.	2020-02-08 10:56:27 +00:00
Simon Pilgrim	63e338be2c	[X86][SSE] Show isNegatibleForFree inability to peek through X86ISD::FRCP We can safely negate the input of RCP but we can't peek through it.	2020-02-08 10:40:49 +00:00
Craig Topper	2af1640f9a	[LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF. Summary: For CTTZ we place a set bit just past where the non-promoted type stopped so the extended bits won't be used for the count. For CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in the original type and we end up counting into the extended bits. So we can just use ANY_EXTEND for both cases. This matches what is done in type legalization for these operations. We make no effort to force the upper bits to zero. Differential Revision: https://reviews.llvm.org/D74111	2020-02-07 22:25:56 -08:00
Sam Clegg	caeb6cfbc2	[WebAssembly] Fix signature of __powitf2 libcall Add tests for @llvm.powi.f64/f128. See: https://llvm.org/docs/LangRef.html#llvm-powi-intrinsic Differential Revision: https://reviews.llvm.org/D74274	2020-02-07 20:30:47 -08:00
Heejin Ahn	5b5cbfe135	[WebAssembly] Add debug info to insts in Emscripten SjLj Summary: This makes sure all newly create instructions in Emscripten SjLj has appropriate debug info attached. Fixes https://github.com/emscripten-core/emscripten/issues/9797. Reviewers: kripken Subscribers: dschuff, aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74269	2020-02-07 19:08:39 -08:00
Amara Emerson	35c63d66aa	[GlobalISel][CallLowering] Look through bitcasts from constant function pointers. Calls to ObjC's objc_msgSend function are done by bitcasting the function global to the required function type signature. This patch looks through this bitcast so that we can do a direct call with bl on arm64 instead of using an indirect blr. Differential Revision: https://reviews.llvm.org/D74241	2020-02-07 15:32:54 -08:00
Craig Topper	bb717d3f46	[X86] Correct the implementation of the avx512 masked fmsubadd autoupgrade code to not leave the negate unconnected. This was causing us to generate fmaddsub instead of fmsubadd if rounding control is not 4.	2020-02-07 15:27:05 -08:00
Craig Topper	598d9dd846	[X86] Add more avx512 masked fmaddsub/fmsubadd autoupgrade tests with rounding control not set to 4. The fmsubadd upgrade doesn't insert the negate properly when the rounding control isn't 4.	2020-02-07 15:26:09 -08:00
Nemanja Ivanovic	26bf877ec5	[PowerPC] Fix spilling of vector registers in PEI of EH aware functions On little endian targets prior to Power9, we spill vector registers using a swapping store (i.e. stdxvd2x saves the vector with the two doublewords in big endian order regardless of endianness). This is generally not a problem since we restore them using the corresponding swapping load (lxvd2x). However if the restore is done by the unwinder, the vector register contains data in the incorrect order. This patch fixes that by using Altivec loads/stores for vector saves and restores in PEI (which keep the order correct) under those specific conditions: - EH aware function - Subtarget requires swaps for VSX memops (Little Endian prior to Power9) Differential revision: https://reviews.llvm.org/D73692	2020-02-07 14:41:52 -06:00
Nico Weber	b03c3d8c62	Revert "Support -fstack-clash-protection for x86" This reverts commit `4a1a0690ad`. Breaks tests on mac and win, see https://reviews.llvm.org/D68720	2020-02-07 14:49:38 -05:00
Changpeng Fang	884acbb9e1	AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D73588	2020-02-07 11:46:23 -08:00
Jessica Paquette	609a489e05	[AArch64][GlobalISel] Reland SLT/SGT TBNZ optimization The issue in the previous commits was that we swap the LHS and RHS while looking for the constant. In SLT/SGT, the constant must be on the RHS, or the optimization is invalid. Move the swapping logic after the check for the SLT/SGT case and update tests. Original commits: `d78cefb160` `a373841407`	2020-02-07 11:15:25 -08:00
Changpeng Fang	6370c7c13e	AMDGPU: Limit the search in finding the instruction pattern for v_swap generation. Summary: Current implementation of matchSwap in SIShrinkInstructions searches the entire use_nodbg_operands set to find the possible pattern to generate v_swap instruction. This approach will lead to a O(N^3) in compile time for SIShrinkInstructions. But in reality, the matching pattern only exists within nearby instructions in the same basic block. This work limits the search to a maximum of 16 instructions, and has a linear compile time comsumption. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D74180	2020-02-07 11:06:33 -08:00
serge_sans_paille	4a1a0690ad	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This a recommit of `39f50da2a3` with correct option flags set. Differential Revision: https://reviews.llvm.org/D68720	2020-02-07 19:54:39 +01:00
Sean Fertile	88073d40c7	[PowerPC] Create a FixedStack object for CR save in linkage area. hasReservedSpillSlot returns a dummy frame index of '0' on PPC64 for the non-volatile condition registers, which leads to the CalleSavedInfo either referencing an unrelated stack object, or an invalid object if there are no stack objects. The latter case causes the mir-printer to crash due to assertions that checks if the frame index referenced by a CalleeSavedInfo is valid. To fix the problem create an immutable FixedStack object at the correct offset in the linkage area of the previous stack frame (ie SP + positive offset). Differential Revision: https://reviews.llvm.org/D73709	2020-02-07 13:33:44 -05:00
Craig Topper	278578744a	[X86] Handle SETB_C32r/SETB_C64r in flag copy lowering the same way we handle SBB Previously we took the restored flag in a GPR, extended it 32 or 64 bits. Then used as an input to a sub from 0. This requires creating a zero extend and creating a 0. This patch changes this to just use an ADD with 255 to restore the carry flag and keep the SETB_C32r/SETB_C64r. Exactly like we handle SBB which is what SETB becomes. Differential Revision: https://reviews.llvm.org/D74152	2020-02-07 10:31:19 -08:00
Matt Arsenault	cbe0c8299e	AMDGPU/GlobalISel: Fix missing test for select of s64 scalar G_CTPOP	2020-02-07 13:15:48 -05:00
Vedant Kumar	0d0ef315cb	[MachineInstr] Add isCandidateForCallSiteEntry predicate Add the isCandidateForCallSiteEntry predicate to MachineInstr to determine whether a DWARF call site entry should be created for an instruction. For now, it's enough to have any call instruction that doesn't belong to a blacklisted set of opcodes. For these opcodes, a call site entry isn't meaningful. Differential Revision: https://reviews.llvm.org/D74159	2020-02-07 10:10:41 -08:00
Simon Pilgrim	c96001035d	[X86] isNegatibleForFree - allow pre-legalized FMA negation As long as the FMA operation is legal (which we can proxy for the FMA3/FMA4 variants as well), we don't have to wait for the LegalOperations stage.	2020-02-07 17:04:17 +00:00
Amara Emerson	28d22c2c9c	[GlobalISel][IRTranslator] Add special case support for ~memory inline asm clobber. This is a one off special case, since actually implementing full inline asm support will be much more involved. This lets us compile a lot more code as a common simple case. Differential Revision: https://reviews.llvm.org/D74201	2020-02-07 08:55:23 -08:00
Jinsong Ji	01edae1271	[AsmPrinter] Print FP constant in hexadecimal form instead Printing floating point number in decimal is inconvenient for humans. Verbose asm output will print out floating point values in comments, it helps. But in lots of cases, users still need additional work to covert the decimal back to hex or binary to check the bit patterns, especially when there are small precision difference. Hexadecimal form is one of the supported form in LLVM IR, and easier for debugging. This patch try to print all FP constant in hex form instead. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D73566	2020-02-07 16:00:55 +00:00
Matt Arsenault	2f885cbe90	AMDGPU/GlobalISel: Fix move s.buffer.load to VALU We were executing this in a waterfall loop as a placeholder, but this should really be converted to a MUBUF load. Also execute in a waterfall loop if the resource isn't an SGPR. This is a case where the DAG handling was wrong because doing the right thing was too hard. Currently, this will mishandle 96-bit loads. There's currently no way to track the original memory size with an MMO, so these loads will be widened andd the resulting memory size will be 128-bits.	2020-02-07 07:19:01 -08:00
Matt Arsenault	8de2dad9e0	GlobalISel: Fix lowering of G_CTLZ/G_CTTZ The type passed to lower was invalid, so I'm not sure how this was even working before. The source and destination type also do not have to match, so make sure to use the right ones.	2020-02-07 06:54:12 -08:00
Sam Parker	2db5547c01	[NFC][ARM] Update test	2020-02-07 14:20:19 +00:00
Sam Parker	441cafb881	[NFC][ARM] Modified test with update script	2020-02-07 13:43:34 +00:00
serge-sans-paille	f6d98429fc	Revert "Support -fstack-clash-protection for x86" This reverts commit `39f50da2a3`. The -fstack-clash-protection is being passed to the linker too, which is not intended. Reverting and fixing that in a later commit.	2020-02-07 11:36:53 +01:00
serge_sans_paille	39f50da2a3	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html Differential Revision: https://reviews.llvm.org/D68720	2020-02-07 10:56:15 +01:00
Matt Arsenault	6a570dc548	AMDGPU/GlobalISel: Fix non-pow-2 add/sub/mul for 16-bit insts These wouldn't legalize between 16-bits and 32-bits on targets with 16-bit instructions.	2020-02-06 21:43:54 -05:00
Stanislav Mekhanoshin	2863c26968	Revert "AMDGPU: Limit the search in finding the instruction pattern for v_swap generation." This reverts commit `9827806481`.	2020-02-06 17:38:55 -08:00
Changpeng Fang	9827806481	AMDGPU: Limit the search in finding the instruction pattern for v_swap generation. Summary: Current implementation of matchSwap in SIShrinkInstructions searches the entire use_nodbg_operands set to find the possible pattern to generate v_swap instruction. This approach will lead to a O(N^3) in compile time for SIShrinkInstructions. But in reality, the matching pattern only exists within nearby instructions in the same basic block. This work limits the search to a maximum of 16 instructions, and has a linear compile time comsumption. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D74180	2020-02-06 16:40:21 -08:00
Jessica Paquette	3e5d837cda	Revert "[AArch64][GlobalISel] Emit TBNZ with G_BRCOND where the condition is SLT" This reverts commit `a373841407`. It looks like this broke set_shadow_test.c, so I'm reverting until I can fix it. I also reverted the SGT change because it's probably also broken.	2020-02-06 16:30:13 -08:00
Jessica Paquette	df51b685ef	Revert "[AArch64][GlobalISel] Emit TBZ for SGT cond branches against -1" This reverts commit `d78cefb160`. One of this and the SLT change broke set_shadow_test.c, so I'm reverting until I can fix it.	2020-02-06 16:29:00 -08:00
Amara Emerson	ac8a12c874	[GlobalISel] Use G_ZEXTLOAD instead of an anyextending load for non-pow-2 legalization. Fixes PR43288	2020-02-06 14:36:36 -08:00
Craig Topper	f2d7aad1ce	[X86] Add the rest of the tests that were supposed to go with `90c31b0f42` I forgot to git add them when applying the patch from phab.	2020-02-06 13:34:01 -08:00
Craig Topper	ec9a94af4d	[X86] Use MVT::i8 instead of MVT::i64 for shift amount in BuildSDIVPow2 X86 uses i8 for shift amounts. This code can fail on a 32-bit target if it runs after type legalization. This code was copied from AArch64 and modified for X86, but the shift amount wasn't changed to the correct type for X86. Fixes PR44812	2020-02-06 13:32:13 -08:00
Jessica Paquette	d78cefb160	[AArch64][GlobalISel] Emit TBZ for SGT cond branches against -1 When we have a G_BRCOND fed by a sgt compare against -1, we can just emit a TBZ. This is similar to the code in `AArch64TargetLowering::LowerBR_CC`. Also while we're here, properly scope the commutative constant check in `selectCompareBranch`, since it sometimes would call `getConstantVRegValWithLookThrough` twice. Differential Revision: https://reviews.llvm.org/D74149	2020-02-06 12:04:03 -08:00
Craig Topper	600f2e1c4d	[X86] Remove SETB_C8r/SETB_C16r pseudo instructions. Use SETB_C32r and EXTRACT_SUBREG instead. Only 32 and 64 bit SBB are dependency breaking instructons on some CPUs. The 8 and 16 bit forms have to preserve upper bits of the GPR. This patch removes the smaller forms and selects the wider form instead. I had to do this with custom code as the tblgen generated code glued the eflags copytoreg to the extract_subreg instead of to the SETB pseudo. Longer term I think we can remove X86ISD::SETCC_CARRY and use (X86ISD::SBB zero, zero). We'll want to keep the pseudo and select (X86ISD::SBB zero, zero) to either a MOV32r0+SBB for targets where there is no dependency break and SETB_C32/SETB_C64 for targets that have a dependency break. May want some way to avoid the MOV32r0 if the instruction that produced the carry flag happened to def a register that we can use for the dependency. I think the flag copy lowering should be using NEG instead of SUB to handle SETB. That would avoid the MOV32r0 there. Or maybe it should use a ADC with -1 to recreate the carry flag and keep the SETB? That would avoid a MOVZX on the input of the SUB. Differential Revision: https://reviews.llvm.org/D74024	2020-02-06 10:22:24 -08:00
Chris Bowler	b373ec8ce7	[AIX] Implement caller arguments passed in stack memory. This patch implements the caller side of placing function call arguments in stack memory. This removes the current limitation where LLVM on AIX will report fatal error when arguments can't be contained in registers. There is a particular oddity that a float argument that passes in a register and also in stack memory requires that the caller initialize both. From what AIX "ABI" documentation I have it's not clear that this needs to be done, however, it is necessary for compatibility with the AIX XL compiler so I think it's best to implement it the same way. Note a later patch will follow to address the callee side. Differential Revision: https://reviews.llvm.org/D73209	2020-02-06 12:07:34 -05:00
Mikhail Maltsev	2694cc3dca	[ARM][MVE] Add fixed point vector conversion intrinsics Summary: This patch implements the following Arm ACLE MVE intrinsics: * vcvtq_n_* * vcvtq_m_n_* * vcvtq_x_n_* and two corresponding LLVM IR intrinsics: * int_arm_mve_vcvt_fix (vcvtq_n_) int_arm_mve_vcvt_fix_predicated (vcvtq_m_n_, vcvtq_x_n_) Reviewers: simon_tatham, ostannard, MarkMurrayARM, dmgreen Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74134	2020-02-06 16:49:45 +00:00
Jeremy Morse	6531a78ac4	Revert "[DebugInfo] Remove some users of DBG_VALUEs IsIndirect field" This reverts commit `ed29dbaafa`. I'm backing out D68945, which as the discussion for D73526 shows, doesn't seem to handle the -O0 path through the codegen backend correctly. I'll reland the patch when a fix is worked out, apologies for all the churn. The two parent commits are part of this revert too. Conflicts: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/DebugInfo/X86/dbg-addr-dse.ll SelectionDAGBuilder conflict is due to a nearby change in `e39e2b4a79` that's technically unrelated. dbg-addr-dse.ll conflicted because `41206b61e3` (legitimately) changes the order of two lines. There are further modifications to dbg-value-func-arg.ll: it landed after the patch being reverted, and I've converted indirection to be represented by the isIndirect field rather than DW_OP_deref.	2020-02-06 14:41:40 +00:00
Jeremy Morse	ece761427f	Revert "[DebugInfo][DAG] Distinguish different kinds of location indirection" This reverts commit `3137fe4d23`. I'm backing out D68945, which this patch is a follow up for. It'll be re-landed when D68945 is fixed. The changes to dbg-value-func-arg.ll occur because our handling of certain kinds of location now mixes up indirection that happens at different points in a DIExpression. While this is a regression, it's a return to the prior behaviour while a better patch is sought.	2020-02-06 14:41:40 +00:00
Sjoerd Meijer	20a1d03d77	[ARM] peephole-bitcast test change. NFC. This test case was XFAIL'ed because the peepholer was missing an optimisation. But the peepholer is now able to handle this case, so enable this test. I will close the corresponding and very old PR11364.	2020-02-06 14:36:48 +00:00
Sam Parker	0a8cae10fe	[ReachingDefs] Make isSafeToMove more strict. Test that we're not moving the instruction through instructions with side-effects. Differential Revision: https://reviews.llvm.org/D74058	2020-02-06 14:06:08 +00:00
Jessica Paquette	a373841407	[AArch64][GlobalISel] Emit TBNZ with G_BRCOND where the condition is SLT When we have a G_ICMP which checks SLT, and the comparison is against 0, we can emit a TBNZ instead of a CBZ. This lets us fold in things into the branch, which can provide some code size savings. This is similar to the case in `AArch64TargetLowering::LowerBR_CC`. https://reviews.llvm.org/D74090	2020-02-05 15:23:54 -08:00
Jessica Paquette	7212f65784	[AArch64][GlobalISel] Fold G_LSHR into test bit calculation Add support for walking through G_LSHR in `getTestBitReg`. Equivalent to the code in `getTestBitOperand` in AArch64ISelLowering. ``` (tbz (lshr x, c), b) -> (tbz x, b+c) when b + c is < # bits in x ``` Differential Revision: https://reviews.llvm.org/D74077	2020-02-05 15:14:12 -08:00
Matt Arsenault	9087ef0765	GlobalISel: Allow CSE of G_IMPLICIT_DEF The legalizer produces a lot of these, and they make reading legalized MIR annoying. For some reason, this does seem to sometimes introduce copies of implicit def, which is dumb.	2020-02-05 17:47:21 -05:00
Jonas Paulsson	4a3760d2ba	[SystemZ] Improve handling of inline asm constraints. The "{=v0}" constraint did not result in the expected error message in the abscence of the vector facility, because 'v0' matches as a string into the AnyRegBitRegClass in common code. This patch adds checks for vector support in case of "{v" and soft-float in case of "{f" to remedy this. Review: Ulrich Weigand.	2020-02-05 17:04:16 -05:00
Matt Arsenault	baafe82b07	AMDGPU/GlobalISel: Remove bitcast legality hack	2020-02-05 16:24:24 -05:00
Matt Arsenault	364326ce66	AMDGPU/GlobalISel: Add mem operand to s.buffer.load intrinsic Really the intrinsic definition is wrong, but work around this here. The DAG lowering introduces an MMO. We have to introduce a new operation to avoid the verifier complaining about the missing mayLoad.	2020-02-05 15:04:42 -05:00
Matt Arsenault	5aa6e246a1	AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI Use cmp ord instead of cmp_class compared to the DAG version for the nan check, but mostly try to match the existsing pattern. I think the sign doesn't matter for fract, so we could do a little better with the source modifier matching. I think this is also still broken as in D22898, but I'm leaving it as-is for now while I don't have an SI system to test on.	2020-02-05 14:32:01 -05:00
Shu-Chun Weng	ce9633633c	[GlobalISel][AArch64] Fix contract cross-bank copies with SIMD instructions contractCrossBankCopyIntoStore() finds the instruction defines the source register and uses its output to replace the register. There are, however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES. Current implementation hardcodes to operand 0 and has no way of knowing which output should be used. This change adds another function to directly return the register that is the source of the register and use that for folding. This fixes https://bugs.llvm.org/show_bug.cgi?id=44783 Differential Revision: https://reviews.llvm.org/D74005	2020-02-05 10:38:35 -08:00
David Green	f64b3466b6	[ARM] Add extra use test for MVE VPT blocks. NFC	2020-02-05 18:32:18 +00:00
Jessica Paquette	292f725711	[AArch64][GlobalISel] Fold G_ASHR into TB(N)Z bit calculation This implements walking over G_ASHR in the same way as `getTestBitOperand` in AArch64ISelLowering. ``` (tbz (ashr x, c), b) -> (tbz x, b+c) or (tbz x, msb) if b+c is > # bits in x ``` Differential Revision: https://reviews.llvm.org/D73933	2020-02-05 10:04:48 -08:00
Matt Arsenault	7bffa97285	AMDGPU/GlobalISel: Prefer merge/unmerge ops to legalize TFE These have a better chance of combining with other operations and are currently much better supported than G_EXTRACT.	2020-02-05 12:56:10 -05:00
Jessica Paquette	a82a28ae12	[AArch64][GlobalISel] Fix one use check in getTestBitReg (1) The check needs to be on the 0th operand of whatever we're folding (2) Checks for validity should happen before we change the bit Fixes a bug which caused MultiSource/Applications/JM/lencod to fail at -O3. Differential Revision: https://reviews.llvm.org/D74002	2020-02-05 09:54:52 -08:00
Matt Arsenault	e65e6d052e	AMDGPU/GlobalISel: Legalize TFE image result loads Rewrite the result register pair into the expected sinigle register format in the legalizer. I'm also operating under the assumption that TFE doesn't apply to stores or atomics, but don't know if this is true or not.	2020-02-05 12:40:20 -05:00
Matt Arsenault	69cc9f3046	AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load The 96-bit results need to be widened. I find the interaction between LegalizerHelper and MIRBuilder somewhat awkward. The custom legalization is called by the LegalizerHelper, but then does not have access to the helper. You have to construct a new helper, which then does not own the MachineIRBuilder, but does modify it. Maybe custom legalization should be passed the helper?	2020-02-05 12:01:34 -05:00
Matt Arsenault	dfa9420f09	AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR If we have s_pack_* instructions, legalize this to G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the s_pack_* instructions really behave. If we don't have s_pack_ instructions, expand this by creating a merge to s32 and bitcasting. This expands to the expected bit operations. I think this eventually should go in a new bitcast legalize action type in LegalizerHelper. We already directly emit the shift operations in RegBankSelect for the vector case. This could possibly be cleaned up, but I also may want to defer doing this expansion to selection anyway. I'll see about that when I try to actually match VOP3P instructions. This breaks the selection of the build_vector since tablegen doesn't know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.	2020-02-05 11:52:18 -05:00
Sjoerd Meijer	01022af5d5	[ARM][MVE] LowOverheadLoops: DCE on the iteration count setup expression Once we have created a tail-predicated hardware-loop, and thus know the number of elements that are processed, we want to clean-up the iteration count expression of that loop. In D73682, we bailed the analysis on conditionally executed instructions. This adds support for IT-blocks, so that we can handle these cases again. The restriction is that we only support IT blocks containing 1 statement, but that seems to cover most cases and forms of the iteration count expression. Differential Revision: https://reviews.llvm.org/D73947	2020-02-05 15:15:46 +00:00
Sam Parker	564275289d	[ARM][LowOverheadLoops] Fix loop count chain Checking that the use-def chain that performs the loop count isSafeToRemove is not sufficient because it means that we can remove register copies that we need to restore lr to its correct value. This change now prevents the transform from kicking in for the 'remove-elem-moves' test which needs to addressed later on. Differential Revision: https://reviews.llvm.org/D74037	2020-02-05 13:21:51 +00:00
Sam Parker	4c7f819204	[ARM][LowOverheadLoops] Ensure memory predication While validating each MVE instruction, check that all instructions that touch memory are somehow predicated upon the VCTP. Differential Revision: https://reviews.llvm.org/D73616	2020-02-05 13:19:08 +00:00
Sebastian Neubauer	163e33b290	[AMDGPU] Fix lowering a16 image intrinsics scalar_to_vector takes only one argument, not two. The a16 tests now also check the packing of coordinates into registers Differential Revision: https://reviews.llvm.org/D73482	2020-02-05 10:54:34 +01:00
Sebastian Neubauer	3bc7ffdaab	[AMDGPU] Use v3f32 type in image instructions This should lower the amount of used registers for gfx9. I updated some of the changed tests with the update script because changing them by hand is tedious. Differential Revision: https://reviews.llvm.org/D73884	2020-02-05 10:35:41 +01:00
Craig Topper	a3d489e87e	[X86] Add a DAG combine for (i32 (sext (i8 (x86isd::setcc_carry)))) -> (i32 (x86isd::setcc_carry)) and remove isel patterns. Same for any_extend though we don't have coverage for that. The test changes are because isel didn't check one use of the setcc_carry. So in isel we would end up with two different sized setcc_carry instructions. And since it clobbers the flags we would need to recreate the flags for the second instruction. This code handles additional uses by truncating the new wide setcc_carry back to the original size for those uses.	2020-02-04 22:40:36 -08:00
Jan Vesely	e6686adf8a	AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x) The old version might be faster on EG (RECIP_IEEE is Trans only), but it'd need extra corner case checks. This gives correct corner case behaviour and saves a register. Fixes OCL CTS sqrt test (1-thread, scalar) on Turks. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D74017	2020-02-05 00:24:07 -05:00
Thomas Lively	649aba93a2	Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering" Summary: This reverts commit `3ef169e586`. The purpose of this commit was to allow stack machines to perform instruction selection for instructions with variadic defs. However, MachineInstrs fundamentally cannot support variadic defs right now, so this change does not turn out to be useful. Depends on D73927. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73928	2020-02-04 20:04:59 -08:00
Yonghong Song	d96c1bbaa0	[BPF] disable ReduceLoadWidth during SelectionDag phase The compiler may transform the following code ctx = ctx + reloc_offset ... ((u32 )ctx) & 0x8000 ... to ctx = ctx + reloc_offset ... ((u8 )(ctx + 1)) & 0x80 ... where reloc_offset will be replaced with a constant during AsmPrinter phase. The above transformed code will be rejected the kernel verifier as it does not allow (type )((ctx + non_zero_offset1) + non_zero_offset2) style access pattern. It is hard at SelectionDag phase to identify whether a load is related to context or not. Sometime, interprocedure analysis may be needed. So let us simply prevent such optimization from happening. Differential Revision: https://reviews.llvm.org/D73997	2020-02-04 18:37:43 -08:00
Thomas Lively	27748363da	[WebAssembly] Enable recently implemented SIMD operations Summary: Moves a batch of instructions from unimplemented-simd128 to simd128 because they have recently become available in V8. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73926	2020-02-04 18:36:32 -08:00
Craig Topper	016f42e3dc	[X86] Add custom lowering for lrint/llrint to either cvtss2si/cvtsd2si or fist. lrint/llrint are defined as rounding using the current rounding mode. Numbers that can't be converted raise FE_INVALID and an implementation defined value is returned. They may also write to errno. I believe this means we can use cvtss2si/cvtsd2si or fist to convert as long as -fno-math-errno is passed on the command line. Clang will leave them as libcalls if errno is enabled so they won't become ISD::LRINT/LLRINT in SelectionDAG. For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si but we can use fist since it can write to a 64-bit memory location. Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq targets? gcc also does this optimization. I think we might be able to do this with STRICT_LRINT/LLRINT as well, but I've left that for future work. Differential Revision: https://reviews.llvm.org/D73859	2020-02-04 16:15:40 -08:00
Reid Kleckner	2d89e0a098	[SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr The CATCHPAD node mostly existed to be selected into the EH_RESTORE instruction, which sets the frame back up when 32-bit Windows exceptions return to the parent function. However, creating this MachineInstr early increases the risk that other passes will come along and insert instructions that use the stack before ESP and EBP are restored. That happened in PR44697. Instead of representing these in the instruction stream early, delay it until PEI. Mark the blocks where this needs to happen as EHPads, but not funclet entry blocks. Passes after PEI have to be careful not to hoist instructions that can use stack across frame setup instructions, so this should be relatively reliable. Fixes PR44697 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D73752	2020-02-04 15:13:12 -08:00
Matt Arsenault	12fe9b26ec	AMDGPU/GlobalISel: Select G_SEXT_INREG	2020-02-04 13:23:53 -08:00
Matt Arsenault	0693e827ed	AMDGPU/GlobalISel: Do a better job splitting 64-bit G_SEXT_INREG We don't need to expand to full shifts for the > 32-bit case. This just switches to a sext_inreg of the high half.	2020-02-04 13:23:53 -08:00
Matt Arsenault	05f2a04ba7	AMDGPU/GlobalISel: Legalize G_SEXT_INREG Split the VALU 64-bit case in RegBankSelect.	2020-02-04 13:23:53 -08:00
Austin Kerbow	0f116fd9d8	[AMDGPU] Fix infinite loop with fma combines https://reviews.llvm.org/D72312 introduced an infinite loop which involves DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine. fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ... This only breaks with types where 'isFNegFree' returns flase, e.g. v4f32. Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math', and no source mods allowed on one of the users of the Op. This fix makes changes to indicate that it is not free to negate a fma if it has users with source mods. Differential Revision: https://reviews.llvm.org/D73939	2020-02-04 13:11:09 -08:00
Matt Arsenault	9b0ce8edfa	AMDGPU/GlobalISel: Remove extension legality hacks The legalization has improved since this was added, and the tests relying on this no longer need it.	2020-02-04 12:50:47 -08:00
Matt Arsenault	5d2749938c	AMDGPU/GlobalISel: Custom lower G_FEXP	2020-02-04 11:50:55 -08:00
Matt Arsenault	b461436d01	AMDGPU/GlobalISel: Legalize s16 G_FEXP2	2020-02-04 11:50:55 -08:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Fangrui Song	8ff86fcf4c	[X86] -fpatchable-function-entry=N,0: place patch label after ENDBR{32,64} Similar to D73680 (AArch64 BTI). A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR. Also, add 32-bit tests and test that .cfi_startproc is at the function entry. The line information has a general implementation and is tested by AArch64/patchable-function-entry-empty.mir Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D73760	2020-02-04 09:42:36 -08:00
Yonghong Song	6d07802d63	[BPF] handle typedef of struct/union for CO-RE relocations Linux commit `1cf5b23988 (diff-289313b9fec99c6f0acfea19d9cfd949)` uses "#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)" to apply CO-RE relocations to all records including the following pattern: #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record) typedef struct { int a; } __t; #pragma clang attribute pop int test(__t *arg) { return arg->a; } The current approach to use struct/union type in the relocation record will result in an anonymous struct, which make later type matching difficult in bpf loader. In fact, current BPF backend will fail the above program with assertion: clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:796: ... Assertion `TypeName.size()' failed. clang will change to use the type of the base of the member access which will preserve the typedef modifier for the preserve_{struct,union}_access_index intrinsics in the above example. Here we adjust BPF backend to accept that the debuginfo type metadata may be 'typedef' and handle them properly. Differential Revision: https://reviews.llvm.org/D73902	2020-02-04 08:53:03 -08:00
Kazushi (Jam) Marukawa	3ed12232b0	[VE] half fptrunc+store&load+fpext Summary: fp16 (half) load+fpext and fptrunc+store isel legalization and tests. Also, ExternalSymbolSDNode operand printing (tested by fp16 lowering). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73899	2020-02-04 17:16:09 +01:00
Jonas Paulsson	e943329ba0	[SystemZ] Add 'REQUIRES:' or '-mtriple' to some newly added tests. Needed to fix buildbots.	2020-02-04 10:52:10 -05:00
Jonas Paulsson	563e84790f	[SystemZ] Support -msoft-float This is needed when building the Linux kernel. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D72189	2020-02-04 10:32:45 -05:00
Sanjay Patel	0cf0be993c	[InstCombine] fix operands of shouldChangeType() for casted phi transform This is a bug noted in the recent D72733 and seen in the similar transform just above the changed source code. I added tests with illegal types and zexts to show the bug - we could transform legal phi ops to illegal, etc. I did not add tests with trunc because we won't see any diffs on those patterns. That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to do those transforms independently of datalayout. It can also create more casts than are present in existing code. There are some existing regression tests that do not include a datalayout that would be altered by this fix. I assumed that the lack of a datalayout in those regression files is an oversight, so I added the minimal layout (make i32 legal) necessary to preserve behavior on those tests. Differential Revision: https://reviews.llvm.org/D73907	2020-02-04 07:45:48 -05:00
David Green	362d00e051	[ARM][VecReduce] Force expand vector_reduce_fmin Under MVE, we do not have any lowering for fminimum, which a vector_reduce_fmin without NoNan will be expanded into. As with the other recent patches, force this to expand in the pre-isel pass. Note that Neon lowering would be OK because the scalar fminimum uses the vector VMIN instruction, but is probably better to just rely on the scalar operations, which is what is done here. Also fixes what appears to be the reversal of INF vs -INF in the vector_reduce_fmin widening code.	2020-02-04 09:36:59 +00:00
Jessica Paquette	9effe38b22	[AArch64][GlobalISel] Fold G_XOR into TB(N)Z bit calculation This ports the existing case for G_XOR from `getTestBitOperand` in AArch64ISelLowering into GlobalISel. The idea is to flip between TBZ and TBNZ while walking through G_XORs. Let's say we have ``` tbz (xor x, c), b ``` Let's say the `b`-th bit in `c` is 1. Then - If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 0. - If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 1. So, then ``` tbz (xor x, c), b == tbnz x, b ``` Let's say the `b`-th bit in `c` is 0. Then - If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 1. - If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 0. So, then ``` tbz (xor x, c), b == tbz x, b ``` Differential Revision: https://reviews.llvm.org/D73929	2020-02-03 15:22:24 -08:00
Jay Foad	2252cac694	[ANDGPU] getMemOperandsWithOffset: support BUF non-stack-access instructions with resource but no vaddr Summary: This enables clustering for many more BUF instructions. Reviewers: rampitec, arsenm, nhaehnle Subscribers: jvesely, wdng, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73868	2020-02-03 22:49:30 +00:00
Jessica Paquette	37910fd0e1	[AArch64][GlobalISel] Fold G_SHL into TB(N)Z bit calculation This implements the following optimization: ``` (tbz (shl x, c), b) -> (tbz x, b-c) ``` Which appears in `getTestBitOperand` in AArch64ISelLowering.cpp. If we test bit `b` of `shl x, c`, we can fold away the `shl` by looking `c` bits to the right of `b` in `x` when this fits in the type. So, we can just test the `b-c`th bit. Differential Revision: https://reviews.llvm.org/D73924	2020-02-03 14:27:08 -08:00
Matt Arsenault	7d3aace3f5	AMDGPU: Add flag to control mem intrinsic expansion GlobalISel doesn't implement the expansion for these yet, so add a flag to force expanding these so it's possible to avoid these for a while.	2020-02-03 14:26:01 -08:00
David Green	d05e4ff4af	[ARM] MVE vector reduction fadd and fmul tests. NFC	2020-02-03 22:03:56 +00:00
Matt Arsenault	cb7b661d3d	AMDGPU: Analyze divergence of inline asm	2020-02-03 12:42:16 -08:00
Matt Arsenault	2758ae41ae	AMDGPU/GlobalISel: Allow selecting s128 load/stores	2020-02-03 12:28:08 -08:00
Matt Arsenault	726446a009	AMDGPU: Fix splitting wide f32 s.buffer.load intrinsics This would witch f32 to i32, and produce an invald concat_vectors from i32 pieces to an f32 vector.	2020-02-03 12:28:08 -08:00
David Tenty	77e71c5217	[AIX] Don't use a zero fill with a second parameter Summary: The AIX assembler .space directive can't take a second non-zero argument to fill with. But LLVM emitFill currently assumes it can. We add a flag to the AsmInfo to check if non-zero fill is supported, and if we can't zerofill non-zero values we just splat the .byte directives. Reviewers: stevewan, sfertile, DiggerLin, jasonliu, Xiangling_L Reviewed By: jasonliu Subscribers: Xiangling_L, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73554	2020-02-03 15:16:08 -05:00
Jessica Paquette	2bd46444d7	[AArch64][GlobalISel] Walk through G_AND in TB(N)Z bit calculation Given ``` tb(n)z (and x, m), b ``` Where the `b`-th bit of `m` is 1, ``` tb(n)z (and x, m), b == tb(n)z x, b ``` So, we can walk past a `G_AND` in this case. Also add test/CodeGen/AArch64/GlobalISel/opt-fold-and-tbz-tbnz.mir to test this. Differential Revision: https://reviews.llvm.org/D73790	2020-02-03 11:53:47 -08:00
Amara Emerson	b911b99052	[AArch64][GlobalISel] Don't reconvert to p0 in convertPtrAddToAdd(). convertPtrAddToAdd improved overall code size and quality by a significant amount, but on -O0 we generate some cross-class copies due to the fact that we emitted G_PTRTOINT and G_INTTOPTR around the G_ADD. Unfortunately at -O0 we don't run any register coalescing, so these cross class copies end up escaping as moves, and we ended up regressing 3 benchmarks on CTMark (though still a winner overall). This patch changes the lowering to instead directly emit the G_ADD into the destination register, and then force changes the dest LLT to s64 from p0. This should be ok, as all uses of the register should now be selected and therefore the LLT doesn't matter for the users. It does however matter for the importer patterns, which will fail to select a G_ADD if there's a p0 LLT. I'm not able to get rid of the G_PTRTOINT on the source yet however. We can't use the same trick of breaking the type system since that could break the selection of the defining instruction. Thus with -O0 we still end up with a cross class copy on source. Code size improvements on -O0: Program baseline new diff test-suite :: CTMark/Bullet/bullet.test 965520 949164 -1.7% test-suite...TMark/7zip/7zip-benchmark.test 1069456 1052600 -1.6% test-suite...ark/tramp3d-v4/tramp3d-v4.test 1213692 1199804 -1.1% test-suite...:: CTMark/sqlite3/sqlite3.test 421680 419736 -0.5% test-suite...-typeset/consumer-typeset.test 837076 833380 -0.4% test-suite :: CTMark/lencod/lencod.test 799712 796976 -0.3% test-suite...:: CTMark/ClamAV/clamscan.test 688264 686132 -0.3% test-suite :: CTMark/kimwitu++/kc.test 1002344 999648 -0.3% test-suite...Mark/mafft/pairlocalalign.test 422296 421768 -0.1% test-suite :: CTMark/SPASS/SPASS.test 656792 656532 -0.0% Geomean difference -0.6% Differential Revision: https://reviews.llvm.org/D73910	2020-02-03 11:50:22 -08:00
Matt Arsenault	cd7650c186	GlobalISel: Implement fewerElementsVector for G_SEXT_INREG Start using a new strategy with a combination of merge and unmerges. This allows scalarizing before lowering, which in cases like <2 x s128> avoids producing giant illegal shifts.	2020-02-03 11:47:33 -08:00
Nikita Popov	1cc4f8d172	[ARM] Expand vector reduction intrinsics on soft float Followup to D73135. If the target doesn't have hard float (default for ARM), then we assert when trying to soften the result of vector reduction intrinsics. This patch marks these for expansion as well. (A bit odd to use vectors on a target without hard float ... but that's where you end up if you expose target-independent vector types.) Differential Revision: https://reviews.llvm.org/D73854	2020-02-03 18:49:12 +01:00
Jay Foad	05297b7cbe	[AMDGPU] getMemOperandsWithOffset: add resource operand for BUF instructions Summary: This prevents unwanted clustering of BUF instructions with the same vaddr but different resource descriptors. Reviewers: rampitec, arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73867	2020-02-03 17:06:09 +00:00
Simon Pilgrim	61621f826a	[TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.	2020-02-03 16:50:04 +00:00
Simon Pilgrim	8c0e715eb2	[X86] BEXTR SimplifyDemandedBitsForTargetNode - length == 0 -> result = 0	2020-02-03 16:50:03 +00:00
Kazushi (Jam) Marukawa	be9fe6aa8b	[VE] (fp)trunc+store & load+(fp)ext isel Summary: load+sext/zext/fpext and (fp)trunc+store isel legalization and tests Reviewers: arsenm, craig.topper, rengolin, k-ishizaka Reviewed By: arsenm Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits Tags: #ve, #llvm Differential Revision: https://reviews.llvm.org/D73774	2020-02-03 16:55:44 +01:00
Simon Pilgrim	8ead5df0b1	[X86] computeKnownBitsForTargetNode - add BEXTR support (PR39153) Add a KnownBits::extractBits helper	2020-02-03 15:43:59 +00:00
Kazushi (Jam) Marukawa	07c9f7574d	[VE] vaarg functions callers and callees Summary: Isel patterns and tests for vaarg functions as callers and callees. Reviewers: arsenm, rengolin, k-ishizaka Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits Tags: #ve, #llvm Differential Revision: https://reviews.llvm.org/D73710	2020-02-03 16:26:44 +01:00
Simon Pilgrim	241c9a50b4	[X86] Add some initial BEXTR combine tests	2020-02-03 15:16:40 +00:00
Matt Arsenault	00b22df71d	AMDGPU: Fix extra type mangling on llvm.amdgcn.if.break These have to be the same mask type.	2020-02-03 07:02:05 -08:00
John Brawn	68cf574857	[FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC These get lowered to function calls, like the non-strict versions. Differential Revision: https://reviews.llvm.org/D73784	2020-02-03 14:39:16 +00:00
Matt Arsenault	95a9b828f3	AMDGPU/GlobalISel: Fix mem size in test This wasn't intended to tests an extload.	2020-02-03 05:41:14 -08:00
John Brawn	b37d59353f	[FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194	2020-02-03 12:59:12 +00:00
Simon Tatham	961530fdc9	[ARM,MVE] Fix vreinterpretq in big-endian mode. Summary: In big-endian MVE, the simple vector load/store instructions (i.e. both contiguous and non-widening) don't all store the bytes of a register to memory in the same order: it matters whether you did a VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register formats of different vector types relate to each other in a different way from the in-memory formats. So, if you want to 'bitcast' or 'reinterpret' one vector type as another, you have to carefully specify which you mean: did you want to reinterpret the //register// format of one type as that of the other, or the //memory// format? The ACLE `vreinterpretq` intrinsics are specified to reinterpret the register format. But I had implemented them as LLVM IR bitcast, which is specified for all types as a reinterpretation of the memory format. So a `vreinterpretq` intrinsic, applied to values already in registers, would code-generate incorrectly if compiled big-endian: instead of emitting no code, it would emit a `vrev`. To fix this, I've introduced a new IR intrinsic to perform a register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's implemented by a trivial isel pattern that expects the input in an MQPR register, and just returns it unchanged. In the clang codegen, I only emit this new intrinsic where it's actually needed: I prefer a bitcast wherever it will have the right effect, because LLVM understands bitcasts better. So we still generate bitcasts in little-endian mode, and even in big-endian when you're casting between two vector types with the same lane size. For testing, I've moved all the codegen tests of vreinterpretq out into their own file, so that they can have a different set of RUN lines to check both big- and little-endian. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73786	2020-02-03 11:20:06 +00:00
Simon Tatham	f8d4afc49a	[ARM,MVE] Add intrinsics for v[id]dupq and v[id]wdupq. Summary: These instructions generate a vector of consecutive elements starting from a given base value and incrementing by 1, 2, 4 or 8. The `wdup` versions also wrap the values back to zero when they reach a given limit value. The instruction updates the scalar base register so that another use of the same instruction will continue the sequence from where the previous one left off. At the IR level, I've represented these instructions as a family of target-specific intrinsics with two return values (the constructed vector and the updated base). The user-facing ACLE API provides a set of intrinsics that throw away the written-back base and another set that receive it as a pointer so they can update it, plus the usual predicated versions. Because the intrinsics return two values (as do the underlying instructions), the isel has to be done in C++. This is the first family of MVE intrinsics that use the `imm_1248` immediate type in the clang Tablegen framework, so naturally, I found I'd given it the wrong C integer type. Also added some tests of the check that the immediate has a legal value, because this is the first time those particular checks have been exercised. Finally, I also had to fix a bug in MveEmitter which failed an assertion when I nested two `seq` nodes (the inner one used to extract the two values from the pair returned by the IR intrinsic, and the outer one put on by the predication multiclass). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73357	2020-02-03 11:20:06 +00:00
Simon Tatham	cf7e98e6f7	[ARM,MVE] Add intrinsics for vdupq. Summary: The unpredicated case of this is trivial: the clang codegen just makes a vector splat of the input, and LLVM isel is already prepared to handle that. For the predicated version, I've generated a `select` between the same vector splat and the `inactive` input parameter, and added new Tablegen isel rules to match that pattern into a predicated `MVE_VDUP` instruction. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73356	2020-02-03 11:20:06 +00:00
Guillaume Chatelet	75d9994a51	Fix broken invariant Summary: A Copy with a source that is zeros is the same as a Set of zeros. This fixes the invariant that SrcAlign should always be non-null. Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73791	2020-02-03 11:01:05 +01:00
Jay Foad	97d9a76afc	[AMDGPU] Don't remove short branches over kills Summary: D68092 introduced a new SIRemoveShortExecBranches optimization pass and broke some graphics shaders. The problem is that it was removing branches over KILL pseudo instructions, and the fix is to explicitly check for that in mustRetainExeczBranch. Reviewers: critson, arsenm, nhaehnle, cdevadas, hakzsam Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73771	2020-02-03 09:26:52 +00:00
Craig Topper	ee85415dbb	[X86] Use MVT::f80 for the result type of the FLD used to convert from SSE register to X87 register in FP_TO_INTHelper.	2020-02-02 13:24:37 -08:00
Craig Topper	246262671f	[X86] Cleanup the lrint/llrint/lround/llround tests a bit. We don't need tests for truncating the result. There's nothing special about those truncates. We can test llrint/llround for 64-bit and 32-bit targets in the same file. Same with lrint/lround with i32 result result. lrint/lround with 64-bit result should only occur on a 64-bit target. Add some missing tests for f80 conversions.	2020-02-02 11:01:05 -08:00
Simon Pilgrim	547a94ffa1	Regenerate bitcast test for upcoming patch.	2020-02-02 18:27:44 +00:00
Simon Pilgrim	0c78b64696	[X86][SSE] Add bitcast <128 x i1> %1 to <2 x i64> test case	2020-02-02 18:00:09 +00:00
Simon Pilgrim	17e91b7dd2	[X86][SSE] combineBitcastvxi1 - add pre-AVX512 v64i1 handling	2020-02-02 18:00:09 +00:00
Fangrui Song	5a56a25b0b	[CodeGenPrepare] Make TargetPassConfig required The code paths in the absence of TargetMachine, TargetLowering or TargetRegisterInfo are poorly tested. As rL285987 said, requiring TargetPassConfig allows us to delete many (untested) checks littered everywhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73754	2020-02-02 09:28:45 -08:00
David Green	d50e188a07	Revert "[ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS" This reverts commit `e34801c8e6` and the followup due to multiple problems. I've tried to keep the tests and RDA parts where possible, as those still seem useful.	2020-02-02 13:24:05 +00:00
Fangrui Song	5932f7b8f2	[PatchableFunction] Use an empty DebugLoc The current FirstMI.getDebugLoc() is actually null in almost all cases. If it isn't, the generated .loc will be considered initial. The .loc will have the prologue_end flag and terminate the prologue prematurely. Also use an overload of BuildMI that will not prepend PATCHABLE_FUNCTION_ENTRY to a MachineInstr bundle.	2020-02-01 14:12:06 -08:00
Nicolai Hähnle	ba8110161d	AMDGPU/GFX10: Fix NSA reassign pass when operands are undef Summary: Virtual registers that are undef have an empty LiveInterval at this point, which means beginIndex() and endIndex() cannot be used. We only need those indices to determine the range in which to scan for affected other NSA instructions, and undef operands cannot contribute to that range. Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73831	2020-02-01 22:41:40 +01:00
Craig Topper	a57dd66d5e	[X86] In X86FastEmitSSESelect, fall back to SelectionDAG if the inputs to the compare can't be found in registers. We were checking that the original Value * for the compare operands were null. But that can never happen. I believe we intended to check for 0 registers here instead. Fixes PR44749.	2020-02-01 12:24:55 -08:00
Craig Topper	943b5561d6	[LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops. This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes. It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from. This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle. Differential Revision: https://reviews.llvm.org/D73749	2020-02-01 11:21:04 -08:00
Alex Richardson	24ee9c8496	Don't mark MIPS TRAP as isTerminator This was causing machine verifier errors when compiling libunwind. Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D73648	2020-02-01 15:50:22 +00:00
Matt Arsenault	c0b12916a7	AMDGPU/GlobalISel: Use more wide vector load/stores This improves the type breakdown for some large vectors. For example, we now get a <4 x s32> and s32 store instead of 5 s32 stores for <5 x s32>.	2020-02-01 10:47:21 -05:00
Matt Arsenault	e3117e5c30	AMDGPU/GlobalISel: Improve legalization of wide stores This fixes legalizations of global stores > 128-bits. It seems work is needed on how this split actually occurs. For example, we get the right code for s160, with an s128 and s32 load, but get 5 s32 loads for <5 x s32>.	2020-02-01 10:47:03 -05:00
Matt Arsenault	bc101ffd77	GlobalISel: Support widening unmerge results with pointer source	2020-02-01 10:47:03 -05:00
Matt Arsenault	98aaed2980	AMDGPU/GlobalISel: Fix forming G_TRUNC with vcc result This somehow got lost when I fixed the boolean handling.	2020-01-31 20:29:41 -05:00
Matt Arsenault	c28f1faaff	AMDGPU: Switch some tests to use generated checks Control flow tests are particularly annoying, and it's probably better to be have comprehensive check lines for them.	2020-01-31 20:29:41 -05:00
Jessica Paquette	b9bf9305d1	[AArch64][GlobalISel] Walk through G_TRUNC in getTestBitReg When you encounter a G_TRUNC, you are moving from a larger type to a smaller type. Asking for the i-th bit on a larger value is the same as asking for the i-th bit on a smaller value. So, we should always be able to walk through G_TRUNC when computing the bit for a TB(N)Z. Differential Revision: https://reviews.llvm.org/D73748	2020-01-31 11:09:55 -08:00
Simon Pilgrim	8fbc7fd567	[DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors If we don't demand any elements of the inserted subvector then just skip it.	2020-01-31 18:57:22 +00:00
Simon Pilgrim	5702dadf6f	[DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.	2020-01-31 18:02:34 +00:00
alex-t	5df1ac7846	[AMDGPU] fixed divergence driven shift operations selection Differential Revision: https://reviews.llvm.org/D73483 Reviewers: rampitec	2020-01-31 20:49:56 +03:00
Hiroshi Yamauchi	ac8da31a0f	[PGO][PGSO] Handle MBFIWrapper Some code gen passes use MBFIWrapper to keep track of the frequency of new blocks. This was not taken into account and could lead to incorrect frequencies as MBFI silently returns zero frequency for unknown/new blocks. Add a variant for MBFIWrapper in the PGSO query interface. Depends on D73494.	2020-01-31 09:36:55 -08:00
Jay Foad	2a1b5af299	[GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister Summary: As a side effect some redundant copies of constant values are removed by CSEMIRBuilder. Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73789	2020-01-31 17:07:16 +00:00
Danilo Carvalho Grael	44a4f5fc6a	[AArch64][SVE] Add SVE2 mla unpredicated intrinsics. Summary: Add intrinsics for the MLA unpredicated sve2 instructions: - smlalb, smlalt, umlalb, umlalt, smlslb, smlslt, umlslb, umlslt - sqdmlalb, sqdmlalt, sqdmlslb, sqdmlslt - sqdmlalbt, sqdmlslbt Reviewers: efriedma, sdesmalen, cameron.mcinally, c-rhodes, rengolin, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D73746	2020-01-31 11:39:12 -05:00
Matt Arsenault	6fb544d1d2	AMDGPU/GlobalISel: Combine FMIN_LEGACY/FMAX_LEGACY Try out using combine definition rules. This really should be a post-legalizer combine, but the combiner pass is currently pre-legalize. Most of the target combines are really post-legalize, so we should probably move the pass.	2020-01-31 06:58:04 -08:00
Matt Arsenault	49e424e08e	AMDGPU/GlobalISel: Select global MUBUF atomicrmw	2020-01-31 06:05:41 -08:00
Matt Arsenault	0426c2d07d	Reapply "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `6a4acb9d80`.	2020-01-31 06:01:28 -08:00
Kerry McLaughlin	69558c8487	[AArch64][SVE] Add remaining SVE2 intrinsics for uniform DSP operations Summary: Implements the following intrinsics: - @llvm.aarch64.sve.[s\|u]qadd - @llvm.aarch64.sve.[s\|u]qsub - @llvm.aarch64.sve.suqadd - @llvm.aarch64.sve.usqadd - @llvm.aarch64.sve.[s\|u]qsubr - @llvm.aarch64.sve.[s\|u]rshl - @llvm.aarch64.sve.[s\|u]qshl - @llvm.aarch64.sve.[s\|u]qrshl - @llvm.aarch64.sve.[s\|u]rshr - @llvm.aarch64.sve.sqshlu - @llvm.aarch64.sve.sri - @llvm.aarch64.sve.sli - @llvm.aarch64.sve.[s\|u]sra - @llvm.aarch64.sve.[s\|u]rsra - @llvm.aarch64.sve.[s\|u]aba Reviewers: efriedma, sdesmalen, dancgr, cameron.mcinally, c-rhodes, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73551	2020-01-31 10:51:57 +00:00
Sam Parker	e014de3a16	[NFC][ARM] Add test	2020-01-31 10:32:15 +00:00
Fangrui Song	5b22bcc2b7	[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local For a MC_GlobalAddress reference to a dso_local external GlobalValue with a definition, emit .Lfoo$local to avoid a relocation. -fno-pic and -fpie can infer dso_local but -fpic cannot. In the future, we can explore the possibility of inferring dso_local with -fpic. As the description of D73228 says, LLVM's existing IPO optimization behaviors (like -fno-semantic-interposition) and a previous assembly behavior give us enough license to be aggressive here. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D73230	2020-01-30 17:52:35 -08:00
Amara Emerson	84bd851108	[GlobalISel][IRTranslator] When translating vector geps, splat the base pointer if required. We can have geps that have a scalar base pointer, and a vector index value, which means that the base pointer must be splatted into a vector of pointers. This fixes crashes on arm64 GlobalISel with optimizations enabled.	2020-01-30 16:27:27 -08:00
Matt Arsenault	6a4acb9d80	Revert "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `17dbc6611d`. A test is failing on some bots	2020-01-30 15:39:51 -08:00
Matt Arsenault	17dbc6611d	AMDGPU: Cleanup and fix SMRD offset handling I believe this also fixes bugs with CI 32-bit handling, which was incorrectly skipping offsets that look like signed 32-bit values. Also validate the offsets are dword aligned before folding.	2020-01-30 15:04:21 -08:00
Jessica Paquette	c8c987d310	[AArch64][GlobalISel] Fold in G_ANYEXT/G_ZEXT into TB(N)Z This is similar to the code in getTestBitOperand in AArch64ISelLowering. Instead of implementing all of the TB(N)Z optimizations at once, this patch implements the simplest case first. The way that this is set up should make it fairly easy to add the rest as we go along. The idea here is that after determining that we can use a TB(N)Z, we can continue looking through instructions and perform further folding. In this case, when we have a G_ZEXT or G_ANYEXT where the extended bits are not used, we can fold it into the TB(N)Z. Differential Revision: https://reviews.llvm.org/D73673	2020-01-30 14:51:26 -08:00
David Tenty	809c872aae	[NFC] Fix check prefix add in fcanonicalize-elimination.ll The test fix added by "D39306: Fix CodeGen/AMDGPU/fcanonicalize-elimination.ll on FreeBSD 11.0" uses a test prefix which is not actually used in the FileCheck stanza. Thus the problem originally encountered still exists and the tests fails for host triples that contain "1.0", including AIX 7.1.0.	2020-01-30 17:19:49 -05:00
Fangrui Song	06b8e32d4f	[AArch64] -fpatchable-function-entry=N,0: place patch label after BTI Summary: For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after `9a24488cb6`, we place the NOP sled after the initial BTI. ``` .Lfunc_begin0: bti c nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lfunc_begin0 ``` This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label: ``` .Lfunc_begin0: bti c .Lpatch0: nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lpatch0 ``` This placement is compatible with the resolution in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 . A local linkage function whose address is not taken does not need a BTI. Placing the patch label after BTI has the advantage that code does not need to differentiate whether the function has an initial BTI. Reviewers: mrutland, nickdesaulniers, nsz, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73680	2020-01-30 11:11:52 -08:00
Danilo Carvalho Grael	0610637aac	[AArch64][SVE] Add remaining SVE2 mla indexed intrinsics. Summary: Add remaining SVE2 mla indexed intrinsics: - sqdmlalb, sqdmlalt, sqdmlslb, sqdmlslt Add suffix _lanes and switch immediate types to i32 for all mla indexed intrinsics to align with ACLE builtin definitions. Reviewers: efriedma, sdesmalen, cameron.mcinally, c-rhodes, rengolin, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D73633	2020-01-30 13:32:11 -05:00
Nikita Popov	70d345e687	[AArch64][ARM] Always expand ordered vector reductions (PR44600) fadd/fmul reductions without reassoc are lowered to VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization support. Until that is in place, expand these intrinsics on ARM and AArch64. Other targets always expand the vector reduction intrinsics. Additionally expand fmax/fmin reductions without nonan flag on AArch64, as the backend asserts that the flag is present when lowering VECREDUCE_FMIN/FMAX. This fixes https://bugs.llvm.org/show_bug.cgi?id=44600. Differential Revision: https://reviews.llvm.org/D73135	2020-01-30 18:40:24 +01:00
Yonghong Song	795bbb3662	[BPF] fix a bug in BPFMISimplifyPatchable pass with -O0 The recommended optimization level for BPF programs is O2 since (1). BPF is running inside the kernel and linux kernel won't work at -O0 level, and (2). Verifier is not able to handle O0 code properly, e.g., potential large stack size and a lot of spills. But we should keep -O0 at least compiling. This patch fixed a bug in BPFMISimplifyPatchable phase where with -O0, a segmentation fault will happen for a simple program like: int test(int a, int b) { return a + b; } A test case is added to capture such a case. Differential Revision: https://reviews.llvm.org/D73681	2020-01-30 08:28:39 -08:00
jasonliu	3bbe7a681e	[XCOFF][AIX] Support basic relocation type on AIX Summary: This patch intends to support three most common relocation type on AIX: R_POS, R_TOC, R_RBR. These three relocation type will be needed for object file generation on AIX for small code model. We will have follow up patches to bring relocation support for large code model on AIX. Reviewers: hubert.reinterpretcast, daltenty, DiggerLin Differential Revision: https://reviews.llvm.org/D72027	2020-01-30 15:59:09 +00:00
Stefan Pintilie	9de1241bb2	[PowerPC][Future] Branch Distance Estimation For Prefixed Instructions By adding the prefixed instructions the branch distances are no longer computed correctly. Since prefixed instructions cannot cross a 64 byte boundary we have to assume that a prefixed instruction may have a nop prepended to it. This patch tries to take that nop into consideration when computing the size of basic blocks. Differential Revision: https://reviews.llvm.org/D72572	2020-01-30 08:54:33 -06:00
Hans Wennborg	6be9acdfa8	Drop arm triple from test/CodeGen/AArch64/global-merge-hidden-minsize.ll Because it's in the AArch64/ directory, it runs in cases where the arm target may not be available, see comment on D73235.	2020-01-30 15:02:38 +01:00
John Brawn	0bb9a27c98	[FPEnv][AArch64] Add lowering and instruction selection for strict conversions Strict fp-to-int and int-to-fp conversions can be handled in the same way that the non-strict versions are (by using the appropriate instruction or converting to a function call when we have no instruction). Differential Revision: https://reviews.llvm.org/D73625	2020-01-30 13:50:06 +00:00
Matt Arsenault	ea956685a1	GlobalISel: Implement s32->s64 G_FPTOSI lowering Port directly from DAG version. The lowering for G_FPTOUI used to fail on AMDGPU because it uses G_FPTOSI.	2020-01-30 08:47:07 -05:00
Matt Arsenault	b21571f4d5	AMDGPU/GlobalISel: Handle s64->s64 G_FPTOSI/G_FPTOUI	2020-01-30 08:46:37 -05:00
Matt Arsenault	8184176efd	AMDGPU/GlobalISel: Custom lower G_LOG/G_LOG10 I'm pretty sure this is wrong and we should expand these in a correct way, but this matches the existing behavior.	2020-01-30 08:38:50 -05:00
Matt Arsenault	872e899b75	AMDGPU/GlobalISel: Legalize unpacked d16 image operations On targets that don't have the normal packed f16 layout, handle these during legalization. Directly modify the register types. We can infer this was a d16 load based on the mem operand size during selection. A16 operands should possibly be handled here as well, but don't worry about that yet.	2020-01-30 08:36:11 -05:00
Matt Arsenault	d21182d692	AMDGPU/GlobalISel: Only map VOP operands to VGPRs This trivially avoids violating the constant bus restriction. Previously this was allowing one SGPR in the first source operand, which technically also avoided violating this for most operations (but not for special cases reading vcc). We do need to write some new, smarter operand folds to pick the optimal SGPR to use in some kind of post-isel fold, but that's purely an optimization. I was originally thinking we would pick which operands should be SGPRs in RegBankSelect, but I think this isn't really manageable. There would be additional complexity to handle every G_* instruction, and then any nontrivial instruction patterns would need to know when to avoid violating it, which is likely to be very error prone. I think having all inputs being canonically copies to VGPRs will simplify the operand folding logic. The current folding we do is backwards, and only considers one operand at a time, relative to operands it already has. It therefore poorly handles the case where there is already a constant bus operand user. If all operands are copies, it's somewhat simpler to consider all input operands at once to choose the optimal constant bus user. Since the failure mode for constant bus violations is now a verifier error and not an selection failure, this moves towards a place where we can turn on the fallback mode. The SGPR copy folding optimizations can be left for later.	2020-01-30 08:32:35 -05:00
Matt Arsenault	b4a0766c8d	AMDGPU/GlobalISel: Select llvm.amdgcn.buffer.atomic.cmpswap	2020-01-30 08:22:43 -05:00
John Brawn	258d8dd76a	[FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND This gets selected to the appropriate fcvt instruction. Handling from there on isn't fully correct yet, as we need to model fcvt reading and writing to fpsr and fpcr. Differential Revision: https://reviews.llvm.org/D73201	2020-01-30 12:51:25 +00:00
Sam Parker	06e12893ff	[ARM][LowOverheadLoops] Skip debug values While iterating through the loop, don't inspect any dbg values. Differential Revision: https://reviews.llvm.org/D73688	2020-01-30 11:51:58 +00:00
John Brawn	2224407ef5	Add lowering of STRICT_FSETCC and STRICT_FSETCCS These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the corresponding FCMP and FCMPE instructions, though the handling from there on isn't fully correct as we don't model reads and writes to FPCR and FPSR. Differential Revision: https://reviews.llvm.org/D73368	2020-01-30 10:40:55 +00:00
Connor Abbott	ce06d50756	AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns Summary: The code was assuming in a few places that if there was only one exit from the function that it was a normal return, which is invalid. It could be an infinite loop, in which case we still need to insert the usual fake edge so that the null export happens. This fixes shaders that end with an infinite loop that discards. Reviewers: arsenm, nhaehnle, critson Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71192	2020-01-30 10:55:02 +01:00
Sam Parker	6726d67bfd	[ARM][LowOverheadLoops] Check scalar predicates When trying to remove the loop iteration count, check that the instruction will always execute. Differential Revision: https://reviews.llvm.org/D73682	2020-01-30 09:13:04 +00:00
Amara Emerson	610f1d22f1	[AArch64][GlobalISel] During ISel try to convert G_PTR_ADD to G_ADD. This lowering tries to look for G_PTR_ADD instructions and then converts them to a standard G_ADD with a COPY on the source, and G_INTTOPTR on the result. This is ok for address space 0 on AArch64 as p0 can be treated as s64. The motivation behind this is to expose the add semantics to the imported tablegen patterns. We shouldn't need to check for uses being loads/stores, because the selector works bottom up, uses before defs. By the time we end up trying to select a G_PTR_ADD, we should have already attempted to fold this into addressing modes and were therefore unsuccessful. This gives some performance and code size improvements across the board. Differential Revision: https://reviews.llvm.org/D73673	2020-01-29 23:04:52 -08:00
Matt Arsenault	7f3280ecdd	AMDGPU/GlobalISel: Select permlane16/permlanex16	2020-01-29 17:55:31 -05:00
Amara Emerson	c12f046eb9	[GlobalISel] Add new combine to convert scalar G_MUL to G_SHL. For pow2 constants we should use G_SHL for pattern matching (and perf) purposes later. Vector support not yet implemented. Differential Revision: https://reviews.llvm.org/D73659	2020-01-29 13:39:00 -08:00
Jessica Paquette	050cd443ca	[AArch64][GlobalISel] Fix TBNZ/TBZ opcode selection When the bit is <= 32, we have to use the W register variant for TB(N)Z. This is because of the way the instruction is encoded. Differential Revision: https://reviews.llvm.org/D73660	2020-01-29 13:11:18 -08:00
Matt Arsenault	d3cea95475	AMDGPU/GlobalISel: Fix tests in release build Irritatingly the failure output is different in release vs. debug because of the legality check is removed without asserts, so a register ends up constrained only in release builds.	2020-01-29 12:27:16 -08:00
Amara Emerson	0da937bb5c	[GlobalISel][IRTranslator] Follow convention and put constant offset of getelementptr arithmetic on RHS. We were needlessly putting known constant values on the LHS of a G_MUL, which is suboptimal. Differential Revision: https://reviews.llvm.org/D73650	2020-01-29 11:37:19 -08:00
Fangrui Song	8903e61b66	[AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects For `MC_GlobalAddress` operands referencing certain GlobalObjects, we can lower them to STB_LOCAL aliases to avoid costs brought by assembler/linker's conservative decisions about symbol interposition: * An assembler conservatively assumes a global default visibility symbol interposable (ELF semantics). So relocations in object files are needed even if the code generator assumed the definition exact and non-interposable. * The relocations can cause the creation of PLT entries on some targets for -shared links. A linker conservatively assumes a global default visibility symbol interposable (if not otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc). "certain" refers to GlobalObjects in the intersection of `hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`. Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very few objects LLVM interpret specially. So the set just includes `external`. This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower MC_GlobalAddress operands to STB_LOCAL aliases if applicable. We may extend the scope and include GlobalAlias in the future. LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations: * Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage GlobalObjects as non-interposable. * Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no outstanding relocation), if the target is defined in the same section. Put it simply, even if IR optimizations failed to optimize and allowed interposition for the function call in `void foo() {} void bar() { foo(); }`, the assembler would disallow it. This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so. With and without the patch, the object file output should be identical: `.Lfoo$local` does not take a symbol table entry. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D73228	2020-01-29 10:58:43 -08:00
Austin Kerbow	2605adb69c	[AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73585	2020-01-29 10:42:12 -08:00
Craig Topper	90c31b0f42	[X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall. ISD::FROUND is defined to round to nearest with ties rounding away from 0. This mode isn't supported in hardware on X86. But as long as we aren't compiling with trapping math, we can emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)). We have to use nextafter to avoid some corner cases that adding 0.5 would have. For example, if X is nextafter(0.5, 0.0) it should round to 0.0, but adding 0.5 would need one extra bit of mantissa than can be stored so it rounds to 1.0. Adding nextafter(0.5, 0.0) instead will just increase the exponent by 1 and leave the mantissa as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0. Techically this requires -fno-trapping-math which isn't our default. But if we care about exceptions we should be using constrained intrinsics. Constrained intrinsics would use STRICT_FROUND which won't go through this code. Fixes PR42195. Differential Revision: https://reviews.llvm.org/D73607	2020-01-29 09:10:02 -08:00
Jay Foad	d07a789579	[AMDGPU] Cluster FLAT instructions with both vaddr and saddr Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73634	2020-01-29 17:01:35 +00:00
Craig Topper	e5edd641fd	[X86] Use a shorter sequence to implement FLT_ROUNDS This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping. This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multipying it by 2 to make it usable as a shift amount. But still results in less code. Differential Revision: https://reviews.llvm.org/D73599	2020-01-29 08:56:33 -08:00
Matt Arsenault	62129878a6	AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using tablegen selected add/sub, which switch to the signed version of the opcode. This matches the current DAG behavior. We can't drop the manual selection for add/sub yet, because it's still both for VALU add/sub and for G_PTR_ADD.	2020-01-29 08:55:54 -08:00
Kazushi (Jam) Marukawa	fef80a2946	[VE] (conditional) branch modification & isel patterns Summary: InstInfo for branch modification, (conditional) branch isel patterns and tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73632	2020-01-29 17:40:57 +01:00
Matt Arsenault	b63629a58d	GlobalISel: Fix mask computation in lowerInsert This is supposed to be the high bit index, not the width. Use the wrapping form of getBitsSet and avoid the bitflip.	2020-01-29 08:25:36 -08:00
Jay Foad	0d7bd34312	[MachineScheduler] Ignore artificial edges when forming store chains Summary: BaseMemOpClusterMutation::apply forms store chains by looking for control (i.e. non-data) dependencies from one mem op to another. In the test case, clusterNeighboringMemOps successfully clusters the loads, and then adds artificial edges to the loads' successors as described in the comment: // Copy successor edges from SUa to SUb. Interleaving computation // dependent on SUa can prevent load combining due to register reuse. The effect of this is that data dependencies from one load to a store are copied as artificial dependencies from a different load to the same store. Then when BaseMemOpClusterMutation::apply looks at the stores, it finds that some of them have a control dependency on a previous load, which breaks the chains and means that the stores are not all considered part of the same chain and won't all be clustered. The fix is to only consider non-artificial control dependencies when forming chains. Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71717	2020-01-29 16:23:01 +00:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Connor Abbott	87d98c1495	AMDGPU: Fix handling of infinite loops in fragment shaders Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781	2020-01-29 17:13:25 +01:00

... 2 3 4 5 6 ...

32711 Commits